apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Error when fetching Hudi tables Schema #21945

Closed MateusCastello closed 2 months ago

MateusCastello commented 1 year ago

I'm using Trino with Superset for data exploration, but the table schema preview keeps failing with metadata errors on Hudi tables. Data retrieval and queries work just fine.

For some reason, the SQLAlchemy dialect tries to read the partitions table ($partitions), which is a special hidden table for Hive tables, so I get an "Insufficient Lake Formation permission(s)" error from AWS (as this table does not exist on S3).

I am using Trino 400 with the Hudi connector, Superset 2.0.0, and the Python trino package at version 0.318.0.

How to reproduce the bug

  1. Go to SqlLab
  2. Click on database and choose Trino
  3. Select a Schema that uses Hudi as table format
  4. Choose a table to see the schema and receive the error

Expected results

See the table schema on the Superset interface

Actual results

The table schema does not load in the UI

Screenshots

image

Environment

Log from superset pod as the bug happens

2022-10-26 19:56:37,755:ERROR:root:(trino.exceptions.TrinoExternalError) TrinoExternalError(type=EXTERNAL, name=HIVE_METASTORE_ERROR, message="Insufficient Lake Formation permission(s) on tb_app$partitions (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: ceaaaf08-a369-4453-9007-dfe4d4ab1329; Proxy: null)", query_id=20221026_195637_00121_fbmvi)
[SQL: SELECT
    "column_name",
    "data_type",
    "column_default",
    UPPER("is_nullable") AS "is_nullable"
FROM "information_schema"."columns"
WHERE "table_schema" = ?
  AND "table_name" = ?
ORDER BY "ordinal_position" ASC]
[parameters: ('a_bd', 'tb_app$partitions')]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/trino/sqlalchemy/dialect.py", line 360, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/trino/dbapi.py", line 424, in execute
    result = self._query.execute()
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 758, in execute
    self._result.rows += self.fetch()
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 773, in fetch
    status = self._request.process(response)
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 574, in process
    raise self._process_error(response["error"], response.get("id"))
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 543, in _process_error
    raise exceptions.TrinoExternalError(error, query_id)
trino.exceptions.TrinoExternalError: TrinoExternalError(type=EXTERNAL, name=HIVE_METASTORE_ERROR, message="Insufficient Lake Formation permission(s) on tb_app$partitions (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: ceaaaf08-a369-4453-9007-dfe4d4ab1329; Proxy: null)", query_id=20221026_195637_00121_fbmvi)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask_appbuilder/api/__init__.py", line 86, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 113, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 110, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1507, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 245, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/databases/api.py", line 598, in table_extra_metadata
    payload = database.db_engine_spec.extra_table_metadata(
  File "/app/superset/db_engine_specs/presto.py", line 906, in extra_table_metadata
    indexes = database.get_indexes(table_name, schema_name)
  File "/app/superset/models/core.py", line 699, in get_indexes
    indexes = self.inspector.get_indexes(table_name, schema)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/reflection.py", line 513, in get_indexes
    return self.dialect.get_indexes(
  File "/usr/local/lib/python3.8/site-packages/trino/sqlalchemy/dialect.py", line 249, in get_indexes
    partitioned_columns = self._get_columns(connection, f"{table_name}$partitions", schema, **kw)
  File "/usr/local/lib/python3.8/site-packages/trino/sqlalchemy/dialect.py", line 158, in _get_columns
    res = connection.execute(sql.text(query), schema=schema, table=table_name)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2235, in execute
    return connection.execute(statement, *multiparams, **params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1011, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1124, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1316, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1276, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/trino/sqlalchemy/dialect.py", line 360, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.8/site-packages/trino/dbapi.py", line 424, in execute
    result = self._query.execute()
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 758, in execute
    self._result.rows += self.fetch()
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 773, in fetch
    status = self._request.process(response)
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 574, in process
    raise self._process_error(response["error"], response.get("id"))
  File "/usr/local/lib/python3.8/site-packages/trino/client.py", line 543, in _process_error
    raise exceptions.TrinoExternalError(error, query_id)
sqlalchemy.exc.OperationalError: (trino.exceptions.TrinoExternalError) TrinoExternalError(type=EXTERNAL, name=HIVE_METASTORE_ERROR, message="Insufficient Lake Formation permission(s) on tb_app$partitions (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: ceaaaf08-a369-4453-9007-dfe4d4ab1329; Proxy: null)", query_id=20221026_195637_00121_fbmvi)
[SQL: SELECT
    "column_name",
    "data_type",
    "column_default",
    UPPER("is_nullable") AS "is_nullable"
FROM "information_schema"."columns"
WHERE "table_schema" = ?
  AND "table_name" = ?
ORDER BY "ordinal_position" ASC]
[parameters: ('a_bd', 'tb_app$partitions')]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
127.0.0.1 - - [26/Oct/2022:19:56:37 +0000] "GET /api/v1/database/1/table_extra/tb_app/a_bd/ HTTP/1.1" 500 26 "http://localhost:45091/superset/sqllab/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
127.0.0.1 - - [26/Oct/2022:19:56:37 +0000] "GET /static/assets/207a4252758bcd4d3cbd.chunk.js HTTP/1.1" 200 793 "http://localhost:45091/superset/sqllab/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36"
10.235.7.215 - - [26/Oct/2022:19:56:38 +0000] "GET /health HTTP/1.1" 200 2 "-" "kube-probe/1.22+"
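
In the traceback above, the failure comes from the get_indexes call, which looks up the tb_app$partitions table. As a rough illustration only, a caller could treat a failed index lookup as "no indexes" so that the rest of the schema preview still loads. The helper name get_indexes_or_empty and the stub inspector class are hypothetical; this is not how Superset or trino-python-client handle it, and the broad except is an assumption made to keep the sketch free of third-party imports.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)


def get_indexes_or_empty(
    inspector: Any, table_name: str, schema: Optional[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: return the table's indexes, or [] if the lookup fails.

    `inspector` is anything with a get_indexes(table_name, schema) method,
    such as a SQLAlchemy Inspector.
    """
    try:
        return inspector.get_indexes(table_name, schema)
    except Exception as ex:  # assumption: broad on purpose; narrow it in real code
        logger.warning("Index lookup failed for %s.%s: %s", schema, table_name, ex)
        return []


class FailingInspector:
    """Stub that mimics the failure in the log; not a real SQLAlchemy class."""

    def get_indexes(self, table_name: str, schema: Optional[str] = None):
        raise RuntimeError(
            f"Insufficient Lake Formation permission(s) on {table_name}$partitions"
        )


# With the stub, the helper swallows the error and reports no indexes.
indexes = get_indexes_or_empty(FailingInspector(), "tb_app", "a_bd")
```

Swallowing the error hides real permission problems too, so this only makes sense as a stopgap until the dialect stops querying $partitions for tables that do not have it.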

arnabneogi86 commented 1 year ago

any update on this?

hashhar commented 8 months ago

Fixed via https://github.com/trinodb/trino-python-client/pull/426

hashhar commented 8 months ago

I'll update here once a release is available with the change.

hashhar commented 6 months ago

trino-python-client 0.328.0 is now available and includes the fix. Superset can probably be updated to use it, which should resolve the issue.
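
To check whether a given environment already has that release, something like the following could be run inside the Superset container. It is a minimal sketch: the helper names are made up for illustration, the distribution name "trino" is the PyPI package name, and the parser assumes plain numeric versions such as 0.318.0.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# First trino-python-client release reported in this thread to contain the fix.
FIXED_IN: Tuple[int, ...] = (0, 328, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.328.0' into (0, 328, 0). Assumes purely numeric components."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_partitions_fix(installed: str) -> bool:
    """True if the given version is at least the release with the fix."""
    return parse_version(installed) >= FIXED_IN


def installed_trino_version() -> Optional[str]:
    """Return the installed version of the 'trino' package, or None if absent."""
    try:
        return version("trino")
    except PackageNotFoundError:
        return None


current = installed_trino_version()
if current is None:
    print("trino package is not installed")
else:
    print(f"trino {current}: fix present = {has_partitions_fix(current)}")
```

The version from the original report, 0.318.0, compares as older than 0.328.0, so that environment would need an upgrade.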

rusackas commented 6 months ago

Thanks for the info! Opened a PR (see link right above this). We might want to hold off from merging it until 4.0 is out, lest we introduce an unexpected bug, but we should be able to do that before long.

Vitor-Avila commented 2 months ago

Hey @rusackas got to this link while searching for something else, but wondering if we're good to close this one?

rusackas commented 2 months ago

Oooh, nice catch @Vitor-Avila! Closing!