apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
60.43k stars 13.04k forks source link

Superset fails to use Trino's nested fields in charts and filters #27034

Open ruifpedro opened 4 months ago

ruifpedro commented 4 months ago

Bug description

Heya Superset team!

I ran into a bug that prevents me from using nested columns in charts and especially dashboard filters. This happens with datasets that use a Trino database with the Enable row expansion in schemas enabled (introduced in Superset 3.1.0), which, as the name states, expands nested columns and effectively adds them to the dataset columns list.

This happens due to a possible problem with Superset's query builder, where a given nested column, let's say metadata.field_a is parsed into the query as "metadata.field_a", while Trino only accepts either "metadata"."field_a" or metadata.field_a. This causes the column selection in chart exploration to create invalid queries (although it can be mitigated by always using Custom SQL). Additionally, this completely breaks the usage of dashboard filters, where the custom SQL workaround is not possible.

I provide evidence below and am available to provide more details if needed.

Looking forward towards your response.

How to reproduce the bug

In chart exploration:

  1. Create a new chart
  2. Select a Trino based dataset with nested columns
  3. Add a nested column to dimensions
  4. Press create chart
  5. The chart now should display Cannot load filter Error: line 1:8: Column '<nested_column_name>' cannot be resolved

In dashboard filters:

  1. Create a dashboard or open an existing one
  2. Add a filter
  3. Select a Trino based dataset with nested columns
  4. Select a nested column as the column
  5. Save
  6. The filter now should display Cannot load filter Error: line 1:8: Column '<nested_column_name>' cannot be resolved

Screenshots/recordings

Stacktrace

superset Query SELECT "metadata.uuid" AS "metadata.uuid",
superset        COUNT(*) AS count
superset FROM raw_runtime_broker.intenthit
superset GROUP BY "metadata.uuid"
superset ORDER BY count DESC
superset LIMIT 1000 on schema raw_runtime_broker failed
superset Traceback (most recent call last):
superset   File "/app/superset/connectors/sqla/models.py", line 1795, in query
superset     df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
superset   File "/app/superset/models/core.py", line 612, in get_df
superset     self.db_engine_spec.execute(cursor, sqls[-1])
superset   File "/app/superset/db_engine_specs/base.py", line 1596, in execute
superset     raise cls.get_dbapi_mapped_exception(ex) from ex
superset   File "/app/superset/db_engine_specs/base.py", line 1594, in execute
superset     cursor.execute(query)
superset   File "/usr/local/lib/python3.9/site-packages/trino/dbapi.py", line 592, in execute
superset     self._iterator = iter(self._query.execute())
superset   File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 810, in execute
superset     self._result.rows += self.fetch()
superset   File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 830, in fetch
superset     status = self._request.process(response)
superset   File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 609, in process
superset     raise self._process_error(response["error"], response.get("id"))
superset trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=COLUMN_NOT_FOUND, message="line 1:8: Column 'metadata.uuid' cannot be resolved", query_id=20240207_121856_11010_858tg)
superset 2024-02-07 12:18:56,581:WARNING:superset.connectors.sqla.models:Query SELECT "metadata.uuid" AS "metadata.uuid",
superset        COUNT(*) AS count
superset FROM raw_runtime_broker.intenthit
superset GROUP BY "metadata.uuid"
superset ORDER BY count DESC
superset LIMIT 1000 on schema raw_runtime_broker failed
superset Traceback (most recent call last):
superset   File "/app/superset/connectors/sqla/models.py", line 1795, in query
superset     df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
superset   File "/app/superset/models/core.py", line 612, in get_df
superset     self.db_engine_spec.execute(cursor, sqls[-1])
superset   File "/app/superset/db_engine_specs/base.py", line 1596, in execute
superset     raise cls.get_dbapi_mapped_exception(ex) from ex
superset   File "/app/superset/db_engine_specs/base.py", line 1594, in execute
superset     cursor.execute(query)
superset   File "/usr/local/lib/python3.9/site-packages/trino/dbapi.py", line 592, in execute
superset     self._iterator = iter(self._query.execute())
superset   File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 810, in execute
superset     self._result.rows += self.fetch()
superset   File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 830, in fetch
superset     status = self._request.process(response)
superset   File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 609, in process
superset     raise self._process_error(response["error"], response.get("id"))
superset trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=COLUMN_NOT_FOUND, message="line 1:8: Column 'metadata.uuid' cannot be resolved", query_id=20240207_121856_11010_858tg)
superset 127.0.0.6 - - [07/Feb/2024:12:18:56 +0000] "POST /api/v1/chart/data HTTP/1.1" 400 73 "https://superset.factory-staging.scottytechnologies.com/explore/

Screenshots

The chart evidence and error

Screenshot_20240207_115708 Screenshot_20240207_115728

The dashboard filter evidence and error

Screenshot_20240207_121441 Screenshot_20240207_121455

SQL Lab evidence on how SQL looks and should look like

This is what Superset currently generates in charts and filters Screenshot_20240207_115741

Evidence on what works and what doesn't work with Trino Screenshot_20240207_115846

Superset version

3.1.0

Python version

3.9

Node version

Not applicable

Browser

Firefox

Additional context

I have the following feature flags enabled:

FEATURE_FLAGS = {
      "ALERT_REPORTS": True,
      "DASHBOARD_RBAC": True,
      "ENABLE_TEMPLATE_PROCESSING": True,
    }

There are no customization made on Superset, but the container image is packaged by me in order to include necessary dependencies like the Trino connector (python lib: trino==0.327.0). The Superset is deployed in Kubernetes using the official Superset helm chart.

About the data source I am querying, I've already provided details on it on the post.

Checklist

giftig commented 4 months ago

This is an interesting bug, because I was aware of the quoting issue you've described happening in the original PRESTO_EXPAND_DATA feature in version <= 2.0 and made a point to make sure these fields were correctly quoted when writing the new expansion feature; you can see the logic of that here and I called it out here. Tests for this here.

It's been working for us with deeply nested fields and many different charts, but we've backported the feature to 2.1 so it's possible it's interacting with something else that went into 3.1 and this logic is being skipped in some contexts.

I'm happy to give you a hand understanding the flow of some of the logic here if you want to try digging deeper into how this query was generated. Feel free to reach out on slack.

giftig commented 4 months ago

For a primer on the flow here, you can see the base spec's select_star calls database.get_columns, which then calls back to self.db_engine_spec.get_columns. It then uses cls._get_fields to get the SQLA field definitions and builds the select star query from them.

get_columns on the trino spec contains a call to self._expand_columns linked above, which should correctly quote the fields and list them on the column defs as query_as. _get_fields on the base spec will take query_as as a literal column if it exists.

So at first glance I don't see an obvious problem where something's changed since the original design, but perhaps some part of this logic is being bypassed in your example.

rusackas commented 4 weeks ago

Closing this as stale, assuming you've just moved on with corrected quotes, and most others aren't running into this problem. If it's still a bug in current Superset and/or people want to dig deeper, just say the word and we can re-open this.

ruifpedro commented 4 weeks ago

This is still an issue, specifically in charts and dashboards. In SQL Lab we can easily mitigate the problem but not in other parts of Superset.

momhiar commented 2 weeks ago

it is because apache supeset queries that recived by Trino calls nested fields in a string which is not supported by Trino for example: "info.operator" is not valid while info.operator is valid, is there anything to work around?