Open ruifpedro opened 4 months ago
This is an interesting bug, because I was aware of the quoting issue you've described happening in the original PRESTO_EXPAND_DATA
feature in version <= 2.0 and made a point to make sure these fields were correctly quoted when writing the new expansion feature; you can see the logic of that here and I called it out here. Tests for this here.
It's been working for us with deeply nested fields and many different charts, but we've backported the feature to 2.1 so it's possible it's interacting with something else that went into 3.1 and this logic is being skipped in some contexts.
I'm happy to give you a hand understanding the flow of some of the logic here if you want to try digging deeper into how this query was generated. Feel free to reach out on slack.
For a primer on the flow here, you can see the base spec's select_star
calls database.get_columns
, which then calls back to self.db_engine_spec.get_columns
. It then uses cls._get_fields
to get the SQLA field definitions and builds the select star query from them.
get_columns
on the trino spec contains a call to self._expand_columns
linked above, which should correctly quote the fields and list them on the column defs as query_as
. _get_fields
on the base spec will take query_as
as a literal column if it exists.
So at first glance I don't see an obvious problem where something's changed since the original design, but perhaps some part of this logic is being bypassed in your example.
Closing this as stale, assuming you've just moved on with corrected quotes, and most others aren't running into this problem. If it's still a bug in current Superset and/or people want to dig deeper, just say the word and we can re-open this.
This is still an issue, specifically in charts and dashboards. In SQL Lab we can easily mitigate the problem but not in other parts of Superset.
it is because apache supeset queries that recived by Trino calls nested fields in a string which is not supported by Trino
for example: "info.operator"
is not valid while info.operator
is valid, is there anything to work around?
Bug description
Heya Superset team!
I ran into a bug that prevents me from using nested columns in charts and especially dashboard filters. This happens with datasets that use a Trino database with the
Enable row expansion in schemas
enabled (introduced in Superset 3.1.0), which, as the name states, expands nested columns and effectively adds them to the dataset columns list.This happens due to a possible problem with Superset's query builder, where a given nested column, let's say
metadata.field_a
is parsed into the query as"metadata.field_a"
, while Trino only accepts either"metadata"."field_a"
ormetadata.field_a
. This causes the column selection in chart exploration to create invalid queries (although it can be mitigated by always using Custom SQL). Additionally, this completely breaks the usage of dashboard filters, where the custom SQL workaround is not possible.I provide evidence below and am available to provide more details if needed.
Looking forward towards your response.
How to reproduce the bug
In chart exploration:
Cannot load filter Error: line 1:8: Column '<nested_column_name>' cannot be resolved
In dashboard filters:
Cannot load filter Error: line 1:8: Column '<nested_column_name>' cannot be resolved
Screenshots/recordings
Stacktrace
Screenshots
The chart evidence and error
The dashboard filter evidence and error
SQL Lab evidence on how SQL looks and should look like
This is what Superset currently generates in charts and filters![Screenshot_20240207_115741](https://github.com/apache/superset/assets/13442135/b77c9e02-a773-47a7-9a99-269a98837ccc)
Evidence on what works and what doesn't work with Trino![Screenshot_20240207_115846](https://github.com/apache/superset/assets/13442135/9df8cba6-4471-4553-9232-9f894484ce09)
Superset version
3.1.0
Python version
3.9
Node version
Not applicable
Browser
Firefox
Additional context
I have the following feature flags enabled:
There are no customization made on Superset, but the container image is packaged by me in order to include necessary dependencies like the Trino connector (python lib: trino==0.327.0). The Superset is deployed in Kubernetes using the official Superset helm chart.
About the data source I am querying, I've already provided details on it on the post.
Checklist