Open jshumway-zs opened 9 months ago
Hello, I am facing the same issue, were you able to resolve this?
Hello, I am facing the same issue, were you able to resolve this?
Hey @OmarSultan85, for now we have just forked and removed the most recent partition logic as it isn't critical for us. I've found that it also affects bigquery tables accessed through trino and is still an issue in version 3.0.0 as well.
@jshumway-zs Can you please share which files need to be modified and which function to be removed to make it work. In my case superset 3.0 is adding record_count, and file_count to where clause for every iceberg table. What it does is read the metadata information from partition file and then use values from this file to append as where clause.
@jshumway-zs Can you please share which files need to be modified and which function to be removed to make it work. In my case superset 3.0 is adding record_count, and file_count to where clause for every iceberg table. What it does is read the metadata information from partition file and then use values from this file to append as where clause.
Here is a snippet from our dockerfile which we are using to correct. Basically we return early in presto.py on line 524 and then remove lines 73-92 in trino.py. Obviously the line numbers will depend on the release. Its been a while since I looked at this, so I'm not sure which methods are being modified here, but if you look a the code from the 3.0.0 branch you should get an idea.
FROM apache/superset:3.0.0rc4
USER root
RUN sed -i "524i \ return None" /app/superset/db_engine_specs/presto.py
RUN sed -i 73,92d /app/superset/db_engine_specs/trino.py
USER superset
Just looked back at the code and it looks like we are exiting early from PrestoBaseEngineSpec._latest_partition_from_df and making sure metadata['partitions'] isn't getting set in TrinoEngineSpec.extra_table_metadata.
+1 we are facing the same issue here. Superset 3.1.1 and Iceberg tables in Trino.
Pinging some users/SMEs from the rolodex on this one, since a solution might actually close TWO issues: @john-bodley @sujiplr @nytai
This is still the case on latest release (4.0.1). As described by @jshumway-zs, the Trino db engine extends from the Presto one, that has the method where_latest_partition
defined that the preview fuction uses to make sure you only preview data from the latest partition.
This method adds a WHERE clause using all the columns/values from the return None
).
New offsets for the same solution to disable partitioning tracking for 4.0.1:
FROM apache/superset:4.0.1
USER root
RUN sed -i "494i \ return None" /app/superset/db_engine_specs/presto.py
RUN sed -i 81,99d /app/superset/db_engine_specs/trino.py
USER superset
A clear and concise description of what the bug is.
Data preview for trino tables does not work for all trino connectors (specifically for iceberg in our case). This issue results because the response from "select * from table$partitions" is dependent on the connection type (this is used in superset/db_engine_specs/presto.py). It seems that the code was written based off of the response for a hive connection, but the response schema is different for other connection types (see the iceberg docs here for example).
How to reproduce the bug
Expected results
The preview should run properly and display the data preview
Actual results
The preview fails and an error message is shown
Screenshots
Environment
(please complete the following information):
Additional context
I have fixed this on our end by checking if the returned partition columns match the schema described in the iceberg documentation and if so, I return None. However this doesn't allow us to see the most recent partition and is also specific to iceberg. I would recommend setting latest_partition to False in the select_star method for trino until a better solution is found, as selecting from the most recent partition isn't critical in most cases.