Closed wjxiz1992 closed 2 weeks ago
cc @winningsix for visibility. This is helpful for our high-priority task to add more query benchmarks.
Investigated this on one of the internal tickets.
The tool does not truncate the schema. It is truncated by Spark internal classes.
When AQE updates the plan, the new PlanInfo replaces the old PlanInfo, which contains the full metadata.schema,
with one that has empty metadata or truncated schema field values.
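To illustrate the idea, here is a minimal sketch (not the tool's actual implementation) of one way to recover the full schema from an event log: walk every plan version for a SQL execution and keep the longest ReadSchema seen for each scan. It assumes the usual JSON-lines event log layout with SparkListenerSQLExecutionStart and SparkListenerSQLAdaptiveExecutionUpdate events carrying a sparkPlanInfo tree, and it assumes the Location metadata value is the same across plan versions so it can serve as a key. The function names are hypothetical.

```python
import json

# Event names as they appear in the "Event" field of a Spark event log line.
START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"
AQE_UPDATE = "org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate"


def scan_metadata(plan_info):
    """Yield the metadata dict of every plan node that carries a ReadSchema."""
    metadata = plan_info.get("metadata") or {}
    if "ReadSchema" in metadata:
        yield metadata
    for child in plan_info.get("children", []):
        yield from scan_metadata(child)


def full_read_schemas(event_lines):
    """Map (executionId, Location) to the longest ReadSchema seen in any plan version.

    Assumption: a truncated schema string is shorter than the full one, and
    Location is identical across plan versions of the same execution.
    """
    best = {}
    for line in event_lines:
        event = json.loads(line)
        if event.get("Event") not in (START, AQE_UPDATE):
            continue
        for metadata in scan_metadata(event.get("sparkPlanInfo", {})):
            key = (event["executionId"], metadata.get("Location", ""))
            schema = metadata["ReadSchema"]
            # Never let a later, shorter (truncated) value replace a fuller one.
            if len(schema) > len(best.get(key, "")):
                best[key] = schema
    return best
```

If the updated plan carries no metadata at all, the scan simply contributes nothing and the value from the initial plan is kept. This does not address renamed nodes or the mapping of sqlMetrics.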
@wjxiz1992 in the future please be sure to add information about which tool you are requesting functionality from (profiling/qualification/other). If possible, add details about how you are running the tool and a reproduction case. Also please add why this is high priority, i.e. what it's going to be used for.
Sure, thanks for the suggestion. Updated the issue description.
@wjxiz1992
This problem seems to be more complicated than initially thought.
Since Spark truncates the metadata in the new AdaptivePlan, the full schema will be missing for those SQLPlans. It is even more difficult considering that nodeNames might change and that sqlMetrics need to be mapped correctly.
I am working on it.
Is your feature request related to a problem? Please describe. We are generating data according to the information provided in data_source_information. The current table doesn't contain a column that shows the full table name of the data source, so we have to go to the web UI to check the full table name.
Requiring functionality from: Profiling Tool
Reproduce step: Run the Profiling Tool against one event log file
Why this is high priority: (not sure if I could tell here, let me know if I should delete it) We got 30 queries from a customer, and we need to run them locally to test our software (spark-rapids). Currently we go to the web UI or the event log to get the table names. It would help if they were shown directly in the Profiling Tool output CSV.
Describe the solution you'd like Add a column to show the full table name
Describe alternatives you've considered None
Additional context None