Open jlowe opened 9 months ago
We may be able to do the metadata detection much more cheaply by checking `rootPaths` on the `FileIndex` rather than `inputFiles`, which would probably avoid anything really expensive. I suspect the special metadata directories will show up in the `rootPaths` results for metadata queries without needing a full file listing, but this needs to be verified.
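A minimal sketch of what a `rootPaths`-based check might look like, assuming the scan's root paths are available as plain strings. In Spark they would come from the relation's `FileIndex` (`relation.location.rootPaths`). The class and method names are hypothetical and this is not the plugin's code. Whether metadata queries really expose `_delta_log` in their root paths is the part that still needs to be verified.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: detect a Delta Lake metadata query from the scan's root
// paths only. In Spark these would come from relation.location.rootPaths;
// plain strings are used here so the sketch runs without Spark on the classpath.
public class RootPathDetection {
    // True if any '/'-separated segment of the path is exactly "_delta_log".
    static boolean isDeltaLogRoot(String rootPath) {
        return Arrays.asList(rootPath.split("/")).contains("_delta_log");
    }

    // Only the (usually few) root paths are examined; no file listing is needed.
    static boolean looksLikeDeltaLakeMetadataQuery(List<String> rootPaths) {
        return rootPaths.stream().anyMatch(RootPathDetection::isDeltaLogRoot);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDeltaLakeMetadataQuery(
                List.of("s3://bucket/tables/events/_delta_log"))); // true
        System.out.println(looksLikeDeltaLakeMetadataQuery(
                List.of("s3://bucket/tables/events")));            // false
    }
}
```

The cost of this check scales with the number of root paths rather than the number of files, which is why it could avoid the listing job described below.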
`isDeltaLakeMetadataQuery` can invoke `inputFiles` on a `FileSourceScanExec`'s relation, and on highly partitioned data sources this will often trigger a Spark job to list the files in the table. Users have seen extra file-listing stages appear that were triggered by `isDeltaLakeMetadataQuery`. Setting `spark.rapids.sql.detectDeltaLogQueries` to `false` causes these extra stages to disappear.
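For contrast, here is a simplified, hypothetical sketch of a listing-based check. It is not the actual `isDeltaLakeMetadataQuery` implementation, and the `/_delta_log/` plus `.json` rule is an illustrative assumption. It needs the full array of file names, so on a highly partitioned table every partition must be listed before it can even start.

```java
// Hypothetical sketch of a listing-based check. The real expense in Spark is
// producing inputFiles (a file listing, possibly a Spark job), not this loop.
public class ListingDetection {
    static boolean looksLikeDeltaLogQueryFromFiles(String[] inputFiles) {
        for (String name : inputFiles) {
            // Assumed rule for illustration: a JSON commit file under _delta_log.
            if (name.contains("/_delta_log/") && name.endsWith(".json")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate a table with 10,000 partitions, one file each.
        String[] files = new String[10_000];
        for (int i = 0; i < files.length; i++) {
            files[i] = "s3://bucket/tables/events/date=" + i + "/part-0.parquet";
        }
        // Every name must be listed and examined before we can answer "no".
        System.out.println(looksLikeDeltaLogQueryFromFiles(files)); // false
    }
}
```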