apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0
823 stars, 163 forks

[comet-parquet-exec] Handle CometNativeScan RDD when DataSourceRDD instead of FileScanRDD #1088

Closed mbutrovich closed 6 days ago

mbutrovich commented 1 week ago

I think when prefetching is enabled (which is probably a setting not relevant to ParquetExec anyway), we end up with a DataSourceRDD in the scan instead of a FileScanRDD. This PR extracts the partitions from that RDD type as well.
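
For readers unfamiliar with the two RDD shapes, here is a rough sketch of the kind of dispatch described above. It is not the code from this PR: every type below (`FilePartitionStub`, `FileScanRddStub`, `DataSourceRddStub`) and the helper `extractFilePartitions` are hypothetical stand-ins invented for illustration, not Spark's or Comet's real classes, and the real field names and partition layout may differ between Spark versions.

```scala
// Hypothetical stand-in types for illustration only; NOT Spark's real classes.
final case class FilePartitionStub(index: Int, files: Seq[String])

sealed trait ScanRddStub

// Stands in for a file-scan RDD that exposes its file partitions directly.
final case class FileScanRddStub(filePartitions: Seq[FilePartitionStub])
    extends ScanRddStub

// Stands in for a data-source RDD whose partitions are generic input
// partitions, some (or all) of which happen to be file partitions.
final case class DataSourceRddStub(inputPartitions: Seq[Any])
    extends ScanRddStub

object PartitionExtraction {
  // Return the file partitions regardless of which RDD shape the scan produced.
  def extractFilePartitions(rdd: ScanRddStub): Seq[FilePartitionStub] =
    rdd match {
      case FileScanRddStub(parts) =>
        // Already file partitions; use them as-is.
        parts
      case DataSourceRddStub(inputs) =>
        // Keep only the entries that are file partitions, preserving order.
        inputs.collect { case fp: FilePartitionStub => fp }
    }
}
```

The point of the sketch is only that both cases funnel into the same partition list, so downstream code does not need to know which RDD type the scan returned.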

Total number of tests run: 754
Suites: completed 32, aborted 0
Tests: succeeded 609, failed 145, canceled 1, ignored 46, pending 0