Status: Closed (closed by LIN-Yu-Ting 1 year ago)
Thanks for the detailed report, @LIN-Yu-Ting! I can reproduce the issue, looking into it now.
The problem occurs because the RAPIDS Accelerator does not have a specific override for DeltaParquetFileFormat, which implements the column mapping mode on reads. DeltaParquetFileFormat derives from Apache Spark's ParquetFileFormat, so the RAPIDS Accelerator incorrectly assumes it can replace the read, since DeltaParquetFileFormat is an instance of ParquetFileFormat.
The plugin will need to recognize DeltaParquetFileFormat directly and replace it with equivalent functionality for the GPU that will implement the column mapping feature.
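To illustrate the mechanism being described, here is a simplified, hypothetical sketch (plain Java stand-in classes, not the actual plugin code, which is Scala) of why an instance-of style check matches Delta's subclass of ParquetFileFormat, while an exact-class check forces the subclass to get its own dedicated GPU override:

```java
// Hypothetical stand-ins for the real Spark/Delta classes.
class ParquetFileFormat {}
class DeltaParquetFileFormat extends ParquetFileFormat {}

public class FormatCheckDemo {
    // Naive check: any subclass of ParquetFileFormat is treated as
    // replaceable by the GPU Parquet reader, even if the subclass adds
    // behavior (such as column mapping) the GPU reader does not implement.
    static boolean naiveCanReplace(Object format) {
        return format instanceof ParquetFileFormat;
    }

    // Exact-class check: only matches ParquetFileFormat itself, so a
    // subclass like DeltaParquetFileFormat is not silently claimed and
    // must be recognized and replaced explicitly.
    static boolean exactCanReplace(Object format) {
        return format.getClass() == ParquetFileFormat.class;
    }

    public static void main(String[] args) {
        Object delta = new DeltaParquetFileFormat();
        System.out.println(naiveCanReplace(delta));  // true: wrongly claimed
        System.out.println(exactCanReplace(delta));  // false: needs its own override
    }
}
```

The names `naiveCanReplace` and `exactCanReplace` are invented for this sketch; the real fix (per the discussion below) updated the plugin's format checks so DeltaParquetFileFormat is handled explicitly.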
@jlowe Thanks for your investigation of this issue. Is implementing this for DeltaParquetFileFormat difficult? And do you have a plan to override the Spark Rapids read path for DeltaParquetFileFormat? If so, can I expect this to be included in the next release, 23.10?
I'm working on the fix now and hope to have a PR up soon so this is fixed in 23.10.
OK, thanks for your efforts. If you don't mind, please let me know your development branch. I would also be interested in how to fix this kind of issue. Thanks.
@LIN-Yu-Ting I just posted #9279 to fix this problem. The fix involved updating the CPU format checks, so it includes a number of changes not specific to Delta Lake support.
Describe the bug
We would like to apply the Spark Rapids (23.08) plugin in a Spark 3.3.0 environment to accelerate execution of Spark SQL queries against a DeltaTable (Delta Lake 2.3.0). However, we discovered that Spark Rapids is unable to read data from a DeltaTable with the following TBLPROPERTIES
Steps/Code to reproduce bug
You can create a DeltaTable with the following commands
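The reporter's exact commands did not survive in this copy. As an illustration only (the table name and columns are hypothetical), a Delta table with the column mapping feature discussed above can be created with DDL along these lines:

```sql
-- Hypothetical repro sketch: table name and schema are illustrative.
CREATE TABLE demo_delta (id BIGINT, name STRING)
USING DELTA
TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',  -- enables column mapping on reads
  'delta.minReaderVersion' = '2',       -- protocol versions required by
  'delta.minWriterVersion' = '5'        -- column mapping in Delta Lake
);
```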
Expected behavior
Once you create the Delta table, you can use the following command to reproduce the error.
Then, you might obtain results such as
Environment details