Open NvTimLiu opened 1 year ago
Some more details on the failures:
Affects test_[csv|orc|parquet]_scan_with_hidden_metadata_fallback
pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `_metadata`.`file_path` cannot be resolved.
Did you mean one of the following? [`_c0`].; line 1 pos 0; 'Project [_c0#16965, '_metadata.file_path]
Possibly related to changes in Spark 3.5.0 in https://github.com/apache/spark/commit/3baf7f7b7106f3fd30257b793ff4908d0f1ec427
Affects a number of fallback tests, such as test_csv_datetime_parsing_fallback_cpu_fallback
The test expects FileSourceScanExec but finds v2.BatchScanExec
java.lang.AssertionError: assertion failed: Could not find GpuFileGpuScan parquet .* ReadSchema: struct<> in the Spark plan
E GpuColumnarToRow false
E +- GpuHashAggregate(keys=[], functions=[gpucount(1, false)], output=[count(1)#115634L])
E +- GpuShuffleCoalesce 1073741824
E +- GpuColumnarExchange gpusinglepartitioning$(), ENSURE_REQUIREMENTS, [plan_id=213897]
E +- GpuHashAggregate(keys=[], functions=[partial_gpucount(1, false)], output=[count#115637L])
E +- GpuBatchScan parquet hdfs://ip-172-31-0-176.us-west-2.compute.internal:8020/tmp/pyspark_tests/ip-172-31-8-237-main-840-848679351/PARQUET_DATA[] GpuParquetScan DataFilters: [], Format: gpuparquet, Location: InMemoryFileIndex(1 paths)[hdfs://ip-172-31-0-176.us-west-2.compute.internal:8020/tmp/pyspark_tes..., PartitionFilters: [], ReadSchema: struct<>, PushedFilters: [] RuntimeFilters: []
E
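When triaging failures like the two above, it can help to check quickly whether a captured plan went through the v1 file-source path or the DataSource V2 path. The snippet below is only a rough sketch: `scan_api_versions` is a hypothetical helper, not part of the spark-rapids integration tests, and the node names it looks for are just the ones that appear in this report (`FileSourceScanExec`, `GpuFileGpuScan`, `BatchScanExec`, `GpuBatchScan`) plus the `FileScan` label that CPU plans are assumed to print.

```python
import re

# Hypothetical triage helper, NOT part of the spark-rapids test suite.
# Node names come from the log excerpts in this issue; "FileScan" is an
# assumption about how CPU v1 scans appear in a printed plan.
V1_SCAN = re.compile(r"\b(?:GpuFileGpuScan|FileSourceScanExec|FileScan)\b")
V2_SCAN = re.compile(r"\b(?:Gpu)?BatchScan(?:Exec)?\b")


def scan_api_versions(plan_text):
    """Return the set of scan API versions ("v1", "v2") seen in a plan dump."""
    found = set()
    for line in plan_text.splitlines():
        # Check v2 first: a GpuBatchScan line also names the underlying scan.
        if V2_SCAN.search(line):
            found.add("v2")
        elif V1_SCAN.search(line):
            found.add("v1")
    return found


if __name__ == "__main__":
    sample = "+- GpuBatchScan parquet hdfs://example/PARQUET_DATA[] GpuParquetScan"
    print(scan_api_versions(sample))
```

If this reports only `v2` for a test that asserts on a v1 node, the mismatch lies in which scan path the cluster selected rather than in the fallback logic under test.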
Affects test_read_* tests
org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: com.nvidia.spark.rapids.tests.datasourcev2.parquet.ArrowColumnarDataSourceV2. Please find packages at `https://spark.apache.org/third-party-projects.html`.
pyspark.errors.exceptions.captured.AnalysisException: [CANNOT_LOAD_FUNCTION_CLASS] Cannot load class com.nvidia.spark.rapids.tests.udf.hive.EmptyHiveSimpleUDF when registering the function `emptysimple`, please make sure it is on the classpath.
Describe the bug
Python integration tests failed on the latest EMR 6.12.0 cluster [spark-rapids v23.06.0 jar special for EMR]. FAILED files: FAILED test cases: consoleText2.txt