[x] Old partitions created before columns are appended to the table. New columns should have nulls in the old partitions.
[x] Old partitions created before columns are added in the middle of the table. New columns should have nulls in the old partitions. (Not supported by Spark V1 datasource)
[x] Old partitions created before columns are dropped at the end of the table. Dropped columns should not appear when reading the old partitions. (Not supported by Spark V1 datasource)
[x] Old partitions created before columns are dropped in the middle of the table. Dropped columns should not appear when reading the old partitions. (Not supported by Spark V1 datasource)
[x] Old partitions created before columns are renamed. (Not supported by Spark V1 datasource)
[x] Old partitions created before new columns are added with the names of previously dropped columns. New columns should have nulls in the old partitions. (Not supported by Spark V1 datasource)
[x] #8797 SPARK-5309: ORC STRING column uses dictionary compression. OrcQuerySuite.scala#L359. (Note: Not sure how this test verifies dictionary_v2.)
[x] #8731 ORC reads at scale with all null values: Like OrcQuerySuite.scala#L173, but with a large number of rows.
[x] #8781 SPARK-16610: Honour orc.compress on writes when compression is unset: OrcQuerySuite.scala#L189.
[x] #8782 compress should be honoured when set (ZLIB, Snappy, None). Refer to OrcQuerySuite.scala#L224.
[x] #8809 Test directory encoding in ORC file for nested types with lots of rows
[x] #8793 SPARK-9170: Upper case ORC column names are not implicitly stored in lowercase. Refer to OrcQuerySuite.scala#L371.
[x] #8838
[x] #8840
[ ] #8841
[x] Selecting complex fields: (Sampling from SchemaPruningSuite.)
[x] #8712
[x] #8713
[x] #8714
[x] #8715
[x] #8856 ORC version V_0_11 and V_0_12 are readable by plugin/CUDF. Refer to TestNewIntegerEncoding#testBasicOld.
[x] #8823 Test predicate pushdown (PPD) with timestamps, decimals, booleans, etc. Refer to OrcQuerySuite.scala#L464.
Here is a list of tests to confirm data format compatibility with Apache Spark in the Spark RAPIDS plugin. This list is a work in progress.

Parquet:
[x] #8693
[x] #8694
[x] #8708
[x] #8704 Support for Parquet columns with names containing a dot ('.')
[x] Selecting complex fields: (Sampling from SchemaPruningSuite.)
[x] #9094
[x] #9127
[ ] P3: #9074 (No reference Spark tests. Not sure how this is tested.)
[ ] Support parquet.block.size to control row group size for parquet #9126
[x] https://github.com/NVIDIA/spark-rapids/issues/8762
[x] Add test to verify the fallback for UDT for parquet, refer to link1 and link2 #9137
[x] Test compatibility between pyarrow and GPU
[x] Test compatibility between fastparquet and GPU #9550
[ ] P1: Written/read by Spark and read/written by GPU. Generate files in parquet testing cases
[ ] P1: Written/read by Hive and read/written by GPU. Generate files in parquet testing cases
[x] #9151
[ ] https://github.com/NVIDIA/spark-rapids/issues/8692
[ ] https://github.com/NVIDIA/spark-rapids/issues/8693
[ ] https://github.com/NVIDIA/spark-rapids/issues/8694
[x] #8708
[x] #8730
[x] #8731
[ ] https://github.com/NVIDIA/spark-rapids/issues/8823
[ ] https://github.com/NVIDIA/spark-rapids/issues/9215