Open maropu opened 3 years ago
The TPC-DS schemas differ between spark-sql-perf TPCDSTables and spark-master/branch-3.1 TPCDSBase (string vs. char/varchar). For example:
// spark
"reason" ->
  """
    |`r_reason_sk` INT,
    |`r_reason_id` CHAR(16),
    |`r_reason_desc` CHAR(100)
  """.stripMargin,

// spark-sql-perf
Table("reason",
  partitionColumns = Nil,
  'r_reason_sk .int,
  'r_reason_id .string,
  'r_reason_desc .string),
To generate TPC-DS table data for Spark (master/branch-3.1), it would be nice to use CHAR/VARCHAR types in TPCDSTables.
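As a rough illustration of the request, here is a minimal, Spark-free sketch of how one table definition could render either schema flavor. Every name in it (`ColType`, `Col`, `SchemaDdl`, `useCharVarchar`) is hypothetical and exists in neither spark-sql-perf nor Spark; only the `reason` columns and their lengths come from the snippet above.

```scala
// Hypothetical sketch: these types are NOT part of spark-sql-perf or Spark.
sealed trait ColType
case object IntCol extends ColType
final case class CharCol(length: Int) extends ColType
final case class VarcharCol(length: Int) extends ColType

final case class Col(name: String, tpe: ColType)

object SchemaDdl {
  // Column list of the "reason" table, with the lengths used in Spark's TPCDSBase.
  val reason: Seq[Col] = Seq(
    Col("r_reason_sk", IntCol),
    Col("r_reason_id", CharCol(16)),
    Col("r_reason_desc", CharCol(100))
  )

  // Render a single column type. With useCharVarchar = false, fixed- and
  // variable-length character columns both fall back to STRING, which is
  // what spark-sql-perf currently declares.
  def renderType(tpe: ColType, useCharVarchar: Boolean): String = tpe match {
    case IntCol        => "INT"
    case CharCol(n)    => if (useCharVarchar) s"CHAR($n)" else "STRING"
    case VarcharCol(n) => if (useCharVarchar) s"VARCHAR($n)" else "STRING"
  }

  // Join all columns into a DDL-style column list.
  def ddl(cols: Seq[Col], useCharVarchar: Boolean): String =
    cols
      .map(c => s"`${c.name}` ${renderType(c.tpe, useCharVarchar)}")
      .mkString(", ")
}
```

Calling `SchemaDdl.ddl(SchemaDdl.reason, useCharVarchar = true)` yields the CHAR-based column list, while `useCharVarchar = false` reproduces the current string-based one. A switch like this is only one possible design; it would keep existing users of the string schema unaffected.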
NOTE: This ticket comes from https://github.com/apache/spark/pull/31886
https://github.com/databricks/spark-sql-perf/pull/201
Is there a specific reason this schema was created in the first place rather than using the schema given in the TPC documentation? http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.1.0.pdf