[root@spark-3267648 spark-sql-perf]# build/sbt "test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData --help"
[info] Running com.databricks.spark.sql.perf.tpcds.GenTPCDSData --help
[info] Usage: Gen-TPC-DS-data [options]
[info]
[info]   -m, --master <value>     the Spark master to use, default to local[*]
[info]   -d, --dsdgenDir <value>  location of dsdgen
[info]   -s, --scaleFactor <value>
[info]                            scaleFactor defines the size of the dataset to generate (in GB)
[info]   -l, --location <value>   root directory of location to create data in
[info]   -f, --format <value>     valid spark format, Parquet, ORC ...
[info]   -i, --useDoubleForDecimal <value>
[info]                            true to replace DecimalType with DoubleType
[info]   -e, --useStringForDate <value>
[info]                            true to replace DateType with StringType
[info]   -o, --overwrite <value>  overwrite the data that is already there
[info]   -p, --partitionTables <value>
[info]                            create the partitioned fact tables
[info]   -c, --clusterByPartitionColumns <value>
[info]                            shuffle to get partitions coalesced into single files
[info]   -v, --filterOutNullPartitionValues <value>
[info]                            true to filter out the partition with NULL key value
[info]   -t, --tableFilter <value>
[info]                            "" means generate all tables
[info]   -n, --numPartitions <value>
[info]                            how many dsdgen partitions to run - number of input tasks.
[info]   --help                   prints this usage text
How to use it:
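As a rough sketch, the options from the help text above can be combined into a single invocation. The dsdgen directory, output location, scale factor, and format below are placeholder assumptions rather than values from this setup, so adjust them to your environment. The script only assembles and prints the command so it can be reviewed before being run from the spark-sql-perf checkout.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; change them for your cluster.
DSDGEN_DIR="/opt/tpcds-kit/tools"   # assumed directory containing the dsdgen binary
DATA_DIR="/tmp/tpcds/sf1"           # assumed root directory for the generated data
SCALE_FACTOR=1                      # dataset size in GB (-s)
FORMAT="parquet"                    # any valid Spark format (-f)

# Flags are taken from the --help output; -m is omitted, so the master
# falls back to its documented default, local[*].
ARGS="-d $DSDGEN_DIR -s $SCALE_FACTOR -l $DATA_DIR -f $FORMAT -o true"
CMD="build/sbt \"test:runMain com.databricks.spark.sql.perf.tpcds.GenTPCDSData $ARGS\""

# Print the assembled command for review instead of executing it.
echo "$CMD"
```

Once the printed command looks right, paste it into the shell in the spark-sql-perf directory. Note that the boolean options such as -o take an explicit value (true or false), as shown by the <value> placeholder in the usage text.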