databricks / spark-sql-perf

Apache License 2.0

TPC-DS.. dataGen.. format #68

Open kgebaly opened 8 years ago

kgebaly commented 8 years ago

In table.genData(tableLocation, format, overwrite, clusterByPartitionColumns, ...), what value does format take when generating TPC-DS benchmark data?

npaluskar commented 8 years ago

format specifies the type of data to write, so it has to be passed as a string. That is what I found in Tables.scala:

def genData(location: String, format: String, overwrite: Boolean, clusterByPartitionColumns: Boolean, filterOutNullPartitionValues: Boolean, numPartitions: Int)

For example, "text", so you can call something like tables.genData("/path/to_Data", "text", true, true, true, true, true)

sridharpothamsetti commented 8 years ago

You can also use parquet, avro, etc. I tried it with parquet.
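
The values mentioned in this thread ("text", parquet, avro) look like Spark data source names, so a small pre-check on the string before calling genData may save a failed run. The sketch below is only an illustration: GenDataFormat and its methods are hypothetical names and not part of spark-sql-perf. It assumes, without confirmation from this thread, that genData hands the format string to Spark's writer. The bundled list covers only the file-based short names shipped with Spark SQL 2.x and later. avro is not in that list because it generally needs the spark-avro package on the classpath.

```scala
// Hypothetical helper for illustration; it is not part of spark-sql-perf.
object GenDataFormat {
  // Short names of file-based data sources bundled with Spark SQL 2.x and later.
  val bundled: Set[String] = Set("parquet", "orc", "json", "csv", "text")

  // Trim and lower-case so that " Parquet " and "parquet" are treated alike
  // by this helper.
  def normalize(format: String): String = format.trim.toLowerCase

  // True when the name is one of the bundled short names listed above.
  def isBundled(format: String): Boolean = bundled.contains(normalize(format))

  // Describe what a caller should expect for a given format string.
  def describe(format: String): String =
    if (isBundled(format)) s"'${normalize(format)}' is bundled with Spark"
    else s"'${normalize(format)}' is not in the bundled list; it may need an extra data source package"
}
```

For instance, GenDataFormat.isBundled("parquet") returns true, while GenDataFormat.isBundled("avro") returns false, which is a hint to check that the Avro package is available before generating data.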