databricks / spark-sql-perf


setup the benchmark #99


idragus commented 7 years ago

Hi,

I'm using Spark 1.6.0 and I want to run the benchmark. However, I first need to set up the benchmark (I guess).

The tutorial says we have to execute these lines:

```scala
import com.databricks.spark.sql.perf.tpcds.Tables

val tables = new Tables(sqlContext, dsdgenDir, scaleFactor)
tables.genData(location, format, overwrite, partitionTables,
  useDoubleForDecimal, clusterByPartitionColumns, filterOutNullPartitionValues)

// Create metastore tables in a specified database for your data.
// Once tables are created, the current database will be switched to the specified database.
tables.createExternalTables(location, format, databaseName, overwrite)

// Or, if you want to create temporary tables
tables.createTemporaryTables(location, format)

// Setup TPC-DS experiment
import com.databricks.spark.sql.perf.tpcds.TPCDS
val tpcds = new TPCDS (sqlContext = sqlContext)
```
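(For completeness: once `tpcds` is constructed, the README's next step is actually running the queries. A minimal sketch, assuming the `runExperiment`/`waitForFinish` API; the query-set name and parameter values vary by spark-sql-perf version, so treat them as placeholders:)

```scala
// Sketch only: the query-set name and parameters depend on the
// spark-sql-perf version; check the README of your checkout.
val experiment = tpcds.runExperiment(
  tpcds.tpcds1_4Queries,           // the TPC-DS query set to execute
  iterations = 1,                  // run each query once
  resultLocation = "/tmp/results"  // hypothetical path for timing results
)
experiment.waitForFinish(60 * 60)  // wait up to one hour (in seconds)
```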

I understood that I have to run "spark-shell" first in order to run those lines, but the problem is that when I do "import com.databricks.spark.sql.perf.tpcds.Tables" I get the error "error: object sql is not a member of package com.databricks.spark". In "com.databricks.spark" there is only the "avro" package (I don't really know what that is).

Could you help me, please? Maybe I misunderstood something.

Thanks

jeevanks commented 7 years ago

Make sure you build a jar of spark-sql-perf (using sbt). When starting spark-shell, use the --jars option and point it at that jar, e.g., ./bin/spark-shell --jars /Users/xxx/yyy/zzz/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar
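A sketch of the full sequence, assuming a local checkout of the repo (the jar name and Scala version will differ depending on your build):

```sh
# Build the spark-sql-perf jar from the repo root
cd spark-sql-perf
sbt package

# Launch spark-shell with that jar on the classpath
./bin/spark-shell --jars /Users/xxx/yyy/zzz/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.0-SNAPSHOT.jar
```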

gdchaochao commented 5 years ago

I solved this problem with :require /path/to/file.jar in spark-shell.
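For reference, a sketch of that workaround inside an already-running spark-shell (the jar path is a placeholder). Note that :require adds the jar to the shell's classpath; whether executors also pick it up may depend on your Spark version, so --jars remains the safer choice for cluster runs:

```
scala> :require /absolute/path/to/spark-sql-perf_2.11-0.5.1-SNAPSHOT.jar

scala> import com.databricks.spark.sql.perf.tpcds.Tables
import com.databricks.spark.sql.perf.tpcds.Tables
```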

juliuszsompolski commented 5 years ago

@gdchaochao maybe using an absolute path in --jars would also solve it? In your previous comment you wrote that your command was spark-shell --conf spark.executor.cores=3 --conf spark.executor.memory=8g --conf spark.executor.memoryOverhead=2g --jars ./spark-perf/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.1-SNAPSHOT.jar, i.e. with a relative path to the jar.
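For example, the same command with the relative path expanded at launch time (a sketch based on the command quoted above):

```sh
spark-shell \
  --conf spark.executor.cores=3 \
  --conf spark.executor.memory=8g \
  --conf spark.executor.memoryOverhead=2g \
  --jars "$(pwd)/spark-perf/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.1-SNAPSHOT.jar"
```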