databricks / spark-sql-perf

Apache License 2.0

Is Hive a prerequisite? #14

Open hansbogert opened 9 years ago

hansbogert commented 9 years ago

Reading through the code, I get the impression that a pre-installed Hive installation is required. Is that correct? When writing to an external table, I can see that the Spark driver assumes the destination is a Hive database.

If it is indeed required, it would be important to note that in the README.md.

yhuai commented 9 years ago

@hansbogert To use spark-sql-perf, you only need Spark and the TPC-DS toolkit. A pre-installed Hive is not needed. However, you probably want to build Spark with the -Phive profile to add Hive as a dependency. Then you can use HiveContext, which has a parser with better SQL coverage as well as metastore support. As for the createExternalTable method, it uses the Hive metastore to persist metadata (the built-in Derby metastore is sufficient for that).
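
To illustrate the answer above, here is a minimal sketch using the Spark 1.x-era APIs this thread refers to. The app name and table path are illustrative, not from the repo:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Sketch: no pre-installed Hive is required. With no hive-site.xml on the
// classpath, HiveContext falls back to an embedded Derby metastore.
val sc = new SparkContext(new SparkConf().setAppName("spark-sql-perf-demo"))
val hiveContext = new HiveContext(sc)

// createExternalTable persists only the table's metadata in the metastore;
// the data files stay where they are. (Path below is hypothetical.)
hiveContext.createExternalTable("store_sales", "/path/to/tpcds/store_sales")
hiveContext.sql("SELECT COUNT(*) FROM store_sales").show()
```

Running this assumes Spark was built with Hive support, e.g. `build/mvn -Phive -Phive-thriftserver -DskipTests clean package`, so that `spark-hive` is on the classpath.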