databricks / spark-sql-perf

Apache License 2.0

exception when running tables.genData() #56

Closed: alphalzh closed this issue 8 years ago

alphalzh commented 8 years ago

Hi, first of all, thanks for your work on this tool.

My environment: Cloudera CDH 5.6.0 + Spark 1.5.0

I encountered this error when I tried to generate data using tables.genData():

scala> tables.genData("/hdfs2/ds", "text", true, true, true, true, true)
java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2570)
  at java.lang.Class.getDeclaredMethod(Class.java:2002)
  at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1431)
  ......

Before that, I was just following the README.md. Here is what I did:

scala> import com.databricks.spark.sql.perf.tpcds.Tables
import com.databricks.spark.sql.perf.tpcds.Tables

scala> val tables = new Tables(sqlContext, "/root/tpc-ds/tools", 50)
tables: com.databricks.spark.sql.perf.tpcds.Tables = com.databricks.spark.sql.perf.tpcds.Tables@194765c4

What could be going wrong here? Is it because I am using Cloudera's distribution of Spark, or because version 1.5.0 is not supported?

Thanks in advance for your help.
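One quick way to check which Spark SQL API generation a shell is actually running is to probe the classpath for the two classes involved in this error. The sketch below is a hypothetical helper (SparkApiProbe is not part of spark-sql-perf or Spark). It assumes the class layout described in its comments: Spark 1.5 ships only the DataFrame class, Spark 1.6 adds an experimental Dataset class, and from Spark 2.0 on DataFrame is only a type alias.

```scala
// Hypothetical classpath probe, not part of spark-sql-perf.
// Assumption: Spark 1.3-1.5 ship org.apache.spark.sql.DataFrame but no Dataset,
// Spark 1.6 ships both, and Spark 2.x ships Dataset with DataFrame only as a
// type alias for Dataset[Row] (so no DataFrame class file exists).
object SparkApiProbe {
  // Returns true if the named class can be loaded, without running its initializer.
  def classPresent(name: String): Boolean =
    try {
      Class.forName(name, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false // e.g. NoClassDefFoundError for a dependency
    }

  // Rough guess at the Spark SQL API generation from the two probe results.
  def describe(hasDataFrameClass: Boolean, hasDatasetClass: Boolean): String =
    (hasDataFrameClass, hasDatasetClass) match {
      case (true, false)  => "Spark 1.5 or earlier: no Dataset class"
      case (true, true)   => "Spark 1.6: DataFrame class plus experimental Dataset"
      case (false, true)  => "Spark 2.x or newer: DataFrame is an alias for Dataset[Row]"
      case (false, false) => "Spark SQL not found on the classpath"
    }

  def main(args: Array[String]): Unit = {
    val df = classPresent("org.apache.spark.sql.DataFrame")
    val ds = classPresent("org.apache.spark.sql.Dataset")
    println(s"DataFrame class: $df, Dataset class: $ds -> ${describe(df, ds)}")
  }
}
```

In spark-shell the object could be pasted and run with SparkApiProbe.main(Array()). On a Spark 1.5.0 shell it should report that the Dataset class is missing, which would be consistent with the NoClassDefFoundError for org/apache/spark/sql/Dataset.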

alphalzh commented 8 years ago

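OK, so I found this: https://github.com/apache/spark/pull/11443

It seems that Spark has migrated the old DataFrame class to Dataset (DataFrame becomes a type alias for Dataset[Row]). Spark 1.5.0 has no org.apache.spark.sql.Dataset class at all, so the error is probably due to my Spark version.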
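Given that diagnosis, a small pre-flight check could fail fast before genData is ever called, instead of surfacing a NoClassDefFoundError deep in serialization. This is a sketch under the assumption that the spark-sql-perf build in use targets Spark 2.0 or later; SparkVersionGuard and the "2.0" threshold are hypothetical and should be adjusted to whatever Spark version your build was compiled against. Vendor builds may append a suffix to the version string; the exact format CDH reports is an assumption here.

```scala
// Hypothetical version guard, not part of spark-sql-perf.
// Assumption: the spark-sql-perf jar was built against Spark 2.x (the
// Dataset-based API); change `required` to match your actual build.
object SparkVersionGuard {
  // Parses the leading "major.minor" of strings like "1.5.0" or "2.0.0-SNAPSHOT".
  def majorMinor(version: String): (Int, Int) = {
    val nums = version.split("[.-]").take(2).map(_.takeWhile(_.isDigit))
    val major = if (nums.nonEmpty && nums(0).nonEmpty) nums(0).toInt else 0
    val minor = if (nums.length > 1 && nums(1).nonEmpty) nums(1).toInt else 0
    (major, minor)
  }

  // True if `actual` is the same as or newer than `required` (major.minor only).
  def atLeast(actual: String, required: String): Boolean = {
    val (aMaj, aMin) = majorMinor(actual)
    val (rMaj, rMin) = majorMinor(required)
    aMaj > rMaj || (aMaj == rMaj && aMin >= rMin)
  }

  def main(args: Array[String]): Unit = {
    val required = "2.0"
    // In spark-shell you would pass sc.version here instead of a literal.
    val running = if (args.nonEmpty) args(0) else "1.5.0"
    if (atLeast(running, required))
      println(s"Spark $running meets the assumed minimum $required")
    else
      println(s"Spark $running is older than the assumed minimum $required; expect NoClassDefFoundError")
  }
}
```

In a shell this might be invoked as SparkVersionGuard.main(Array(sc.version)). Only major and minor are compared, since patch releases are not expected to add or remove core classes such as Dataset.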