KeithSSmith / spark-compaction

File compaction tool that runs on top of the Spark framework.
Apache License 2.0

NoSuchMethod DataFrameReader.parquet #1

Open iMajna opened 7 years ago

iMajna commented 7 years ago

This error occurred while I was trying to compact all snappy.parquet files that were generated in Spark 2.1 with DataFrames. Is there any workaround? Maybe I could try with RDDs, but how efficient would that be? Can we expect an improvement in the near future? :)

17/08/16 10:56:33 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.parquet([Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
java.lang.NoSuchMethodError: org.apache.spark.sql.DataFrameReader.parquet([Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
    at com.github.KeithSSmith.spark_compaction.Compact.compact(Compact.java:210)
    at com.github.KeithSSmith.spark_compaction.Compact.main(Compact.java:227)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)

OneCricketeer commented 6 years ago

@iMajna

This library has not been updated for Spark 2. It was built against Spark 1.5.2.

https://github.com/KeithSSmith/spark-compaction/blob/master/pom.xml#L148

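DataFrameReader.parquet changed in Spark 2: DataFrame became an alias for Dataset<Row>, so parquet(String... paths) now returns Dataset<Row>, and the signature returning DataFrame that the library was compiled against no longer exists.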
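The error message prints a JVM method descriptor, `([Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;`, which reads as "takes a String[] (a Java varargs parameter), returns org.apache.spark.sql.DataFrame". The return type is part of that descriptor, so to code that is already compiled, a method whose return type changed between library versions is simply a different method. The sketch below, using a hypothetical helper class `JvmDescriptors` (not part of spark-compaction or Spark), rebuilds descriptors with plain reflection so you can see how the string in the error is formed.

```java
import java.lang.reflect.Method;

// Hypothetical helper (not part of spark-compaction or Spark): builds JVM method
// descriptors, the "(params)return" strings printed in NoSuchMethodError messages.
final class JvmDescriptors {
    private JvmDescriptors() {}

    // Field descriptor for a single type, following the JVM spec grammar.
    static String typeDescriptor(Class<?> c) {
        if (c.isArray()) {
            // Array class names are already in descriptor form, e.g. "[Ljava.lang.String;" or "[I".
            return c.getName().replace('.', '/');
        }
        if (c.isPrimitive()) {
            if (c == void.class) return "V";
            if (c == boolean.class) return "Z";
            if (c == byte.class) return "B";
            if (c == char.class) return "C";
            if (c == short.class) return "S";
            if (c == int.class) return "I";
            if (c == long.class) return "J";
            if (c == float.class) return "F";
            if (c == double.class) return "D";
            throw new IllegalArgumentException("unexpected primitive: " + c);
        }
        // Reference types: "L" + binary name with '/' separators + ";".
        return "L" + c.getName().replace('.', '/') + ";";
    }

    // Method descriptor: parameter descriptors in order, then the return type.
    static String methodDescriptor(Class<?> returnType, Class<?>... paramTypes) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : paramTypes) {
            sb.append(typeDescriptor(p));
        }
        return sb.append(')').append(typeDescriptor(returnType)).toString();
    }

    static String methodDescriptor(Method m) {
        return methodDescriptor(m.getReturnType(), m.getParameterTypes());
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method valueOf = String.class.getMethod("valueOf", Object.class);
        // Prints (Ljava/lang/Object;)Ljava/lang/String;
        System.out.println(methodDescriptor(valueOf));
        // A varargs String... parameter compiles to String[], so this prints ([Ljava/lang/String;)V,
        // the same parameter shape as the DataFrameReader.parquet call in the stack trace.
        System.out.println(methodDescriptor(void.class, String[].class));
    }
}
```

Applied to this issue: the jar compiled against Spark 1.5.2 asks the JVM for `([Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;`, while Spark 2 only provides a `parquet(String...)` whose erased return type is `org.apache.spark.sql.Dataset`. The descriptors differ even though the source-level call looks unchanged, so recompiling against Spark 2 (and fixing whatever no longer compiles) is required.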

iMajna commented 6 years ago

Yes, I got that; that's why I asked whether we can expect any improvement :) Do you know of any replacement, or how to deal with lots of Parquet files?

OneCricketeer commented 6 years ago

I was looking at this to handle Avro.

The solution would be to fork it and rewrite all the RDD usage with the Dataset API.
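A hedged sketch of the core of such a rewrite, assuming the goal is the usual one for compaction: read many small Parquet files and write them back as fewer files near a target size. With Spark 2's Java API the I/O would look like `spark.read().parquet(inputDir)` followed by `.repartition(n).write().parquet(outputDir)`; that part needs a Spark cluster, so the runnable piece below is only the arithmetic that picks `n`. The class `CompactionPlanner` and the 128 MiB target are illustrative assumptions (128 MiB is a common HDFS block size), not anything spark-compaction is confirmed to use.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical planner (not spark-compaction's code): chooses how many output
// files a compaction job should write so each is close to a target size.
final class CompactionPlanner {
    private CompactionPlanner() {}

    // Assumed default target of 128 MiB per output file (a common HDFS block size).
    static final long DEFAULT_TARGET_BYTES = 128L * 1024 * 1024;

    // Ceiling of totalInputBytes / targetBytes, but never less than 1,
    // so an empty or tiny input still produces one output file.
    static int outputFileCount(long totalInputBytes, long targetBytes) {
        if (totalInputBytes < 0 || targetBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and target positive");
        }
        long files = totalInputBytes / targetBytes + (totalInputBytes % targetBytes == 0 ? 0 : 1);
        return Math.toIntExact(Math.max(1L, files));
    }

    // Sums the on-disk sizes of the input files (e.g. as listed from HDFS) and plans the count.
    static int outputFileCount(List<Long> inputFileSizes, long targetBytes) {
        long total = 0;
        for (long size : inputFileSizes) {
            total += size;
        }
        return outputFileCount(total, targetBytes);
    }

    public static void main(String[] args) {
        // 1,000 small files of 4 MiB each = 4,000 MiB of input -> 32 files of at most 128 MiB.
        List<Long> sizes = Collections.nCopies(1000, 4L * 1024 * 1024);
        System.out.println(outputFileCount(sizes, DEFAULT_TARGET_BYTES)); // prints 32
    }
}
```

Because the input sizes are measured as compressed bytes on disk, this assumes the output is written with the same codec (Snappy here) and compresses about as well. The resulting `n` would go to `Dataset.repartition(n)` (full shuffle, evenly sized files) or `Dataset.coalesce(n)` (no shuffle, cheaper but possibly uneven); which fits better depends on the job.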