Open iMajna opened 7 years ago
@iMajna
This library has not been updated for Spark 2. It was built against Spark 1.5.2:
https://github.com/KeithSSmith/spark-compaction/blob/master/pom.xml#L148
DataFrameReader.parquet(String string)
was added in Spark 2.
Yes, I got that, which is why I asked whether we can expect any improvement :) Do you know of any replacement, or how to deal with lots of Parquet files?
I was looking at this to handle Avro.
The solution would be to fork it and rewrite all the RDD usage to use the Dataset API.
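For anyone going the fork route, here is a minimal sketch of what the core of a Dataset-based compaction could look like. It is not code from spark-compaction: the class `CompactionPlan`, the method `targetFileCount`, and the names `inputDir` and `outputDir` are hypothetical. It also assumes the compacted output is roughly the same size as the input, which will not hold exactly if the compression settings differ. The helper only picks how many output files to write; the Spark 2.x calls that would consume that number are shown in comments so the class compiles without Spark on the classpath.

```java
// Hypothetical helper, not part of spark-compaction.
class CompactionPlan {

    /**
     * Returns how many output files are needed so that each one holds at most
     * targetFileBytes, using ceiling division. Always returns at least 1.
     */
    static int targetFileCount(long totalInputBytes, long targetFileBytes) {
        if (targetFileBytes <= 0) {
            throw new IllegalArgumentException("targetFileBytes must be positive");
        }
        if (totalInputBytes <= 0) {
            return 1; // empty input still produces one (possibly empty) output file
        }
        long count = totalInputBytes / targetFileBytes;
        if (totalInputBytes % targetFileBytes != 0) {
            count++; // round up for the partial last file
        }
        return (int) Math.min(count, Integer.MAX_VALUE);
    }

    // Assumed usage with Spark 2.x (inputDir and outputDir are placeholders):
    //
    //   SparkSession spark = SparkSession.builder().appName("compaction").getOrCreate();
    //   Dataset<Row> df = spark.read().parquet(inputDir);
    //   int n = CompactionPlan.targetFileCount(totalInputBytes, 128L * 1024 * 1024);
    //   df.coalesce(n).write().option("compression", "snappy").parquet(outputDir);
}
```

`coalesce(n)` avoids a full shuffle, so it is usually cheaper than `repartition(n)` when the goal is only to reduce the number of files, at the cost of less evenly sized output.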
This error occurred while I was trying to compact all the snappy.parquet files that were generated in Spark 2.1 with DataFrames. Is there any workaround? Maybe I could try with RDDs, but how efficient would that be? Can we expect an improvement in the near future? :)