RumbleDB / rumble

⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
http://rumbledb.org/

ANTLR runtime version collision #20

Closed ghislainfourny closed 6 years ago

ghislainfourny commented 6 years ago

Spark 2.1.1 ships ANTLR runtime 4.5.3, which causes an error when our parser code is generated with ANTLR 4.6. We need to either ask users to generate the code with the same version as Spark, or (better) find a way to prioritize our version on the classpath.
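
To check which ANTLR runtime a given Spark installation bundles, one can list the runtime jars in its jar directory. This is a minimal sketch, assuming the Spark 2.x layout where bundled jars live under `$SPARK_HOME/jars` and the runtime jar is named `antlr4-runtime-<version>.jar`; the function name `antlr_runtime_versions` is hypothetical:

```shell
#!/usr/bin/env bash
# Print the version of every antlr4-runtime jar found in a directory.
# Assumption: jars follow the naming scheme antlr4-runtime-<version>.jar.
antlr_runtime_versions() {
  local dir="$1" jar name
  for jar in "$dir"/antlr4-runtime-*.jar; do
    [ -e "$jar" ] || continue          # glob matched nothing
    name="${jar##*/}"                  # strip the directory part
    name="${name#antlr4-runtime-}"     # strip the artifact prefix
    printf '%s\n' "${name%.jar}"       # strip the extension, print version
  done
}

# Usage (assumes SPARK_HOME points at the Spark installation):
#   antlr_runtime_versions "$SPARK_HOME/jars"
```

If the printed version differs from the ANTLR version used to generate the parser, the collision described above is likely to occur.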

wscsprint3r commented 6 years ago

We can bypass this issue by running:

spark-submit --class sparksoniq.ShellStart --conf spark.executor.extraClassPath=../lib/ --conf spark.driver.extraClassPath=../lib/ jsoniq-spark-app-0.9.2-jar-with-dependencies.jar --master local[2] --deploy-mode client

wscsprint3r commented 6 years ago

I can create a start script for Sparksoniq in order to make things easier.

ghislainfourny commented 6 years ago

Thanks for investigating. This is very useful to know. Yes, feel free to prepare a script that simplifies the launch of the shell. You could supply a script that launches the shell, and another script that sends a single query with output to HDFS.
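
A launch script along these lines could wrap the `spark-submit` command quoted earlier in this thread. This is only a sketch: the function name `launch_shell` and the environment variables `SPARK_SUBMIT`, `LIB_DIR` and `APP_JAR` are hypothetical, and their defaults are simply the values from that command, with the argument order left unchanged.

```shell
#!/usr/bin/env bash
# Sketch of a launcher for the Sparksoniq shell.
# Hypothetical knobs (not part of the project): SPARK_SUBMIT, LIB_DIR, APP_JAR.
launch_shell() {
  local submit="${SPARK_SUBMIT:-spark-submit}"
  local lib_dir="${LIB_DIR:-../lib/}"
  local app_jar="${APP_JAR:-jsoniq-spark-app-0.9.2-jar-with-dependencies.jar}"

  # Put our lib directory on the executor and driver classpaths, as in the
  # workaround above. Extra arguments to launch_shell are appended at the end.
  "$submit" \
    --class sparksoniq.ShellStart \
    --conf "spark.executor.extraClassPath=$lib_dir" \
    --conf "spark.driver.extraClassPath=$lib_dir" \
    "$app_jar" \
    --master 'local[2]' \
    --deploy-mode client \
    "$@"
}

# When saved as a script, end it with:
#   launch_shell "$@"
```

Note that `spark-submit` passes everything after the application jar to the main class, so whether `--master` and `--deploy-mode` take effect in that position depends on how `sparksoniq.ShellStart` handles its arguments; the sketch keeps the order as quoted.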

This issue is also related to https://github.com/Sparksoniq/sparksoniq/issues/22 because, ideally, our jar should not need to contain the ANTLR runtime at all, since Spark may already supply it on the cluster. Leaving it out would save considerable space in our jar file, in addition to avoiding conflicts and simplifying execution.

wscsprint3r commented 6 years ago

OK, I will look into #22 and mark this one as closed. How would you feel about switching the build from Maven to Gradle?