dfdx / Spark.jl

Julia binding for Apache Spark

Update build.jl with SCALA_BINARY #117

Closed Screenhandsaw closed 1 year ago

Screenhandsaw commented 1 year ago

Hello dfdx,

I came across a rather peculiar error when submitting jobs to a standalone cluster in client mode. After a lot of testing I narrowed it down to a version incompatibility between Spark, Java and Scala (the actual error was a serialisation failure in RPC):

java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RpcEndpointVerifier$CheckExistence; local class incompatible: stream classdesc serialVersionUID = 5378738997755484868, local class serialVersionUID = 7789290765573734431

When I set the SCALA_VERSION env variable to e.g. 2.12 or 2.12.17 (as required by the latest spark-3.4.0 Docker image), Maven breaks and cannot find the requested version. I managed to get everything working by manually setting the required variables in Spark.jl/deps/build.jl (that is, putting the patch version on the BINARY variable) and running the same code against that folder.

I would therefore like to submit this PR so that BUILD_SCALA_BINARY_VERSION can also be set via an env variable.
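
To illustrate the idea, here is a minimal sketch of reading both Scala settings from the environment with a fallback. It is not the actual content of deps/build.jl: the helper names `env_or_default` and `scala_settings`, the variable name BUILD_SCALA_VERSION, and the default values are all assumptions made for the example. Only BUILD_SCALA_BINARY_VERSION is taken from the description above.

```julia
# Sketch only: helper names, the BUILD_SCALA_VERSION name and the defaults
# are assumptions for illustration, not copied from Spark.jl/deps/build.jl.

# Look up `name` in `env`, falling back to `default` when it is unset or empty.
function env_or_default(env, name, default)
    value = get(env, name, "")
    return isempty(value) ? default : value
end

# Collect the two Scala settings a build script could hand to Maven.
function scala_settings(env = ENV)
    return (
        # Short version, e.g. "2.12" (assumed variable name)
        version = env_or_default(env, "BUILD_SCALA_VERSION", "2.12"),
        # Version carrying the patch component, e.g. "2.12.17"
        binary_version = env_or_default(env, "BUILD_SCALA_BINARY_VERSION", "2.12.17"),
    )
end

# Example: override only the binary version, as this PR intends to allow.
settings = scala_settings(Dict("BUILD_SCALA_BINARY_VERSION" => "2.12.17"))
println(settings)
```

With something along these lines in place, the value could be set in the shell before rebuilding the package, for example by exporting BUILD_SCALA_BINARY_VERSION and then running `Pkg.build("Spark")`.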

PS. It might also be helpful to mention these env variables in the README/docs, because I only found them by looking through the code.