almond-sh / almond

A Scala kernel for Jupyter
https://almond.sh
BSD 3-Clause "New" or "Revised" License

Recent versions of Spark not supported? (ClassCastException error) #1334

Open AWebNagra opened 3 months ago

AWebNagra commented 3 months ago

Hello,

We recently tried using almond as a Scala kernel in our JupyterLab environment, but we are encountering errors when trying to use recent versions of Spark.

Spark version tested: 3.5.0
Scala version: 2.13.8 (the Scala version used by Spark 3.5.0)
Java version: 17.0.2
Almond versions tried: 0.13.14 and 0.14.0-RC13
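
These can be double-checked from inside a notebook cell (a minimal sketch; SPARK_VERSION is Spark's public version constant):

// Print the versions the kernel JVM actually sees
println(scala.util.Properties.versionString) // Scala, e.g. "version 2.13.8"
println(System.getProperty("java.version"))  // Java, e.g. "17.0.2"
println(org.apache.spark.SPARK_VERSION)      // Spark, e.g. "3.5.0"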

The errors arise when sending code to the executors, which triggers serialization and deserialization of our closures. Other operations work fine (count, show, etc.).

Here's a minimal example that triggers the error:

import org.apache.spark.rdd.RDD

val rdd: RDD[Int] = spark.sparkContext.parallelize(List(1, 2, 3, 4, 5))
val multipliedRDD = rdd.map(_ * 2)
println(multipliedRDD.collect().mkString(", "))

and the error is:

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD

Note that running the exact same code in a spark-shell on the JupyterLab instance works fine, so the problem seems to come from almond. Our best guess is that the classpath used by almond pulls in mismatched versions of some libraries, but we have no proof that this is the issue.
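
One way to probe that guess from a cell is to ask the JVM where each suspect class was loaded from (a sketch; whereFrom is just a throwaway helper defined here):

import org.apache.spark.rdd.RDD

// Report which jar (or directory) a class was loaded from; mismatched Spark or
// Scala jars on the kernel classpath would show up here as unexpected locations
def whereFrom(c: Class[_]): String =
  Option(c.getProtectionDomain.getCodeSource)
    .map(cs => String.valueOf(cs.getLocation))
    .getOrElse("<bootstrap / unknown>")

println(whereFrom(classOf[RDD[_]]))                      // spark-core jar
println(whereFrom(classOf[scala.Function3[_, _, _, _]])) // scala-library jar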

Second note: we tried both using our own Spark installation in the JupyterLab image and installing Spark directly with ivy from a Scala notebook; both produce the same error.
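
For reference, for the ivy route almond's docs point to NotebookSparkSession from almond-spark, which serves the classes compiled from notebook cells to the executors; a sketch of that setup (the almond-spark version pairing with 0.14.0-RC13 is an assumption):

import $ivy.`org.apache.spark::spark-sql:3.5.0`
import $ivy.`sh.almond::almond-spark:0.14.0-RC13` // assumed to match the kernel version

import org.apache.spark.sql.NotebookSparkSession

// Unlike a plain SparkSession, NotebookSparkSession serves the classes
// generated for each notebook cell to the executors, so lambdas defined
// in the notebook can be deserialized on the worker side
val spark = NotebookSparkSession.builder()
  .master("local[*]")
  .getOrCreate()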

Does anyone have any idea what could be causing this issue?

coreyoconnor commented 2 months ago

Are you sure the Spark build you installed is the Scala 2.13 one? The default Spark binary downloads are built for Scala 2.12; there is a separate -scala2.13 distribution.
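
If Spark comes in via ivy imports instead, the same applies to the artifact suffix; a sketch of the distinction (coordinates shown for illustration):

// "::" lets coursier append the kernel's Scala binary suffix, resolving spark-sql_2.13 here
import $ivy.`org.apache.spark::spark-sql:3.5.1`
// Hardcoding the 2.12 artifact instead would mix Scala versions, a classic
// source of lambda-deserialization ClassCastExceptions:
// import $ivy.`org.apache.spark:spark-sql_2.12:3.5.1`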

I can confirm Spark 3.5.1 and almond 0.14.0-RC14 work fine:

[screenshot: the example running successfully in a notebook]

Specific setup: https://github.com/coreyoconnor/nix_configs/blob/dev/modules/ufo-k8s/almond-2/Dockerfile