almond-sh / almond

A Scala kernel for Jupyter
https://almond.sh
BSD 3-Clause "New" or "Revised" License

Installed version is not aligned with dependencies #1321

Open djuarezg opened 7 months ago

djuarezg commented 7 months ago

I am installing Almond as follows:

# Install Almond Scala kernel
RUN curl -Lo coursier https://git.io/coursier-cli && \
    chmod +x coursier && \
    ./coursier -J-Dhttps.proxyHost=<proxy> -J-Dhttps.proxyPort=3128 -J-Dhttp.proxyHost=<proxy> -J-Dhttp.proxyPort=3128 -J-Dhttp.nonProxyHosts=<non_proxy> -J-Dhttps.nonProxyHosts=<non_proxy> bootstrap --hybrid  almond:0.14.0-RC8 --scala 2.13.8 -o almond
RUN ./almond --install --force --jupyter-path="/opt/conda/share/jupyter/kernels"

Then, in my notebook, which used to work with older versions, I get the following:

import $ivy.`org.apache.spark::spark-sql:3.5.1`
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.5.1/spark-sql_2.13-3.5.1.pom
Downloading https://repo1.maven.org/maven2/sh/almond/almond-spark_2.13/0.14.0-RC3/almond-spark_2.13-0.14.0-RC3.pom
Downloaded https://repo1.maven.org/maven2/sh/almond/almond-spark_2.13/0.14.0-RC3/almond-spark_2.13-0.14.0-RC3.pom
(...)
import org.apache.spark.sql._
import $ivy.`sh.almond::almond-spark:0.14.0-RC8`

As you can see, it downloads RC3 libraries instead of RC8 ones.

When building the Spark session, RC3 artifacts show up in the logs again:

val spark = {
  NotebookSparkSession.builder()
    .appName("MvSparkNotebook")
    .master("spark://vmk-tdtspark-01:7070")
    .config("spark.cores.max", "1")
    .config("spark.executor.instances", "1")
    .config("spark.executor.cores", "1")
    .config("spark.executor.memory", "1g")
    .getOrCreate()
}
Downloading https://repo1.maven.org/maven2/sh/almond/spark-stubs_32_2.13/0.14.0-RC3/spark-stubs_32_2.13-0.14.0-RC3.pom
Downloaded https://repo1.maven.org/maven2/sh/almond/spark-stubs_32_2.13/0.14.0-RC3/spark-stubs_32_2.13-0.14.0-RC3.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.2.0/spark-sql_2.13-3.2.0.pom
Downloaded https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.2.0/spark-sql_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-parent_2.13/3.2.0/spark-parent_2.13-3.2.0.pom
Downloaded https://repo1.maven.org/maven2/org/apache/spark/spark-parent_2.13/3.2.0/spark-parent_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.1/parquet-hadoop-1.12.1.pom
Downloading https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sketch_2.13/3.2.0/spark-sketch_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-catalyst_2.13/3.2.0/spark-catalyst_2.13-3.2.0.pom
Downloading https://repo1.maven.org/maven2/org/apache/hive/hive-storage-api/2.7.2/hive-storage-api-2.7.2.pom
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-tags_2.13/3.2.0/spark-tags_2.13-3.2.0.pom
Downloaded https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.pom
(...)
Downloading https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.jar
Downloaded https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.20/commons-compress-1.20.jar
Downloading https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar
Downloaded https://repo1.maven.org/maven2/org/glassfish/jersey/core/jersey-client/2.34/jersey-client-2.34-sources.jar
Downloading https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.12.3/jackson-databind-2.12.3-sources.jar
Downloaded https://repo1.maven.org/maven2/io/netty/netty-handler/4.1.50.Final/netty-handler-4.1.50.Final.jar
Downloading https://repo1.maven.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar
Downloaded https://repo1.maven.org/maven2/org/spark-project/spark/unused/1.0.0/unused-1.0.0.jar
Downloading https://repo1.maven.org/maven2/io/netty/netty-transport-native-epoll/4.1.50.Final/netty-transport-native-epoll-4.1.50.Final.jar
Downloaded https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.30/slf4j-log4j12-1.7.30.jar
Downloading https://repo1.maven.org/maven2/io/dropwizard/metrics/metrics-json/4.2.0/metrics-json-4.2.0.jar
Downloaded https://repo1.maven.org/maven2/org/scala-lang/modules/scala-parallel-collections_2.13/1.0.3/scala-parallel-collections_2.13-1.0.3.jar

From the logs, you can see that the same library is downloaded twice (spark-sql 3.5.1 and then 3.2.0). The wrong one is then used when opening the session with the Spark master, which causes a version mismatch and makes the connection fail:

24/02/29 16:13:46 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: scala.concurrent.duration.FiniteDuration; local class incompatible: stream classdesc serialVersionUID = -6513803676778706429,
 local class serialVersionUID = -4594686286536372853
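
To make the mismatch easier to see, the download logs above can be reduced to the distinct artifact/version pairs per Maven group. This is only a minimal sketch: the file name `downloads.log` and the helper `artifact_versions` are hypothetical names introduced for illustration, and the sample lines are copied from the logs in this report.

```shell
# Hypothetical file holding the notebook's download output
# (the sample lines are copied from the logs in this report).
cat > downloads.log <<'EOF'
Downloading https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.5.1/spark-sql_2.13-3.5.1.pom
Downloaded https://repo1.maven.org/maven2/sh/almond/almond-spark_2.13/0.14.0-RC3/almond-spark_2.13-0.14.0-RC3.pom
Downloading https://repo1.maven.org/maven2/sh/almond/spark-stubs_32_2.13/0.14.0-RC3/spark-stubs_32_2.13-0.14.0-RC3.pom
Downloaded https://repo1.maven.org/maven2/org/apache/spark/spark-sql_2.13/3.2.0/spark-sql_2.13-3.2.0.pom
EOF

# artifact_versions LOG GROUP_PATH  (hypothetical helper)
# Prints each distinct "artifact version" pair fetched under the given
# Maven group path (for example sh/almond).
artifact_versions() {
  sed -n "s#.*maven2/$2/\([^/]*\)/\([^/]*\)/.*#\1 \2#p" "$1" | sort -u
}

artifact_versions downloads.log sh/almond
artifact_versions downloads.log org/apache/spark
```

With these sample lines, only 0.14.0-RC3 appears under `sh/almond`, while `spark-sql_2.13` appears in both 3.2.0 and 3.5.1.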

This means that the version I specify, RC8, is not respected. How can I enforce it?