Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Apache License 2.0

Unable to initialize spark in Jupyter Notebook #205

Open mlcohen opened 11 months ago

mlcohen commented 11 months ago

Hi -- I've been attempting to get kotlin-spark to work in Jupyter Notebook (v7.0.2). Unfortunately every time I try to run the magic line %use spark in my Jupyter notebook (using the kotlin kernel [kotlin-jupyter-kernel]), I end up getting the following error:

received properties: Properties: {spark=3.3.1, scala=2.13, v=1.2.3, displayLimit=20, displayTruncate=30, spark.app.name=Jupyter, spark.master=local[*], spark.sql.codegen.wholeStage=false, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem}, providing Spark with: {spark.app.name=Jupyter, spark.master=local[*], spark.sql.codegen.wholeStage=false, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem}
23/08/10 10:46:58 INFO SparkContext: Running Spark version 3.3.1
23/08/10 10:46:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/08/10 10:46:58 INFO ResourceUtils: ==============================================================
23/08/10 10:46:58 INFO ResourceUtils: No custom resources configured for spark.driver.
23/08/10 10:46:58 INFO ResourceUtils: ==============================================================
23/08/10 10:46:58 INFO SparkContext: Submitted application: Jupyter
... [clipped log output for brevity]
The problem is found in one of the loaded libraries: check library init codes
org.jetbrains.kotlinx.jupyter.exceptions.ReplEvalRuntimeException: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5d9e0fc3) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5d9e0fc3
org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryException: The problem is found in one of the loaded libraries: check library init codes
    at org.jetbrains.kotlinx.jupyter.exceptions.ReplLibraryExceptionKt.rethrowAsLibraryException(ReplLibraryException.kt:32)
    at org.jetbrains.kotlinx.jupyter.repl.impl.CellExecutorImpl$ExecutionContext.doAddLibraries(CellExecutorImpl.kt:151)
... [clipped log output for brevity]
Caused by: org.jetbrains.kotlinx.jupyter.exceptions.ReplEvalRuntimeException: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5d9e0fc3) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5d9e0fc3
    at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:110)
... [clipped log output for brevity]
Caused by: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x5d9e0fc3) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x5d9e0fc3
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala:213)
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:114)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:353)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:290)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:339)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:279)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:464)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
    at Line_5_jupyter.<init>(Line_5.jupyter.kts:11)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.evalWithConfigAndOtherScriptsResults(BasicJvmScriptEvaluator.kt:105)
    at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke$suspendImpl(BasicJvmScriptEvaluator.kt:47)
    at kotlin.script.experimental.jvm.BasicJvmScriptEvaluator.invoke(BasicJvmScriptEvaluator.kt)
    at kotlin.script.experimental.jvm.BasicJvmReplEvaluator.eval(BasicJvmReplEvaluator.kt:49)
    at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl$eval$resultWithDiagnostics$1.invokeSuspend(InternalEvaluatorImpl.kt:103)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
    at kotlinx.coroutines.EventLoopImplBase.processNextEvent(EventLoop.common.kt:284)
    at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:85)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
    at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source)
    at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
    at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source)
    at org.jetbrains.kotlinx.jupyter.repl.impl.InternalEvaluatorImpl.eval(InternalEvaluatorImpl.kt:103)
    ... 50 more

I'm running spark locally; no remote cluster setup. Any ideas what I might be doing wrong?

Other details:

Jolanrensen commented 11 months ago

Seems specific to your system, I cannot reproduce it. Are you able to run a normal spark project? Without notebooks?

mdsadiqueinam commented 1 month ago

I am also having the same issue @Jolanrensen. The issue above occurs with Gradle 8.4 and higher.
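For context, the `IllegalAccessError` in the log above is the well-known JDK 16+ strong-encapsulation restriction that Spark 3.x trips over when `StorageUtils` touches `sun.nio.ch.DirectBuffer`. When launching through Gradle, one possible workaround (a sketch, assuming a standard `build.gradle.kts` with a `JavaExec`-based run task; not taken from this repo's docs) is to pass the module-export flag to the forked JVM:

```kotlin
// build.gradle.kts -- hedged sketch, verify against your Gradle/Spark versions.
// Exports sun.nio.ch to the unnamed module so Spark's StorageUtils can
// access DirectBuffer on Java 17+.
tasks.withType<JavaExec> {
    jvmArgs("--add-exports=java.base/sun.nio.ch=ALL-UNNAMED")
}
```

Spark may need additional `--add-opens` flags for other packages depending on which APIs your job touches; the flag above only addresses the specific error shown in this thread.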

mdsadiqueinam commented 1 month ago

@mlcohen did you find any solution?

Jolanrensen commented 1 month ago

Could you try a lower Java version? I know Spark can be difficult with Java 17+, as mentioned here: https://stackoverflow.com/questions/72724816/running-unit-tests-with-spark-3-3-0-on-java-17-fails-with-illegalaccesserror-cl
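In the notebook case, the same `--add-exports` flag from the Stack Overflow answer has to reach the JVM hosting the kernel, not the one running Jupyter itself. A hedged sketch, assuming the kernel honors the `KOTLIN_JUPYTER_JAVA_OPTS` environment variable (check the kotlin-jupyter docs for your kernel version):

```shell
# Sketch: pass the module-export flag to the Kotlin Jupyter kernel's JVM.
# KOTLIN_JUPYTER_JAVA_OPTS is an assumption here -- verify it is supported
# by your installed kotlin-jupyter-kernel version.
export KOTLIN_JUPYTER_JAVA_OPTS="--add-exports=java.base/sun.nio.ch=ALL-UNNAMED"
jupyter notebook
```

If the environment variable is picked up, the `IllegalAccessError` from `StorageUtils` should no longer occur when running `%use spark`.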

mdsadiqueinam commented 1 month ago

I tried with Java 8 as well as 11, but I get the same issue.

mdsadiqueinam commented 1 month ago

@Jolanrensen thanks for the solution, the issue has been fixed. But I need a little help: please suggest a way to use Spark in a Ktor server, thank you.

Jolanrensen commented 1 month ago

Unfortunately I have no experience with Ktor+Spark, so I won't be able to help. Maybe someone else could though :)

mdsadiqueinam commented 1 month ago

> Unfortunately I have no experience with Ktor+Spark, so I won't be able to help. Maybe someone else could though :)

No problem, I think I figured out how to use it.