Kotlin / kotlin-spark-api

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Apache License 2.0

Questions regarding scala.ScalaReflectionException #194

Open RuizhuYang opened 1 year ago

RuizhuYang commented 1 year ago

We're building a Spark application on the AWS EMR service and see the following error when running it:

Exception in thread "main" scala.ScalaReflectionException: object org.apache.spark.sql.KotlinReflection not found.
    at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:185)
    at scala.reflect.internal.Mirrors$RootsBase.staticModule(Mirrors.scala:29)
    at org.apache.spark.sql.KotlinReflection$$typecreator1$6.apply(KotlinReflection.scala:777)
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:237)
    at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:237)
    at org.apache.spark.sql.KotlinReflection.localTypeOf(KotlinReflection.scala:1404)
    at org.apache.spark.sql.KotlinReflection.localTypeOf$(KotlinReflection.scala:1402)
    at org.apache.spark.sql.KotlinReflection$.localTypeOf(KotlinReflection.scala:53)
    at org.apache.spark.sql.KotlinReflection$.$anonfun$serializerFor$1(KotlinReflection.scala:777)
    at scala.reflect.internal.tpe.TypeConstraints$UndoLog.undo(TypeConstraints.scala:73)
    at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects(KotlinReflection.scala:1389)
    at org.apache.spark.sql.KotlinReflection.cleanUpReflectionObjects$(KotlinReflection.scala:1388)
    at org.apache.spark.sql.KotlinReflection$.cleanUpReflectionObjects(KotlinReflection.scala:53)
    at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:694)
    at org.apache.spark.sql.KotlinReflection$.serializerFor(KotlinReflection.scala:681)
    at org.apache.spark.sql.KotlinReflection.serializerFor(KotlinReflection.scala)
    at org.jetbrains.kotlinx.spark.api.EncodingKt.kotlinClassEncoder(Encoding.kt:165)
    at org.jetbrains.kotlinx.spark.api.EncodingKt.generateEncoder(Encoding.kt:144)

I can confirm that the runtime uses Spark 3.3.0 and Scala 2.12, and that we use the matching versions of kotlin-spark-api: org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.0_2.12:1.2.1 and org.jetbrains.kotlinx.spark:core-3.3.0_2.12:1.2.1.

This looks like a missing dependency. Any thoughts on which dependency we should check?

Thanks!

Jolanrensen commented 1 year ago

Hi! I don't think an explicit dependency on the "core" module is needed as "kotlin-spark-api" already provides it as an "api" dependency. Could you try it without? Also, currently, we're at version 1.2.3. Just to be safe, could you upgrade?

RuizhuYang commented 1 year ago

Hi @Jolanrensen,

Thanks for your quick response!

I have tried to update org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.0_2.12:1.2.1 to org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.0_2.12:1.2.3, but we still get the same error.

Where is this class defined? I cannot find it in any of the dependency packages.

Also, when you mentioned that "kotlin-spark-api" already provides it as an "api" dependency, what is the difference between an API dependency and a module dependency?

Thanks!

Jolanrensen commented 1 year ago

The specific class that is missing for you is in the core module. It should be included in your build nonetheless, so I think something specific to Amazon's version of Spark is causing the issue. Do you have an exact overview of your other dependencies? Or is that hidden by AWS? Otherwise I could try to reproduce it.
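
One way to check whether the class is visible at runtime is to ask a class loader directly. The sketch below is only an illustration: `describeClassOrigin` is a hypothetical helper, not part of kotlin-spark-api, and it assumes the thread's context class loader is the relevant one. The loader that Scala reflection uses on EMR may be a different one, so a positive result here does not rule out a class loader problem.

```kotlin
// Hypothetical diagnostic helper: reports which class loader and which
// jar (code source) a class comes from, or that the loader cannot see it.
fun describeClassOrigin(
    className: String,
    loader: ClassLoader? = Thread.currentThread().contextClassLoader,
): String =
    try {
        // initialize = false: only locate the class, do not run static initializers
        val cls = Class.forName(className, false, loader)
        val location = cls.protectionDomain?.codeSource?.location
        "$className loaded by ${cls.classLoader ?: "bootstrap loader"} " +
            "from ${location ?: "unknown location"}"
    } catch (e: ClassNotFoundException) {
        "$className NOT visible to ${loader ?: "bootstrap loader"}"
    }

fun main() {
    // A Scala `object KotlinReflection` is compiled to a class named `KotlinReflection$`,
    // which is the name that appears in the stack trace.
    println(describeClassOrigin("org.apache.spark.sql.KotlinReflection\$"))
    println(describeClassOrigin("java.lang.String"))
}
```

Running this inside the Spark driver (rather than locally) would show whether the core jar is on the classpath that the application actually sees.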

A (module) dependency can be declared as, among others, "implementation" or "api". Implementation makes the dependency available only inside this package. Api does that too, but also exposes the dependency's classes and methods to users of this package, essentially forwarding them. This means that you should have access to the KotlinReflection class from the core module if you just have kotlin-spark-api as a dependency. See https://docs.gradle.org/current/userguide/java_library_plugin.html#sec:java_library_separation
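
As a rough illustration of the difference in Gradle's Kotlin DSL: the first part is a hypothetical library module, and the `org.example` coordinates are made up for this sketch. The second part is a consumer that declares only kotlin-spark-api, using the coordinates mentioned earlier in this thread.

```kotlin
// build.gradle.kts of a hypothetical library module
plugins {
    `java-library` // provides the `api` and `implementation` configurations
}

dependencies {
    // Exposed to consumers: its types are on their compile classpath too.
    api("org.example:core:1.0.0") // hypothetical coordinates

    // Internal detail: not on the consumers' compile classpath.
    implementation("org.example:internal-utils:1.0.0") // hypothetical coordinates
}

// In a separate, consuming project's build.gradle.kts:
dependencies {
    // No explicit dependency on the core module here; kotlin-spark-api
    // declares it as an `api` dependency, so it comes along transitively.
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.0_2.12:1.2.3")
}
```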

RuizhuYang commented 1 year ago

Hi @Jolanrensen,

After talking with the AWS EMR team, we found out that this is an issue in their latest release, EMR 6.9.0, which has some class loader compatibility problems. After we downgraded to EMR 6.8.0, the error was resolved.

Thanks so much for your help!

jhardin commented 2 weeks ago

@RuizhuYang We're facing similar ScalaReflectionException issues in later versions of EMR (but with our own case classes) when running JREs other than 1.8 (our application works fine with 1.8 JRE). Do you know if your issue has been resolved in versions of EMR > emr-6.8.0?