Open hawkaa opened 1 year ago
That theory may well be correct. I've heard that Databricks stores Datasets differently as well (stricter than Spark itself in some cases). Since we replace some code inside the Spark library itself, we need an exact match to be able to call the code. I don't know much about Databricks though: do they have a published package of their Spark version? Or can you somehow specify which Spark package to use?
"Databricks continues to develop and release features to Apache Spark. The Databricks Runtime includes additional optimizations and proprietary features that build upon and extend Apache Spark, including [Photon], an optimized version of Apache Spark rewritten in C++." right... Omg, they indeed have a LOT of changes done to Spark: https://docs.databricks.com/release-notes/runtime/11.3.html#apache-spark no wonder it doesn't work...
Also, I see they use Scala 2.12.14, while we use 2.12.16. Might be worth trying a `./gradlew -Pspark=3.3.0 -Pscala=2.12.14 clean publishMavenPublicationToMavenLocal` from the Kotlin API sources, just to be sure it's not that.
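If that locally published build works, the consuming project can resolve it from the local Maven repository. A minimal Gradle (Kotlin DSL) sketch, assuming the artifact keeps the same coordinates when published locally (the version of a local build may differ, e.g. a SNAPSHOT):

```kotlin
// build.gradle.kts of the consuming project (sketch; coordinates are an
// assumption based on the dependency mentioned in this thread).
repositories {
    mavenLocal()   // picks up the artifact from publishMavenPublicationToMavenLocal
    mavenCentral() // everything else
}

dependencies {
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.0_2.12:1.2.1")
}
```

Note that `mavenLocal()` is listed first so the locally built artifact wins over any published one with the same coordinates.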
So I've sent an email to Databricks with this issue. Not sure if they can or want to help, but it can't hurt to try, can it? :)
Thanks a ton! I'm also in touch with them. Really appreciate it!
The more people who ask for it, the better :)
Hi @Jolanrensen ,
I'm sorry to toss these issues/questions at you without being able to provide much assistance myself. I really appreciate you taking the time to help out.
We're in the process of updating our code to Spark 3.3.0, but we're seeing some errors. We're not experiencing the errors locally in our unit tests, only in Databricks with their runtime 11.2. I can confirm that runtime uses Spark 3.3.0 and Scala 2.12, which is also what we use, together with
org.jetbrains.kotlinx.spark:kotlin-spark-api_3.3.0_2.12:1.2.1
. My theory is that Databricks doesn't use the open-source version of Spark, hence we get the following error message:

Any thoughts on whether that theory might be correct? As far as I know, these errors are normally a consequence of mismatched versions. Could it be that the Databricks runtime has some custom Spark code where `deserializerForWithNullSafetyAndUpcast` doesn't exist or has a different signature?
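One way to test the signature-mismatch theory directly on the cluster is a small reflection probe: check whether the Spark distribution on the classpath actually exposes a method with that name. This is a hedged diagnostic sketch, not part of either library; the class name below is an assumption based on where OSS Spark 3.3 defines `deserializerForWithNullSafetyAndUpcast` (`org.apache.spark.sql.catalyst.DeserializerBuildHelper`), and a Databricks runtime may have moved or changed it:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class SparkMethodProbe {

    /** Returns true if the named class is loadable and declares a method with the given name. */
    public static boolean methodExists(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            return Arrays.stream(cls.getDeclaredMethods())
                    .anyMatch(m -> m.getName().equals(methodName));
        } catch (ClassNotFoundException e) {
            return false; // the runtime doesn't ship this class at all
        }
    }

    public static void main(String[] args) {
        // Assumed OSS Spark 3.3 location; run this on the Databricks cluster to
        // see whether the runtime still has the method the Kotlin API links against.
        String cls = "org.apache.spark.sql.catalyst.DeserializerBuildHelper";
        System.out.println(methodExists(cls, "deserializerForWithNullSafetyAndUpcast"));
    }
}
```

If this prints `false` on the cluster but `true` locally, that would confirm the runtime diverges from OSS Spark at exactly the point the error reports. Printing the full `Method` signatures instead of a boolean would also reveal parameter-list differences.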
Thanks again. Much appreciated!