Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this become a part of Apache Spark 3.x
Apache License 2.0

Spark 3.2.0 support #117

Closed Jolanrensen closed 2 years ago

Jolanrensen commented 2 years ago

Updated to Spark 3.2.0 and Scala 2.12.15. Tests seem to work

asm0dey commented 2 years ago

@Jolanrensen thank you! Does this version work with older Spark?

Jolanrensen commented 2 years ago

@asm0dey I'm not sure! I haven't tested that yet. I simply updated the versions and fixed like one override. We don't have any tests that try this, maybe you could look into that?

That said, not much has changed regarding the API, and all the tests we do have passed immediately, so I think it might be fine.

asm0dey commented 2 years ago

I thought about such tests, but they're non-trivial to implement, because they would force us to somehow test against multiple Spark versions.

Jolanrensen commented 2 years ago

Yeah, exactly. But how would you want to test it then? Use the API built against Spark 3.2.0 as a library in a project that was using 3.0.0 before?

asm0dey commented 2 years ago

Set up a new project that uses kotlin-spark-api built against Spark 3.2, and add an explicit dependency on Spark 3.0 to it, I think.
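As a sketch, such a compatibility-test project could look like this in a Gradle Kotlin DSL build file. The artifact coordinates and version numbers here are illustrative assumptions, not the project's actual published artifacts:

```kotlin
// build.gradle.kts of a throwaway compatibility-test project.
// Coordinates and versions are illustrative, not authoritative.
dependencies {
    // kotlin-spark-api compiled against Spark 3.2
    implementation("org.jetbrains.kotlinx.spark:kotlin-spark-api_2.12:1.1.0")
    // Strictly pin an older Spark; without `strictly`, Gradle would resolve
    // the version conflict upwards to the 3.2 Spark pulled in transitively
    implementation("org.apache.spark:spark-sql_2.12") {
        version { strictly("3.0.0") }
    }
}
```

Running the existing test suite (or a small smoke-test job) against this classpath would surface binary-incompatibility errors like the one discussed below.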

Jolanrensen commented 2 years ago

Ah, that seems to give errors, like `Exception in thread "main" java.lang.NoSuchMethodError: 'scala.collection.Seq org.apache.spark.sql.catalyst.expressions.objects.Invoke$.apply$default$5()'`. This means the API needs to be built against the same version of Spark that the end user will use.

Edit: actually, let me check whether the Scala version is the same (2.12.15; I was still using 2.12.14).

Edit 2: No, it's definitely the Spark version. If I make my project target 3.2.0, it works again.
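For anyone hitting the same error: Scala compiles default parameter values into synthetic companion methods named `name$default$N`, so even a small change to the parameter list of a class like `Invoke` breaks binary compatibility between Spark versions. The fix described above amounts to aligning the project's Spark dependency with the version the API was compiled against. A hedged Gradle Kotlin DSL sketch (version numbers are illustrative):

```kotlin
// build.gradle.kts — force every Spark module to the version that
// kotlin-spark-api was compiled against (3.2.0 in this thread)
configurations.all {
    resolutionStrategy {
        force(
            "org.apache.spark:spark-core_2.12:3.2.0",
            "org.apache.spark:spark-sql_2.12:3.2.0",
        )
    }
}
```

This forces a consistent Spark version across the whole classpath instead of relying on Gradle's default conflict resolution.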

asm0dey commented 2 years ago

And this automatically means we need a separate spin-off for 3.2 :(

Jolanrensen commented 2 years ago

If you still want to support 3.0.0, then yes

asm0dey commented 2 years ago

What errors does it give when you use the current version with 3.2?

Jolanrensen commented 2 years ago

I get the same error as when mismatching the versions the other way around. I've also read complaints on Slack that it was broken. I'm pretty sure a cross-version solution is not the way to go, though. So I'd say drop 3.0 support for the next version, since 3.2.0 is the latest 3.x Spark version anyway.

What I would do is always target the latest version, which is now 3.2, plus one LTS version, which is currently 2.4 (according to Wikipedia). That way users get pushed to update their stuff, haha.

asm0dey commented 2 years ago

Sounds reasonable, but it requires some further thinking about the process. Should we bump the major version because it's backwards incompatible? And also… what if we find a bug affecting both the 3.0 and 3.2 versions? It sounds like we should release from a support branch somehow.

Jolanrensen commented 2 years ago

Hmm, maybe we can make it 1.1.0? Spark also only changed the second number.

Hmm, yeah, if you still want a support branch for 3.0, you can create one, but then it also needs to be maintained when we find a bug. The project isn't very active all the time, though, so I think only supporting the latest version would be enough. That is, unless there are still a lot of requests for a 3.0 version.

It's up to you, it's your project after all :).

asm0dey commented 2 years ago

Maybe you're right. It's interesting that the Spark folks broke backwards compatibility in a minor release. Maybe we really should bump the version to 1.1.0 and support the old version in the 1.0 branch.

asm0dey commented 2 years ago

Possibly we should create a branch like spark-3.2 here, bump the version there, and add compatibility with Spark 3.2 in that branch. And do releases from there too.

Jolanrensen commented 2 years ago

I would choose whichever option is the least work in the future, so that we can ship updates faster when a new Scala or Spark release comes around :)

asm0dey commented 2 years ago

The next Scala is 3 and I think that it'll be a huge piece of work to support it…

Jolanrensen commented 2 years ago

Not sure how quickly Spark will be updated to Scala 3, though. I was more aiming at another Scala 2.x version.

But I'd say Spark 3.2 support should be released as soon as possible anyway, as there are people dropping Kotlin right now because they cannot use the API on 3.2 yet :/ So we should either drop support for 3.0 or keep it on a separate branch.

asm0dey commented 2 years ago

@Jolanrensen may I please ask you to do the following: point your fork at the spark-3.2 branch, remove support for Spark 2 there, and change the code to work with 3.2?

Jolanrensen commented 2 years ago

Sure, that will also remove Scala 2.11 then, right? Since it only works with Spark 2.

Do you also want to bump to Kotlin 1.6.10, or should we do that next?

Jolanrensen commented 2 years ago

https://github.com/Jolanrensen/kotlin-spark-api/tree/spark-3.2 Alright, here is my branch based on your spark-3.2 branch. The old Scala and Spark 2 versions are removed and everything is 3.2 now.

asm0dey commented 2 years ago

I think you can just create a PR against my branch!

Jolanrensen commented 2 years ago

Alright, moving to https://github.com/JetBrains/kotlin-spark-api/pull/118