Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Apache License 2.0
456 stars, 34 forks

Tuple first #144

Closed: Jolanrensen closed this 2 years ago

Jolanrensen commented 2 years ago

Implements some version of my old library: https://github.com/Jolanrensen/ScalaTuplesInKotlin/tree/main

There is some speed loss when working with functions that take Arities and Pairs instead of Tuples, since Tuples are what Spark is optimized for. Putting Tuples first will thus probably reduce confusion for users and improve performance :)

Sort of a follow-up to https://github.com/JetBrains/kotlin-spark-api/issues/76

Jolanrensen commented 2 years ago

Maybe takeN(), dropN() (and takeLastN(), dropLastN()) and splitAsN() could also be added, since they were added in Scala 3, similar to zip. map {} might also be a simple one.

Not sure if this is becoming overkill or still useful XD
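
To make the idea concrete, here is a minimal sketch of what such helpers might look like for a 3-tuple. Everything in it is an assumption for illustration: `Tuple2` and `Tuple3` are local stand-ins for `scala.Tuple2` and `scala.Tuple3` so the snippet runs with only the Kotlin standard library, and the function names `take2`, `drop1`, `takeLast2`, `dropLast1` and `map` are hypothetical, not necessarily what this PR generates.

```kotlin
// Local stand-ins for scala.Tuple2 / scala.Tuple3 (assumption: the real
// classes come from the Scala library and expose the same _1, _2, _3 fields).
data class Tuple2<out T1, out T2>(val _1: T1, val _2: T2)
data class Tuple3<out T1, out T2, out T3>(val _1: T1, val _2: T2, val _3: T3)

// Hypothetical helper names following the takeN()/dropN() idea above.
// Each arity and each N would need its own generated overload,
// because the result type changes with N.
fun <T1, T2, T3> Tuple3<T1, T2, T3>.take2(): Tuple2<T1, T2> = Tuple2(_1, _2)
fun <T1, T2, T3> Tuple3<T1, T2, T3>.drop1(): Tuple2<T2, T3> = Tuple2(_2, _3)
fun <T1, T2, T3> Tuple3<T1, T2, T3>.takeLast2(): Tuple2<T2, T3> = Tuple2(_2, _3)
fun <T1, T2, T3> Tuple3<T1, T2, T3>.dropLast1(): Tuple2<T1, T2> = Tuple2(_1, _2)

// map is only straightforward when all elements share a common type T.
fun <T, R> Tuple3<T, T, T>.map(transform: (T) -> R): Tuple3<R, R, R> =
    Tuple3(transform(_1), transform(_2), transform(_3))

fun main() {
    val mixed = Tuple3(1, "a", 3.0)
    println(mixed.take2())     // keeps the first two elements
    println(mixed.drop1())     // drops the first element
    println(mixed.takeLast2()) // keeps the last two elements
    println(mixed.dropLast1()) // drops the last element

    val numbers = Tuple3(1, 2, 3)
    println(numbers.map { it * 10 }) // applies the lambda to every element
}
```

The sketch also shows why this could become a lot of generated code: the number of overloads grows with every arity and every N.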

asm0dey commented 2 years ago

LGTM, but I can't say that I've carefully reviewed everything generated

Jolanrensen commented 2 years ago

> LGTM, but I can't say that I've carefully reviewed everything generated

I'll just check how IntelliJ performs with these tuples present in the project while I work on something else. If that's fine, I'll merge it :)

asm0dey commented 2 years ago

Nice plan!

Jolanrensen commented 2 years ago

Performance is way better in a separate module :) I tried it with the streaming functions. Consecutive builds are way faster than full ones. The only downside is that Dokka now won't include the tuples. We could publish it to a separate docs branch while we haven't switched to Gradle yet? I thought Gradle did support multi-module Dokka.