Kotlin / kotlin-spark-api

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this become part of Apache Spark 3.x.
Apache License 2.0

Better UDF support #152

Closed: Jolanrensen closed this 2 years ago

Jolanrensen commented 2 years ago

from issue: https://github.com/JetBrains/kotlin-spark-api/issues/143

One gotcha I noticed is that

val toNormalClass2 by udf.register { a: String, b: Int ->
    NormalClass(b, a)
}
shouldThrow<AnalysisException> { // toNormalClass2 is never accessed, so the delegate getValue function is not executed
    spark.sql("select toNormalClass2(first, second) from test2").show()
}

won't work: because toNormalClass2 is never accessed, the delegate's getValue function (and with it the registration) never runs. Simply accessing the property, e.g. val a = toNormalClass2, fixes this, but it's not ideal...
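The gotcha comes from where the work happens in a Kotlin property delegate: getValue only runs on a read, while the provideDelegate operator runs once, when the property is initialized. A minimal sketch of that difference, assuming registration could be moved into provideDelegate (whether kotlin-spark-api does this is not stated here). The registry list and the RegisterOnRead, RegisterOnDeclare and Udfs names are hypothetical stand-ins, not the library's API or Spark's function registry.

```kotlin
import kotlin.properties.ReadOnlyProperty
import kotlin.reflect.KProperty

// Hypothetical stand-in for Spark's function registry (illustration only).
val registry = mutableListOf<String>()

// Registers inside getValue: nothing is registered until the property is read,
// which mirrors the toNormalClass2 gotcha. Repeated reads are de-duplicated.
class RegisterOnRead<A, B, R>(private val fn: (A, B) -> R) {
    operator fun getValue(thisRef: Any?, property: KProperty<*>): (A, B) -> R {
        if (property.name !in registry) registry += property.name
        return fn
    }
}

// Registers inside provideDelegate: runs once, when the enclosing object
// initializes the property, so no extra access is needed.
class RegisterOnDeclare<A, B, R>(private val fn: (A, B) -> R) {
    operator fun provideDelegate(thisRef: Any?, property: KProperty<*>): ReadOnlyProperty<Any?, (A, B) -> R> {
        registry += property.name
        return ReadOnlyProperty<Any?, (A, B) -> R> { _, _ -> fn }
    }
}

// Hypothetical holder for two UDF-like properties, one per strategy.
class Udfs {
    val lazyUdf by RegisterOnRead { a: String, b: Int -> "$a$b" }
    val eagerUdf by RegisterOnDeclare { a: String, b: Int -> "$a$b" }
}
```

After Udfs() is constructed, only eagerUdf is in the registry; lazyUdf appears there only once it is read. The property name still serves as the UDF name in both cases; the trade-off is that eager registration happens as soon as the enclosing scope is initialized.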

Jolanrensen commented 2 years ago

Maybe add a vararg-typed UDF as the final overload?
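Spark's Java API defines fixed-arity UDF interfaces (UDF0 through UDF22), so typed overloads stop at some maximum number of arguments. A vararg-typed UDF would cover any argument count, at the cost of requiring every argument to share one type. A minimal sketch of the idea in plain Kotlin, with no Spark involved; VarargUdf, varargUdf and sumAll are hypothetical names, not kotlin-spark-api's API.

```kotlin
// Hypothetical sketch of a "vararg typed UDF": all arguments share type T.
class VarargUdf<T, R>(val name: String, private val fn: (List<T>) -> R) {
    // Accepts any number of arguments of the same type T.
    operator fun invoke(vararg args: T): R = fn(args.toList())
}

// Hypothetical factory, standing in for a final overload after the fixed-arity ones.
fun <T, R> varargUdf(name: String, fn: (List<T>) -> R): VarargUdf<T, R> = VarargUdf(name, fn)

// Example: one definition that works for any number of Int arguments.
val sumAll = varargUdf("sumAll") { xs: List<Int> -> xs.sum() }
```

Here sumAll(1, 2, 3) evaluates to 6 and sumAll() to 0, so a single definition replaces what would otherwise be one overload per arity.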

Jolanrensen commented 2 years ago

To do: finish the tests, add examples, and document this in the wiki/README. The UDT changes should probably be merged first, since column functions are changed here as well.

Jolanrensen commented 2 years ago

Updated the wiki: https://github.com/Kotlin/kotlin-spark-api/wiki/UDF

Jolanrensen commented 2 years ago

@asm0dey whenever you have time to check it :)

asm0dey commented 2 years ago

This should definitely go into 1.2.0, not 1.1.1.

Jolanrensen commented 2 years ago

For future reference, these files were used to generate the UDFs:

generateUDF.zip
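The attached files are not reproduced here. As a rough illustration of why generating the UDF overloads pays off: each arity needs its own nearly identical declaration, which is tedious and error-prone to write by hand. A minimal sketch of such a generator, assuming the overloads follow a simple udf(func: (T1, ..., Tn) -> R) shape for arities 0 to 22 (matching Spark's UDF0 to UDF22); the signature format and the overloadSignature and generateOverloads names are hypothetical and are not the contents of generateUDF.zip.

```kotlin
// Type parameter names T1..Tn for a given arity (empty for arity 0).
fun typeParams(arity: Int): List<String> = (1..arity).map { "T$it" }

// Builds one hypothetical overload signature, e.g. for arity 2:
// fun <T1, T2, R> udf(func: (T1, T2) -> R)
fun overloadSignature(arity: Int): String {
    val ts = typeParams(arity)
    val generics = (ts + "R").joinToString(", ")
    val fnType = "(${ts.joinToString(", ")}) -> R"
    return "fun <$generics> udf(func: $fnType)"
}

// Emits one signature per line for every arity from 0 up to maxArity.
fun generateOverloads(maxArity: Int): String =
    (0..maxArity).joinToString("\n") { overloadSignature(it) }
```

Calling generateOverloads(22) yields 23 signatures, from fun <R> udf(func: () -> R) up to the 22-argument form.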