Kotlin / kotlin-spark-api

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Apache License 2.0
459 stars 35 forks

[Feat] Adds Dataset selectTyped functions #88

Closed Jolanrensen closed 3 years ago

Jolanrensen commented 3 years ago

As discussed in issue https://github.com/JetBrains/kotlin-spark-api/issues/85, this adds the selectTyped() functions for Datasets. The results are mapped to Pairs, Triples, or Arity classes, depending on the number of arguments given.
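To illustrate the arity-based mapping described above, here is a minimal sketch in plain Kotlin. It does not use Spark: `TypedCol` is a hypothetical stand-in for Spark's `TypedColumn`, a `List` stands in for a `Dataset`, and the `Person` class is invented for the example. The actual implementation in this pull request may look different. Higher arities, which the description says map to Arity classes, are left out.

```kotlin
// Hypothetical stand-in for Spark's TypedColumn<T, U>: a named, typed accessor.
class TypedCol<T, U>(val name: String, val extract: (T) -> U)

// Two typed columns produce a Pair per row.
fun <T, U1, U2> List<T>.selectTyped(
    c1: TypedCol<T, U1>,
    c2: TypedCol<T, U2>
): List<Pair<U1, U2>> = map { Pair(c1.extract(it), c2.extract(it)) }

// Three typed columns produce a Triple per row.
fun <T, U1, U2, U3> List<T>.selectTyped(
    c1: TypedCol<T, U1>,
    c2: TypedCol<T, U2>,
    c3: TypedCol<T, U3>
): List<Triple<U1, U2, U3>> =
    map { Triple(c1.extract(it), c2.extract(it), c3.extract(it)) }

// Example row type, invented for this sketch.
data class Person(val name: String, val age: Int, val city: String)

fun main() {
    val people = listOf(Person("Ann", 31, "Delft"), Person("Bob", 27, "Utrecht"))
    val name = TypedCol<Person, String>("name") { it.name }
    val age = TypedCol<Person, Int>("age") { it.age }
    val city = TypedCol<Person, String>("city") { it.city }

    // The result type follows from the number and types of the columns.
    val pairs: List<Pair<String, Int>> = people.selectTyped(name, age)
    val triples: List<Triple<String, Int, String>> = people.selectTyped(name, age, city)
    println(pairs)
    println(triples)
}
```

The point of the overloads is that the element type of the result is inferred at compile time, so destructuring such as `val (n, a) = pairs.first()` is type-checked.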

Creating TypedColumns is made a lot easier by a separate pull request, https://github.com/JetBrains/kotlin-spark-api/pull/87, which I recommend merging first so that the temporary col() function in the tests here can be removed before this one is merged :).

Note that on Spark 2.4 the selectTyped() function cannot return a Dataset of a data class of an array. This is an encoding limitation, which I also mention in https://github.com/JetBrains/kotlin-spark-api/issues/64.

asm0dey commented 3 years ago

Now it's time to update this PR: remove obsolete functions and fix conflicts.

Jolanrensen commented 3 years ago

@asm0dey Alright! Done.

Jolanrensen commented 3 years ago

@asm0dey If you have time to look at it :) The main branch currently references selectTyped() in the readme, but this pull request is not yet merged.