Open Jolanrensen opened 3 years ago
@Jolanrensen would you be able to fix the conflicts?
@asm0dey It's nowhere near finished, though. I'm also having second thoughts about the scale of the standard library. Maybe adding everything for Spark is a bit too much, and we need to take a look at what is helpful and what isn't.
IMHO it is, yes
As discussed in issue https://github.com/JetBrains/kotlin-spark-api/issues/100, it would be nice to have more stdlib functions for working with Datasets too, since the stdlib is one of Kotlin's selling points.
I've started converting the _Collections.kt from the stdlib to Dataset and I've managed to get about a third of the way, up to `filterIndexed`. It already contains a lot of helpful functions, like `last()`, `firstOrNull {}`, `drop()`, `all {}`, etc., but there are many left to do. Many are faster, but prone to out-of-memory issues, when the Dataset is first converted to an Iterable. This holds for functions like `first {}` etc. I plan to have a code inspection plugin hint the user in these cases.

It's nowhere near done, but since I'm going away for a couple of weeks, I thought it might be cool to share the functions I already created so they can be tested already and maybe incorporated into the API itself. Of course, feel free to continue my work in my absence. Many stdlib functions are still left and the RDD could also use them ;).
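
To make the trade-off above concrete, here is a minimal sketch of two ways a predicate-based `firstOrNull` could be written for a `Dataset`. It is not the code from this PR: the names `firstOrNullOnCluster` and `firstOrNullOnDriver` are hypothetical and chosen only for illustration. The sketch assumes Spark's Java-facing `Dataset` API (`filter(FilterFunction)`, `takeAsList`, `toLocalIterator`, `collectAsList`) is on the classpath.

```kotlin
import org.apache.spark.api.java.function.FilterFunction
import org.apache.spark.sql.Dataset

// Hypothetical name, not part of this PR.
// The predicate runs on the executors; only the first match is sent to the driver.
fun <T> Dataset<T>.firstOrNullOnCluster(predicate: FilterFunction<T>): T? =
    filter(predicate).takeAsList(1).firstOrNull()

// Hypothetical name, not part of this PR.
// Rows are pulled to the driver and the predicate runs there.
// toLocalIterator() fetches one partition at a time, so the driver needs
// enough memory for the largest partition. Swapping it for collectAsList()
// would materialize the entire Dataset on the driver, which is where
// out-of-memory errors become likely on large inputs.
fun <T> Dataset<T>.firstOrNullOnDriver(predicate: (T) -> Boolean): T? =
    toLocalIterator().asSequence().firstOrNull(predicate)
```

Which variant is faster depends on the size of the data and on where the first match sits, so a sketch like this is a starting point for measuring rather than a recommendation.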