Closed Jolanrensen closed 3 years ago
@asm0dey How do you feel about these additions? :)
Actually, slight improvement. I made a separate `col()` function as well, not tied to a certain `Dataset`, just like the one in `org.apache.spark.sql.functions`. However, since using reflection we now know the type, and `TypedColumn` inherits from `Column`, the function returns a `TypedColumn`.
In short:
val dataset: Dataset<YourClass> = ...
val new: Dataset<Tuple2<TypeOfA, TypeOfB>> = dataset.select( col(YourClass::a), col(YourClass::b) )
This works
(The method will probably be `selectTyped` and return `Pair`s soon in a separate pull request: https://github.com/JetBrains/kotlin-spark-api/pull/88)
Could you please update README for this in PR too? I'm not sure it's correct to deprecate backticked functions: they cost us nothing and may be more readable/usable for somebody.
Also, the Kotlin docs should definitely document their Scala counterparts.
refs #54
@asm0dey shall I add the other functions with backticks as well? It feels wrong to just have those two.
@Jolanrensen I think yes, @Meosit clearly stated that they think they may be useful, I think so too.
@asm0dey `==` calls `$eq$eq$eq`, so it should be `===`, right? (It's called `equalTo()` in Java, different from `equals()`, which uses two `=`s.)
Yep, triple-eq definitely should be ===
@asm0dey Alright, I'll change that. Unfortunately `>` etc. are not allowed since it's an illegal character :(. That can only be `gt`.
Can we use them without backticks, or does `Column` already have a `compareTo` method?
@asm0dey Nope, because `compareTo` needs to return an `Int`, and in the code `something < otherThing` will always return a `Boolean`. It cannot return a `Column` like in Scala.
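To illustrate the `compareTo` limitation, here is a minimal sketch. `Col` is a hypothetical toy stand-in that only records an expression as text; it is not Spark's `Column`, and `lt`/`gt` here merely mimic the idea of named infix functions.

```kotlin
// Toy stand-in for a column (an assumption for illustration, not Spark's Column):
// it only records the expression as text.
class Col(val expr: String) {
    // Kotlin requires `operator fun compareTo` to return Int, so `a < b`
    // always evaluates to a Boolean and can never build a new Col.
    operator fun compareTo(other: Col): Int = expr.compareTo(other.expr)

    // A named infix function is free to return a Col instead.
    infix fun lt(other: Col): Col = Col("($expr < ${other.expr})")
    infix fun gt(other: Col): Col = Col("($expr > ${other.expr})")
}

fun compareDemo(): Pair<Boolean, Col> {
    val a = Col("a")
    val b = Col("b")
    val viaOperator: Boolean = a < b // compiles, but the result is just a Boolean
    val viaInfix: Col = a lt b       // builds a new column expression
    return viaOperator to viaInfix
}
```

With this shape, `a lt b` is the closest Kotlin can get to Scala's `a < b` on columns.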
Then we definitely have no choice which is good and bad at the same time :)
@asm0dey `||` is highlighted as "Name contains characters which can cause problems on Windows: |"... I'm not sure if this only becomes a problem when calling the function or just having it there.
It won't compile on Windows hosts, so let's not risk it.
@asm0dey alright, is there something I missed?
Reviewing it right now + waiting for tests to complete
Thank you for your effort!
While looking at the `select()` functions, I found a couple of ways to improve the API:
Firstly, similar to the Scala API, a `Column` can now be created from a dataset using `dataset("yourColumn")`, as well as the usual `dataset.col("yourColumn")`.
Secondly, I walked over all `Column` operator functions in the Scala API and looked at which could be ported over to Kotlin operator or infix funs (`<` and `>=` etc. won't work, due to Kotlin limitations). I tried to avoid `backtick` names, as they are not that easy to work with, and to stay as close as possible to the names already in the API (so I deprecated `==` and `&&`). I'll provide a quick Scala to Kotlin guide below:
Finally, to prepare for the `select()` functions, I looked at the `TypedColumn` functions. One of these can now be created from a normal `Column` using a new `as()` function:
But also, to get a `TypedColumn` specific to a certain type of `Dataset` (which is needed for the `select()` function), I added a bit of Kotlin reflection into the mix:
I also added an invoke-variant of this, so when calling `select()` on a `Dataset` you can now do:
and it will return the right kind of typed `Dataset` :).
Let me know what you think!
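For readers who want a feel for the reflection-based idea, here is a rough, self-contained sketch. `TypedCol`, `ToyDataset`, `Person` and `selectDemo` are hypothetical stand-ins invented for illustration; they are not the Spark or kotlin-spark-api classes, and the real signatures may differ.

```kotlin
import kotlin.reflect.KProperty1

// Hypothetical row type used only for this example.
data class Person(val name: String, val age: Int)

// Toy typed column: keeps the property reference, so the value type U
// is known at compile time.
class TypedCol<T, U>(val property: KProperty1<T, U>) {
    val name: String get() = property.name
}

// Free-standing helper, not tied to any dataset.
fun <T, U> col(property: KProperty1<T, U>): TypedCol<T, U> = TypedCol(property)

// Toy dataset: a plain list of rows.
class ToyDataset<T>(val rows: List<T>) {
    // invoke-variant: dataset(Person::age) instead of col(Person::age)
    operator fun <U> invoke(property: KProperty1<T, U>): TypedCol<T, U> = col(property)

    // Selecting two typed columns yields a dataset of correctly typed pairs.
    fun <A, B> select(c1: TypedCol<T, A>, c2: TypedCol<T, B>): ToyDataset<Pair<A, B>> =
        ToyDataset(rows.map { c1.property.get(it) to c2.property.get(it) })
}

fun selectDemo(): List<Pair<String, Int>> {
    val people = ToyDataset(listOf(Person("Ann", 30), Person("Bob", 25)))
    // Result type is inferred from the property references alone.
    val selected: ToyDataset<Pair<String, Int>> =
        people.select(col(Person::name), people(Person::age))
    return selected.rows
}
```

The point of the sketch is only that a property reference such as `Person::name` carries both the owner type and the value type, which is what lets `select()` return a correctly typed result without casts.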