Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

"Smart" column add in `DynamicDataFrameBuilder` #715

Open AndreiKingsley opened 4 weeks ago

AndreiKingsley commented 4 weeks ago

I can't rewrite this code with DynamicDataFrameBuilder: https://github.com/Kotlin/kandy/blob/2764bce7e9eec4888fdce71b306631cd4e0a208c/kandy-api/src/main/kotlin/org/jetbrains/kotlinx/kandy/dsl/internal/DatasetHandler.kt#L116 What I want is this: if the column I'm adding is already in the builder (same name and same elements), it shouldn't be added again.
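To make the requested behavior concrete, here is a minimal sketch that does not use the DataFrame library at all. `SketchBuilder`, `addIfAbsent`, and `columnNames` are hypothetical names invented for illustration, and the numeric-suffix renaming on a name clash is an assumption of this sketch, not something stated in the issue.

```kotlin
// Hypothetical stand-in for a column builder; NOT the DataFrame library's API.
class SketchBuilder {
    // Insertion-ordered map from column name to column values.
    private val columns = LinkedHashMap<String, List<Any?>>()

    // Adds the column unless one with the same name and the same elements
    // is already present. Returns the name under which the values are stored.
    fun addIfAbsent(name: String, values: List<Any?>): String {
        val existing = columns[name]
        if (existing == null) {
            columns[name] = values
            return name
        }
        // Same name and same elements: reuse the stored column.
        if (existing == values) return name

        // Same name, different elements: store under a fresh name.
        // (Assumption of this sketch: append the first free numeric suffix.)
        var suffix = 1
        while ("$name$suffix" in columns) suffix++
        val unique = "$name$suffix"
        columns[unique] = values
        return unique
    }

    fun columnNames(): List<String> = columns.keys.toList()
}

fun main() {
    val builder = SketchBuilder()
    println(builder.addIfAbsent("x", listOf(1, 2, 3))) // x
    println(builder.addIfAbsent("x", listOf(1, 2, 3))) // x (not added again)
    println(builder.addIfAbsent("x", listOf(4, 5, 6))) // x1
    println(builder.columnNames())                     // [x, x1]
}
```

The lookup here is keyed by name, so the element comparison only runs when a column with that name already exists.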

Jolanrensen commented 4 weeks ago

Indeed, it might be worth adding a contains(col): Boolean function to DynamicDataFrameBuilder.

Could you try whether this works?

public operator fun DynamicDataFrameBuilder.contains(col: AnyCol): Boolean =
    toDataFrame().getColumnOrNull(col) == col

Column equality in DataFrame is checked by this function:

internal fun <T> BaseColumn<T>.checkEquals(other: Any?): Boolean {
    if (this === other) return true

    if (this !is AnyCol) return false
    if (other !is AnyCol) return false

    if (name != other.name) return false
    if (type != other.type) return false
    return values.equalsByElement(other.values)
}

So colA == colB compares the name, type, and values by default.
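As a rough illustration of those equality semantics, the sketch below mirrors the three checks on a simplified stand-in class. `SimpleColumn` is hypothetical and uses `KClass` for the type only to stay self-contained; it is not the library's column type.

```kotlin
import kotlin.reflect.KClass

// Hypothetical stand-in for a column; the real library types are not used here.
class SimpleColumn(
    val name: String,
    val type: KClass<*>,
    val values: List<Any?>,
) {
    // Mirrors the order of checks in checkEquals: identity, name, type, values.
    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is SimpleColumn) return false
        if (name != other.name) return false
        if (type != other.type) return false
        return values == other.values // element-by-element list comparison
    }

    // Kept consistent with equals.
    override fun hashCode(): Int =
        31 * (31 * name.hashCode() + type.hashCode()) + values.hashCode()
}

fun main() {
    val a = SimpleColumn("x", Int::class, listOf(1, 2, 3))
    val sameContent = SimpleColumn("x", Int::class, listOf(1, 2, 3))
    val otherName = SimpleColumn("y", Int::class, listOf(1, 2, 3))
    val otherType = SimpleColumn("x", Number::class, listOf(1, 2, 3))

    println(a == sameContent) // true: name, type, and values all match
    println(a == otherName)   // false: names differ
    println(a == otherType)   // false: declared types differ
}
```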

AndreiKingsley commented 3 weeks ago

I guess it should work, but it's not efficient performance-wise. So yeah, contains will work, I suppose.

Jolanrensen commented 3 weeks ago

@AndreiKingsley you can make a PR that adds the function to DynamicDataFrameBuilder if you like :)

AndreiKingsley commented 3 weeks ago

Ok