Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

`NullPointerException` using `except` to exclude double nested column #761

Open Jolanrensen opened 2 months ago

Jolanrensen commented 2 months ago

To reproduce:

val df = dataFrameOf("a.b", "a.c.d", "d.e", "d.f")(1, 3.0, 2, "b")
    .move { all() }.into { it.name.split(".").toPath() }
a d
{"b": 1, "c": {"d": 3}} {"e": 2, "f": "b"}

This works:

df.select { cols(a) except a.b }

This breaks:

df.select { cols(a) except a.c.d }
java.lang.NullPointerException
    at org.jetbrains.kotlinx.dataframe.impl.columns.UtilsKt$allColumnsExceptKeepingStructure$2.invoke(Utils.kt:454)
    at org.jetbrains.kotlinx.dataframe.impl.columns.UtilsKt$allColumnsExceptKeepingStructure$2.invoke(Utils.kt:454)
    at org.jetbrains.kotlinx.dataframe.api.ReplaceKt.with(replace.kt:53)
    at org.jetbrains.kotlinx.dataframe.impl.columns.UtilsKt.allColumnsExceptKeepingStructure(Utils.kt:454)
    at org.jetbrains.kotlinx.dataframe.impl.columns.UtilsKt.allColumnsExceptKeepingStructure$default(Utils.kt:412)
    at org.jetbrains.kotlinx.dataframe.api.AllExceptKt$exceptInternal$1.invoke(allExcept.kt:1193)
    at org.jetbrains.kotlinx.dataframe.api.AllExceptKt$exceptInternal$1.invoke(allExcept.kt:1190)
    at org.jetbrains.kotlinx.dataframe.impl.columns.ConstructorsKt$createColumnSet$1.resolve(constructors.kt:154)
    at org.jetbrains.kotlinx.dataframe.impl.columns.UtilsKt.resolve(Utils.kt:480)
    at org.jetbrains.kotlinx.dataframe.impl.columns.ConstructorsKt$toColumnSet$1.invoke(constructors.kt:178)
    at org.jetbrains.kotlinx.dataframe.impl.columns.ConstructorsKt$toColumnSet$1.invoke(constructors.kt:175)
    at org.jetbrains.kotlinx.dataframe.impl.columns.ConstructorsKt$createColumnSet$1.resolve(constructors.kt:154)
    at org.jetbrains.kotlinx.dataframe.impl.columns.UtilsKt.resolve(Utils.kt:480)
    at org.jetbrains.kotlinx.dataframe.impl.UtilsKt.getColumnsWithPaths(Utils.kt:201)
    at org.jetbrains.kotlinx.dataframe.impl.UtilsKt.getColumnsImpl(Utils.kt:196)
    at org.jetbrains.kotlinx.dataframe.DataFrame$DefaultImpls.get(DataFrame.kt:78)
    at org.jetbrains.kotlinx.dataframe.impl.DataFrameImpl.get(DataFrameImpl.kt:32)
    at org.jetbrains.kotlinx.dataframe.api.SelectKt.select(select.kt:64)
    at Line_70_jupyter.<init>(Line_70.jupyter.kts:1)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)

koperagen commented 2 months ago

Let's consider making `except` work only for the selected columns and not their children.

Jolanrensen commented 2 months ago

Yes, that might be a good idea. Many operations can be reproduced by combining `remove` and `except`, and this case is just an edge case.
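
To make the intended semantics concrete, here is a minimal sketch in plain Kotlin of removing a nested path while keeping the surrounding structure. It does not use the DataFrame API: nested maps stand in for column groups, `removePath` is a hypothetical helper, and dropping a group that becomes empty is an assumption rather than confirmed library behaviour.

```kotlin
// Hypothetical model: a column group is a Map whose values are leaves or nested Maps.
@Suppress("UNCHECKED_CAST")
fun removePath(group: Map<String, Any?>, path: List<String>): Map<String, Any?> {
    // Nothing to remove if the path is empty or its first segment is absent.
    if (path.isEmpty() || path.first() !in group) return group
    val head = path.first()
    // Last segment: drop the column (or whole group) at this level.
    if (path.size == 1) return group - head
    // Otherwise descend; a leaf cannot contain the rest of the path.
    val nested = group[head] as? Map<String, Any?> ?: return group
    val pruned = removePath(nested, path.drop(1))
    // Assumption: a group left without children is dropped as well.
    return if (pruned.isEmpty()) group - head else group + (head to pruned)
}

fun main() {
    // Mirrors column group `a` from the reproduction: a = { b: 1, c: { d: 3.0 } }
    val a: Map<String, Any?> = mapOf("b" to 1, "c" to mapOf("d" to 3.0))
    println(removePath(a, listOf("b")))      // {c={d=3.0}}
    println(removePath(a, listOf("c", "d"))) // {b=1}
}
```

In this model the doubly nested case is the only one where a parent group ends up empty, which may be why it behaves differently from `except a.b`.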

`exceptNew` does look promising, but we could reconsider it, as it's experimental anyway.

Eventually, we might want another way to select a column while keeping its parents' structure (so select its parents, but drop its siblings). But we can sort of reproduce that already by doing something like:

df.getColumnsWithPaths { colsAtAnyDepth().colsOf<Int>() }
    .map { it.path to it }
    .toDataFrameFromPairs<Any?>()
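
The idea behind the snippet above (flatten to path/column pairs, filter, then rebuild the nesting from the paths) can be sketched without the library. This is a plain-Kotlin model under stated assumptions: nested maps stand in for column groups, and `flatten` and `unflatten` are hypothetical helpers, not DataFrame functions.

```kotlin
// Hypothetical model: turn a nested map into (path, leaf value) pairs.
@Suppress("UNCHECKED_CAST")
fun flatten(
    group: Map<String, Any?>,
    prefix: List<String> = emptyList(),
): List<Pair<List<String>, Any?>> =
    group.flatMap { (name, value) ->
        val path = prefix + name
        val nested = value as? Map<String, Any?>
        if (nested != null) flatten(nested, path) else listOf(path to value)
    }

// Rebuild a nested map from (path, leaf value) pairs; parents are created on demand,
// so only the ancestors of the kept leaves reappear.
@Suppress("UNCHECKED_CAST")
fun unflatten(pairs: List<Pair<List<String>, Any?>>): Map<String, Any?> {
    val root = LinkedHashMap<String, Any?>()
    for ((path, value) in pairs) {
        var node = root
        for (name in path.dropLast(1)) {
            node = node.getOrPut(name) { LinkedHashMap<String, Any?>() } as LinkedHashMap<String, Any?>
        }
        node[path.last()] = value
    }
    return root
}

fun main() {
    // Mirrors the reproduction frame: a = { b, c = { d } }, d = { e, f }
    val df: Map<String, Any?> = mapOf(
        "a" to mapOf("b" to 1, "c" to mapOf("d" to 3.0)),
        "d" to mapOf("e" to 2, "f" to "b"),
    )
    // Keep only Int leaves, analogous to colsAtAnyDepth().colsOf<Int>().
    val kept = flatten(df).filter { it.second is Int }
    println(unflatten(kept)) // {a={b=1}, d={e=2}}
}
```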

Jolanrensen commented 1 month ago

Not sure if this is a good first issue. It may be a difficult one, but if someone's up for it, I'd be happy to help.