Closed cmelchior closed 3 weeks ago
This is a very interesting bug!
@Test
fun `describe twice 1`() {
    val df = dataFrameOf("a", "b")(1, 2, 3, 4)
    val desc1 = df.describe()
    val desc2 = desc1.describe()
    desc2::class shouldBe DataFrameImpl::class
}
works fine, but
@Test
fun `describe twice 2`() {
    val df = dataFrameOf("a", "b")(1, "foo", 3, "bar")
    val desc1 = df.describe()
    val desc2 = desc1.describe()
    desc2::class shouldBe DataFrameImpl::class
}
breaks.
I suspect this is due to columns being created with type Comparable<*> / Comparable<Nothing> after running describe():
  name:String type:Any count:Int unique:Int nulls:Int top:Comparable<*> freq:Int mean:Double? std:Double? min:Comparable<*> median:Comparable<*> max:Comparable<*>
0 a           Int      2         2          0         1                 1        2.0          1.414214    1                 2                    3
1 b           String   2         2          0         foo               1        null         null        bar               bar                  foo
If you now run another describe() on this table, it will try to find the min of columns like top and compare Int and String. However, these two are incomparable, as we can see by the exception.
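To illustrate the failure outside of DataFrame, here is a minimal stdlib-only sketch of what comparing the values of a Comparable<*> column amounts to on the JVM. The helper compareUnchecked is hypothetical and not part of the library; it only mimics an unchecked comparison of two values whose static type is Comparable<*>.

```kotlin
// Hypothetical helper (not DataFrame API): compares two values that are only
// known to be Comparable<*>, which requires an unchecked cast.
@Suppress("UNCHECKED_CAST")
fun compareUnchecked(a: Comparable<*>, b: Comparable<*>): Result<Int> =
    runCatching { (a as Comparable<Any>).compareTo(b) }

fun main() {
    // Same element type: the comparison succeeds.
    println(compareUnchecked(1, 3).isSuccess)

    // Int vs. String, as in the `top` column above: on the JVM this is
    // expected to fail at runtime with a ClassCastException.
    println(compareUnchecked(1, "foo").exceptionOrNull())
}
```

The static type Comparable<*> says nothing about what each value can be compared to, so the mismatch only surfaces at runtime.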
Our current implementation only checks if a column AnyCol.isComparable() = isSubtypeOf<Comparable<*>?>(), not whether the type T != Nothing. I'm not sure we can actually.
Maybe typeOf<Comparable<Any?>?>() will work as expected here. I suspect this code was written with the assumption that * means Any?, but Comparable has in variance, so Comparable<*> == Comparable<Nothing>, and from the type system's perspective you can't compare two Comparable<Nothing> values.
@koperagen thanks for the tip! But unfortunately:
typeOf<Int>().isSubtypeOf(typeOf<Comparable<Any?>>()) == false
typeOf<Int>().isSubtypeOf(typeOf<Comparable<Any>>()) == false
variance is fun :)
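The two false results follow from Comparable being declared as Comparable<in T>: Int implements Comparable<Int>, and with in variance Comparable<Any?> is a subtype of Comparable<Int>, not the other way round. A small compile-time sketch of this, where AlwaysEqual is a made-up class used only for illustration:

```kotlin
// Hypothetical class for illustration: it can compare itself to any Number.
class AlwaysEqual : Comparable<Number> {
    override fun compareTo(other: Number): Int = 0
}

fun main() {
    val asInt: Comparable<Int> = 1   // OK: Int implements Comparable<Int>
    val star: Comparable<*> = 1      // OK: the star projection accepts any Comparable
    // val asAny: Comparable<Any?> = 1  // does not compile: Int is not a Comparable<Any?>

    // Contravariance: a Comparable<Number> can be used where a Comparable<Int> is expected.
    val forInt: Comparable<Int> = AlwaysEqual()
    println(forInt.compareTo(5))

    // star.compareTo(2) would not compile: through Comparable<*> the parameter type is Nothing.
    println(asInt.compareTo(2) < 0)
    println(star)
}
```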
It can be fixed like:
/**
 * Returns `true` if [this] column is comparable, i.e. its type is a subtype of [Comparable] and its
 * type argument is not [Nothing].
 */
public fun AnyCol.isComparable(): Boolean =
    isSubtypeOf<Comparable<*>?>() &&
        type().projectTo(Comparable::class).arguments[0].let {
            it != KTypeProjection.STAR &&
                it.type?.isNothing != true
        }
I'll probably make a PR later :)
Ran into this by accident while testing AI Assistant.
It looks like the output of df.describe() creates a DataFrame with some invalid types. I managed to reproduce crashes in two cases:
I also reproduced this in a unit test:
Which crashed with: