Kopilov closed this 5 months ago
Merge branch 'master' into NullVector
Should we apply `git rebase` instead (to avoid a spaghetti-like history)?
Rebased
Thanks for the help! I'll run the CI and merge :)
@Kopilov It looks like the test `org.jetbrains.kotlinx.dataframe.io.ArrowKtTest.testReadingAllTypesAsEstimatedNotNullableWithNulls` now fails:
```
org.junit.ComparisonFailure: expected:<kotlin.Nothing?> but was:<kotlin.Nothing> expected:<kotlin.Nothing[?]> but was:<kotlin.Nothing[]>
at org.jetbrains.kotlinx.dataframe.io.ExampleEstimatesAssertionsKt.assertEstimations(exampleEstimatesAssertions.kt:163)
at org.jetbrains.kotlinx.dataframe.io.ArrowKtTest.testReadingAllTypesAsEstimatedNotNullableWithNulls(ArrowKtTest.kt:221)
```
This probably means we need my entire solution with `NullabilityOptions` after all... The tests pass if I use this:
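```kotlin
...
// Pairs the (all-null) values of a NullVector with their column type:
// Nothing? if the column should be nullable, Nothing otherwise.
// NullabilityOptions decides that from the data and from the nullability
// declared in the Arrow schema (expectedNulls).
// @JvmName avoids a JVM signature clash with the other List overloads (same erasure).
@JvmName("withTypeNullableNothingList")
private fun List<Nothing?>.withTypeNullable(
    expectedNulls: Boolean,
    nullabilityOptions: NullabilityOptions,
): Pair<List<Nothing?>, KType> {
    val nullable = nullabilityOptions.applyNullability(this, expectedNulls)
    return this to nothingType(nullable)
}
```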
and then, for the `NullVector` branch:
```kotlin
is NullVector -> vector.values(range).withTypeNullable(field.isNullable, nullability)
```
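The thread does not show what `NullabilityOptions.applyNullability` does internally. As a minimal, hypothetical sketch (the enum `NullabilityMode`, its three modes, and `decideNullable` are illustrative names, not the library's API), a nullability policy could combine the nullability declared in the schema with what the data actually contains. Under an inferring policy, an all-null column read from a `NullVector` comes out nullable even when the schema declares it non-nullable, which would be consistent with the `kotlin.Nothing?` the failing test expected.

```kotlin
// Hypothetical policy for deciding column nullability; not the library's NullabilityOptions.
enum class NullabilityMode { Infer, Checking, Widening }

// Returns true if the column should get a nullable type.
// `expectedNulls` is the nullability declared in the file's schema.
fun NullabilityMode.decideNullable(values: List<Any?>, expectedNulls: Boolean): Boolean {
    val hasNulls = values.any { it == null }
    return when (this) {
        // Trust the data only: nullable iff a null actually occurs.
        NullabilityMode.Infer -> hasNulls
        // Trust the schema, but fail if the data contradicts it.
        NullabilityMode.Checking -> {
            require(expectedNulls || !hasNulls) { "Column declared non-nullable contains nulls" }
            expectedNulls
        }
        // Nullable if either the schema or the data says so.
        NullabilityMode.Widening -> expectedNulls || hasNulls
    }
}
```

For a three-element `NullVector` declared non-nullable, `NullabilityMode.Infer.decideNullable(listOf(null, null, null), expectedNulls = false)` returns `true`, whereas `Checking` throws an `IllegalArgumentException` because the data contradicts the schema.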
@Jolanrensen applied, thanks
Apache Arrow files might contain `NullVector` values (as a result of saving a column filled only with nulls from other libraries and languages that have no static types and no target schema). With this PR, Kotlin DataFrame reads them correctly instead of crashing. We could also support saving to `NullVector`s; should we?

Among other changes, Arrow itself is upgraded to the latest stable version (14.0.2), and the #428 problem is fixed for Arrow writing by replacing the original `hasNulls` function with a custom explicit check.
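In the Arrow columnar format, a null vector allocates no data buffers and stores only a value count, so every element it yields is `null`. The sketch below illustrates that, together with an explicit null scan of the kind the description mentions, under stated assumptions: `FakeNullVector` is a hypothetical stand-in, not Arrow's `org.apache.arrow.vector.NullVector`, and `containsNulls` is a hypothetical helper; the check actually used in the PR may be implemented differently.

```kotlin
// Hypothetical stand-in for Arrow's NullVector: no data buffers, only a value count.
class FakeNullVector(val valueCount: Int) {
    // Every slot of a null vector reads as null, so a range maps to a list of nulls.
    fun values(range: IntRange): List<Nothing?> {
        require(range.first >= 0 && range.last < valueCount) { "Range $range is out of bounds" }
        return range.map { null }
    }
}

// Hypothetical explicit check: scan the actual values instead of relying on declared types.
fun containsNulls(values: Iterable<Any?>): Boolean = values.any { it == null }
```

For example, `FakeNullVector(3).values(0 until 3)` returns `[null, null, null]`, and `containsNulls` on that list returns `true`, while `containsNulls(listOf(1, 2))` returns `false`.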