Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0
847 stars 64 forks source link

Misleading exception message in "sum" aggregator (when aggregating no columns) #352

Open koperagen opened 1 year ago

koperagen commented 1 year ago

we have an example in readme with pivot { destination }.sum { recentDelays.colsOf<Int?>() } into "total delays to" but changing the type to colsOf<Int> leads to

Sum is not supported for kotlin.Number
java.lang.IllegalArgumentException: Sum is not supported for kotlin.Number

When in fact, all direct Number inheritors are supported, it's just nullable types are not. Maybe we can support them, maybe just change the error message

koperagen commented 1 year ago

Actually, my explanation is wrong. Int? is supported by sum aggregation. Something is just wrong with the exception

Jolanrensen commented 1 week ago

I suspect this is because recentDelays is a column group with only Int? columns, so you try to sum no columns. Sum can only accept Numbers, so it assumes you gave it 0 Number columns to sum, then the exception occurs... So yes, sum should support Number columns, but it should also give a proper error when no columns are selected to sum.

Can be reproduced with emptyDataFrame<Any>().sum { none().cast() } and even emptyDataFrame<Any>().sum { none().cast<Int>() }.

I would probably solve this by adding a case to aggregateAll for when the user has given an empty columnSet, but maybe we can make the exception a bit more "personalized" to the individual aggregation.

Jolanrensen commented 1 week ago

https://github.com/Kotlin/dataframe/pull/937 will remove the exception, but not solve the issue. The emptyDataFrame<Any>().sum { none().cast<Int>() } will now be 0.0 (and a classCastException). There's still the odd switch to Number for providing an empty columnset to sum and https://github.com/Kotlin/dataframe/pull/937 makes Double the default for Number columns.