Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

`GroupBy.take()` and other missing functions #686

Open Jolanrensen opened 5 months ago

Jolanrensen commented 5 months ago

Currently, we can't do:

groupedDf
    .take(10)
    .concat()

to concatenate only the values of the first 10 groups. Instead, we have to convert to a regular DataFrame first and then convert back:

groupedDf
    .toDataFrame().take(10).asGroupBy()
    .concat()

The only row-based function that's available is `filter(GroupedRowFilter)`, which lets you write `.filter { it.index() < 10 }` (group indices are zero-based, so `<= 10` would keep 11 groups), but that seems a bit odd.
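Both workarounds above can be sketched side by side as follows. This is only an illustration: `takeGroups` is a hypothetical helper name, not part of the library, and the sample data is made up. It assumes the Kotlin DataFrame dependency is on the classpath.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper (NOT a library function): keep only the first n groups
// by reusing the existing GroupBy.filter with the zero-based group index.
fun <T, G> GroupBy<T, G>.takeGroups(n: Int): GroupBy<T, G> =
    filter { it.index() < n }

fun main() {
    // Made-up sample data: three groups keyed by "city".
    val df = dataFrameOf("city", "value")(
        "A", 1,
        "A", 2,
        "B", 3,
        "C", 4,
    )
    val grouped = df.groupBy("city")

    // Workaround 1: round-trip through a regular DataFrame.
    val viaRoundTrip = grouped.toDataFrame().take(2).asGroupBy().concat()

    // Workaround 2: filter on the group index.
    val viaFilter = grouped.takeGroups(2).concat()

    println(viaRoundTrip.rowsCount())
    println(viaFilter.rowsCount())
}
```

The filter-based variant keeps the `GroupBy<T, G>` type parameters, whereas the round-trip goes through `asGroupBy()`, which has to rediscover the grouped column.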

Jolanrensen commented 5 months ago

Other missing functions include `size()`, `drop()`, `first()`, etc. Maybe we could make it an `AnyFrame` or a `DataColumn`/`BaseColumn<GroupedDataRow>`.
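Until such functions exist, the same two tricks can stand in for some of them. The sketch below is an assumption-laden illustration: `groupCount` and `dropGroups` are hypothetical names invented here, and the data is made up.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical stand-in for size(): one row per group after toDataFrame().
fun <T, G> GroupBy<T, G>.groupCount(): Int = toDataFrame().rowsCount()

// Hypothetical stand-in for drop(n): skip the first n groups via the group index.
fun <T, G> GroupBy<T, G>.dropGroups(n: Int): GroupBy<T, G> =
    filter { it.index() >= n }

fun main() {
    // Made-up sample data: three groups keyed by "city".
    val grouped = dataFrameOf("city", "value")(
        "A", 1,
        "A", 2,
        "B", 3,
        "C", 4,
    ).groupBy("city")

    println(grouped.groupCount())
    println(grouped.dropGroups(1).groupCount())

    // Stand-in for first(): the first row of the converted frame,
    // holding the key value(s) and the nested group.
    println(grouped.toDataFrame().first())
}
```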

Jolanrensen commented 4 months ago

Interestingly, `.filter {}` runs on a `GroupedRowFilter<T, G>`, where `T` is the original DF type. This allows type-safe access to all key columns, but also to all non-key columns, which don't exist in the `GroupBy` object, causing exceptions... This might need a slight redesign.

koperagen commented 4 months ago

> Interestingly, `.filter {}` runs on a `GroupedRowFilter<T, G>`, where `T` is the original DF type. This allows type-safe access to all key columns, but also to all non-key columns, which don't exist in the `GroupBy` object, causing exceptions... This might need a slight redesign.

https://github.com/Kotlin/dataframe/pull/663