Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Is there a solution for lead/lag/shift of values on a column? #522

Open tklinchik opened 8 months ago

tklinchik commented 8 months ago

I'd like to lead/lag/shift values within a groupBy. I can't find any examples or API for that, so I wanted to raise a question. I would expect the syntax to be similar to the following (assuming a data set of incidents):

df.groupBy("Date").aggregate {
    lag("IncidentTime", 1) into "PrevIncidentTime"
}

koperagen commented 8 months ago

Hm, for now I can recommend this approach:

df.groupBy("Date").aggregate {
    "IncidentTime" into "IncidentTime"
}.add("PrevIncidentTime") { prev()?.getValue<LocalDateTime>("IncidentTime") }

What about the offset: do you need something other than 1 previous / next? It would be interesting to know, or to have references, so that we can add to / extend the API.

taras-hillsidetec commented 8 months ago

Adding this capability would be great. I've used Pandas heavily in the past, and as I transition to the Kotlin DataFrame API a few things are missing; this is one of the more involved ones. Pandas has shift(periods), where periods can be positive or negative, and other APIs I've used have lead or lag options.

Here is a Pandas example with a group-by and a shift, but I assume that in the Kotlin DataFrame API it might be more natural to express the shift within the aggregate lambda.

df['prev_value'] = df.groupby('object')['value'].shift(5)
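
For reference, the per-group shift semantics of the Pandas example above can be sketched in plain Kotlin. This is only a minimal sketch using the standard library: `shift`, `Row`, and `shiftWithinGroups` are hypothetical names made up for illustration and are not part of the Kotlin DataFrame API. It assumes rows are grouped by a string key and that slots vacated by the shift are filled with null.

```kotlin
// Hypothetical helper, NOT part of the Kotlin DataFrame API.
// Shifts values by `periods` positions and fills vacated slots with null.
// Positive periods act like "lag", negative periods like "lead".
fun <T> List<T>.shift(periods: Int): List<T?> =
    indices.map { i -> getOrNull(i - periods) }

// Hypothetical row type standing in for a data frame row.
data class Row(val obj: String, val value: Int)

// Applies the shift inside each group while keeping the original row order,
// mirroring the behaviour of a grouped shift on a single column.
fun shiftWithinGroups(rows: List<Row>, periods: Int): List<Int?> {
    val result = arrayOfNulls<Int>(rows.size)
    rows.withIndex()
        .groupBy { it.value.obj }          // group rows, remembering original positions
        .values
        .forEach { group ->
            val shifted = group.map { it.value.value }.shift(periods)
            group.forEachIndexed { k, indexed -> result[indexed.index] = shifted[k] }
        }
    return result.toList()
}
```

With rows (a, 1), (b, 10), (a, 2), (a, 3), (b, 20), calling shiftWithinGroups(rows, 1) gives [null, null, 1, 2, 10], and shiftWithinGroups(rows, -1) gives [2, 20, 3, null, null]. A single signed offset covers both lead and lag, which is the shape of API being asked for here.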