Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Comparing two data frames #658

Open brindasanth opened 2 months ago

brindasanth commented 2 months ago

I have two data frames with the same schema. Is there a way to compare them so that the result lists the added, deleted, and modified rows? The comparison could take a single key column or a group of key columns, plus a set of columns to ignore.

Jolanrensen commented 2 months ago

Hi! We don't have such functionality at the moment, but it might be a handy addition.

Tracking additions, deletions, and modifications, similar to how git would do it, requires a special algorithm. I suppose the Myers diff algorithm could help.

I just tried this algorithm via https://github.com/andrewbailey/Difference on two DataFrames (as List<DataRow<*>>), and it correctly provides the remove/move/add operations that likely occurred between them.
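To illustrate the idea of an ordered, git-style row diff without pulling in a dependency, here is a minimal sketch. It is not the Myers algorithm and not the API of the Difference library: it uses a plain longest-common-subsequence table, which is simpler but takes O(n*m) time and memory. Rows are modelled as `Map<String, Any?>` purely as an assumption for the example, and `RowOp` and `diffRows` are hypothetical names. Move detection is left out.

```kotlin
// Assumption: a row is a plain map from column name to value.
typealias Row = Map<String, Any?>

// Hypothetical result type: one operation per row, in order.
sealed interface RowOp {
    data class Keep(val row: Row) : RowOp
    data class Remove(val row: Row) : RowOp
    data class Add(val row: Row) : RowOp
}

// Hypothetical helper: ordered diff based on a longest-common-subsequence table.
fun diffRows(old: List<Row>, new: List<Row>): List<RowOp> {
    // lcs[i][j] = length of the longest common subsequence of old[i..] and new[j..]
    val lcs = Array(old.size + 1) { IntArray(new.size + 1) }
    for (i in old.indices.reversed()) {
        for (j in new.indices.reversed()) {
            lcs[i][j] =
                if (old[i] == new[j]) lcs[i + 1][j + 1] + 1
                else maxOf(lcs[i + 1][j], lcs[i][j + 1])
        }
    }

    // Walk the table from the start, emitting one operation per step.
    val ops = mutableListOf<RowOp>()
    var i = 0
    var j = 0
    while (i < old.size && j < new.size) {
        when {
            old[i] == new[j] -> {
                ops += RowOp.Keep(old[i])
                i++
                j++
            }
            lcs[i + 1][j] >= lcs[i][j + 1] -> {
                ops += RowOp.Remove(old[i])
                i++
            }
            else -> {
                ops += RowOp.Add(new[j])
                j++
            }
        }
    }
    // Whatever is left on either side is a pure removal or addition.
    while (i < old.size) ops += RowOp.Remove(old[i++])
    while (j < new.size) ops += RowOp.Add(new[j++])
    return ops
}
```

With this approach a changed row shows up as a `Remove` followed by an `Add`, because rows are compared as whole values and there are no key columns.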

We could wrap a library like that in the future to introduce this behavior to DataFrame natively, but in the meantime, you could try that library as well :)
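If row order does not matter and the rows can be identified by key columns, as in the original question, a key-based comparison may be enough. The following is only a sketch under a few assumptions: rows are represented as `Map<String, Any?>` (for example, obtained by converting each `DataRow` to a map), keys are unique within each frame, and `FrameDiff` and `compareByKey` are hypothetical names rather than part of the DataFrame API.

```kotlin
// Hypothetical result type: rows grouped by what happened to them.
data class FrameDiff(
    val added: List<Map<String, Any?>>,
    val deleted: List<Map<String, Any?>>,
    val modified: List<Pair<Map<String, Any?>, Map<String, Any?>>> // (old, new)
)

// Hypothetical helper: compares two row lists by key columns.
// Assumes the key is unique within each list; with duplicates, the last row wins.
fun compareByKey(
    old: List<Map<String, Any?>>,
    new: List<Map<String, Any?>>,
    keyColumns: List<String>,
    ignoreColumns: Set<String> = emptySet()
): FrameDiff {
    // The key of a row is the list of its values in the key columns.
    fun keyOf(row: Map<String, Any?>): List<Any?> = keyColumns.map { row[it] }

    // The part of a row that takes part in the "modified" check.
    fun relevant(row: Map<String, Any?>): Map<String, Any?> = row - ignoreColumns

    val oldByKey = old.associateBy { keyOf(it) }
    val newByKey = new.associateBy { keyOf(it) }

    val added = new.filter { keyOf(it) !in oldByKey }
    val deleted = old.filter { keyOf(it) !in newByKey }
    val modified = old.mapNotNull { o ->
        val n = newByKey[keyOf(o)]
        if (n != null && relevant(o) != relevant(n)) o to n else null
    }
    return FrameDiff(added, deleted, modified)
}
```

For example, calling `compareByKey(oldRows, newRows, keyColumns = listOf("id"), ignoreColumns = setOf("ts"))` would report a row as modified only if some column other than `ts` differs between the two rows that share the same `id`.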

brindasanth commented 2 months ago

Thanks for your comments and for adding this to the backlog.