Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Fix performance problem in `rename` implementation #532

Closed: nikitinas closed this 7 months ago

nikitinas commented 7 months ago

Currently, a `DataColumn` object is used as the key for `associateBy` in the `rename` implementation. Hashing that key computes a rolling hash over the column's values, which means scanning through all of the column's data. The problem can be seen in the stack trace for https://github.com/Kotlin/dataframe/issues/526

Instead, the column path should be used as the column ID.
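
To illustrate the difference, here is a minimal sketch. It does not use the library's real classes: `Column`, `indexByColumn`, `indexByPath`, and the `hashCalls` counter are hypothetical names made up for this example, and `Column.hashCode` only assumes a content-based hash similar in spirit to the one described above.

```kotlin
// Hypothetical stand-in for a column; NOT the real DataColumn class.
// Its hashCode walks every value, mimicking a content-based rolling hash.
class Column(val path: List<String>, val values: List<Any?>) {
    // Counts full-data scans triggered by hashing (for illustration only).
    var hashCalls = 0

    override fun hashCode(): Int {
        hashCalls++
        var h = path.hashCode()
        for (v in values) h = 31 * h + (v?.hashCode() ?: 0) // O(number of rows)
        return h
    }

    override fun equals(other: Any?): Boolean =
        other is Column && path == other.path && values == other.values
}

// Keyed by the column object: every map insertion hashes the whole column.
fun indexByColumn(columns: List<Column>, newNames: List<String>): Map<Column, String> =
    columns.zip(newNames).associateBy({ it.first }, { it.second })

// Keyed by the column path: hashing touches only the path segments.
fun indexByPath(columns: List<Column>, newNames: List<String>): Map<List<String>, String> =
    columns.zip(newNames).associateBy({ it.first.path }, { it.second })
```

With the path as the key, the cost of building the map depends on the number of columns and the length of their paths, not on the number of rows.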

Jolanrensen commented 7 months ago

Thanks! Could you rebase onto master so that the TeamCity tests pass (it includes https://github.com/Kotlin/dataframe/pull/535), and then assemble the project to make sure the doc processor runs and the generated sources are updated? Then I can merge it :)

nikitinas commented 7 months ago

Done