Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Add primitive arrays column wrappers #30

Open altavir opened 3 years ago

altavir commented 3 years ago

Primitive array columns are required for optimized big-data applications. It is also possible to add numerical DataFrame integration with MultiK or KMath.

nikitinas commented 2 years ago

Our current idea is to use Arrow as a backend for primitive types. See https://github.com/Kotlin/dataframe/issues/78

altavir commented 2 years ago

It is a great idea, but it only pays off for interop with other platforms. For JVM-only use, Arrow adds nothing new.

nikitinas commented 2 years ago

Arrow should give a significant performance increase on the JVM thanks to its support for nullable value types. The current implementation generates quite a lot of boxing/unboxing. This can be solved without Arrow, but I expect an Arrow-based implementation to be faster. We will run performance benchmarks before implementing it.

And we need to support Arrow I/O anyway.
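
To illustrate the boxing point, here is a minimal sketch of a nullable `Int` column stored as a primitive array plus a validity bitmap, which is the general layout the Arrow columnar format uses for nullable values. The class `NullableIntColumn` and its methods are hypothetical names made up for this example; they are not part of the DataFrame or Arrow APIs, and no performance claim is implied.

```kotlin
import java.util.BitSet

// Hypothetical column type for illustration; not part of the DataFrame API.
class NullableIntColumn(private val values: IntArray, private val valid: BitSet) {
    val size: Int get() = values.size

    fun isNull(index: Int): Boolean = !valid.get(index)

    // Boxes only on demand, for callers that really need an Int?.
    operator fun get(index: Int): Int? = if (valid.get(index)) values[index] else null

    // Sums the non-null values while reading only primitive ints.
    fun sum(): Long {
        var total = 0L
        for (i in values.indices) {
            if (valid.get(i)) total += values[i]
        }
        return total
    }

    companion object {
        // Copies boxed input once into primitive storage plus a validity bitmap.
        fun of(vararg items: Int?): NullableIntColumn {
            val values = IntArray(items.size)
            val valid = BitSet(items.size)
            for ((i, item) in items.withIndex()) {
                if (item != null) {
                    values[i] = item
                    valid.set(i)
                }
            }
            return NullableIntColumn(values, valid)
        }
    }
}

fun main() {
    val column = NullableIntColumn.of(1, null, 3)
    println(column.sum())  // 4
    println(column[1])     // null
}
```

A `List<Int?>` would instead hold one boxed object reference per element, which is where the boxing/unboxing overhead comes from.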

Jolanrensen commented 5 months ago

I was experimenting with asList() wrappers. Maybe this could solve this long-standing issue: https://github.com/Kotlin/dataframe/compare/master...primitive-array-value-columns but it needs more testing.
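
For readers unfamiliar with the idea, a rough sketch of what an `asList()` wrapper can look like is shown below. `IntArray.asList()` is a standard-library function that returns a `List<Int>` view backed by the original array. The `IntArrayColumn` class is a hypothetical name used only for this example and is not the code on the linked branch.

```kotlin
// Hypothetical wrapper for illustration; not the code on the linked branch.
class IntArrayColumn(val name: String, private val data: IntArray) {
    // asList() wraps the array without copying it; elements are boxed
    // only when they are read through the List interface.
    val values: List<Int> = data.asList()

    // Primitive fast path that bypasses the List view entirely.
    fun sum(): Long {
        var total = 0L
        for (v in data) total += v
        return total
    }
}

fun main() {
    val raw = intArrayOf(1, 2, 3)
    val column = IntArrayColumn("a", raw)
    raw[0] = 10  // the view reflects writes to the underlying array
    println(column.values)  // [10, 2, 3]
    println(column.sum())   // 15
}
```

Because the list is a view, any later write to the source array is visible through the column, so a real implementation would probably need to own the array or copy it defensively.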