altavir opened this issue 3 years ago
Our current idea is to use Arrow as a backend for primitive types. See https://github.com/Kotlin/dataframe/issues/78
It is a great idea, but it will pay off only for interop with other platforms. For JVM-only use, Arrow brings nothing new.
Arrow should give a significant performance increase on the JVM thanks to its support for nullable value types. The current implementation causes quite a lot of boxing and unboxing. This can be solved without Arrow, but I expect an Arrow-based implementation to be faster. We will run performance benchmarks before implementing it.
And we need to support Arrow I/O anyway.
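To illustrate why nullable primitives cause boxing and how an Arrow-style layout avoids it, here is a minimal sketch. It assumes the Arrow columnar idea of a primitive values buffer plus a validity bitmap, but uses `java.util.BitSet` instead of Arrow's own buffers; `NullableIntColumn` and `sumNonNull` are hypothetical names, not part of DataFrame or Arrow. A `List<Int?>`, by contrast, holds every element as a boxed `Integer` reference.

```kotlin
import java.util.BitSet

// Hypothetical Arrow-style nullable Int column (not a real DataFrame/Arrow API):
// non-null values live in a primitive IntArray, nulls are tracked in a validity bitmap.
class NullableIntColumn(val size: Int) {
    private val values = IntArray(size)   // primitive storage, no boxing
    private val validity = BitSet(size)   // bit i set => value at i is present

    operator fun set(index: Int, value: Int?) {
        if (value == null) {
            validity.clear(index)
        } else {
            values[index] = value
            validity.set(index)
        }
    }

    // Returning Int? boxes on access; bulk operations below avoid this.
    operator fun get(index: Int): Int? = if (validity.get(index)) values[index] else null

    val nullCount: Int get() = size - validity.cardinality()

    // Aggregation walks only the set bits and reads the primitive buffer directly.
    fun sumNonNull(): Long {
        var total = 0L
        var i = validity.nextSetBit(0)
        while (i >= 0) {
            total += values[i]
            i = validity.nextSetBit(i + 1)
        }
        return total
    }
}

fun main() {
    val col = NullableIntColumn(4)
    col[0] = 1; col[1] = null; col[2] = 3; col[3] = 5
    println(col.sumNonNull()) // 9
    println(col.nullCount)    // 1
    println(col[1])           // null
}
```

Whether this actually beats the current boxed representation is exactly what the planned benchmarks would need to show.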
I was experimenting with `asList()` wrappers. Maybe this could solve this long-standing issue:
https://github.com/Kotlin/dataframe/compare/master...primitive-array-value-columns
It still needs more testing, though.
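The general idea behind `asList()` wrappers can be sketched as follows, assuming it refers to the Kotlin stdlib `IntArray.asList()`, which returns a `List<Int>` view backed by the original array without copying. The linked branch may do something different; `IntArrayColumn` is a hypothetical name used only for illustration.

```kotlin
// Hypothetical column type (not the branch's actual code): storage stays a
// primitive IntArray, while code expecting List<Int> gets a zero-copy view.
class IntArrayColumn(val name: String, val data: IntArray) {
    // IntArray.asList() wraps the array; changes to `data` are visible here.
    val values: List<Int> get() = data.asList()
}

fun main() {
    val raw = intArrayOf(10, 20, 30)
    val column = IntArrayColumn("x", raw)
    raw[1] = 99                  // mutate the backing array
    println(column.values)       // [10, 99, 30]
    println(column.data.sum())   // 139, computed on the primitive array
}
```

The wrapper avoids copying, but each `List<Int>.get` still returns a boxed `Integer` on the JVM, so hot numeric paths would want to read `data` directly.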
Primitive array columns are required for optimized big-data applications. They would also make it possible to integrate DataFrame with numerical libraries such as Multik or KMath.