Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Support Multiplatform #24

Open ileasile opened 3 years ago

ileasile commented 3 years ago

Most of the library code is common; the exceptions are the IO parts and the Jupyter integration. We may support KMP (at least K/JS) for this library.

Jolanrensen commented 1 year ago

There's probably too much JVM reflection going on for this to be easy, let alone viable :/

icecreamparlor commented 1 year ago

Any updates?

Jolanrensen commented 1 year ago

@icecreamparlor Nope. While it would be cool, the project has so many JVM dependencies right now that, although it should be possible in theory, it would be a huge undertaking.

If performance is the reason to go multiplatform, I think we still have a lot to gain once the Vector API lands on the JVM, and we also plan to convert our Lists to primitive arrays eventually: https://github.com/Kotlin/dataframe/issues/30.

If, aside from performance, there are other needs for multiplatform support, I would be interested in seeing a proof of concept of (part of) the API, so we can then properly decide whether it would be worth the effort or not.

devcrocod commented 7 months ago

Doubts arose about multiplatform support in dataframe because the library uses a lot of reflection.

I've looked into this a bit: most of the reflection we use is available in common code, so any problems should be limited to isolated cases. https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.reflect/

Jolanrensen commented 7 months ago

If you look at implementation files like this: https://github.com/Kotlin/dataframe/blob/master/core/src/main/kotlin/org/jetbrains/kotlinx/dataframe/impl/TypeUtils.kt you can see we use jvmErasure a lot all over the place. I'm not sure if there's a common alternative for that. This needs to be checked.
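As a rough illustration of the question (not taken from the DataFrame code base), the sketch below uses KType.classifier from the common kotlin.reflect API as a partial stand-in for jvmErasure. The helper name erasedClassOrNull is hypothetical, and the sketch assumes it is enough to handle types whose classifier is a class; cases that jvmErasure also covers, such as type parameters, are deliberately left out.

```kotlin
import kotlin.reflect.KClass
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical helper, not part of DataFrame: returns the class behind a KType
// using only the common kotlin.reflect API. Returns null when the classifier
// is not a class (for example, a type parameter).
fun KType.erasedClassOrNull(): KClass<*>? = classifier as? KClass<*>

fun main() {
    // Type arguments and nullability are dropped, only the class remains.
    println(typeOf<List<Int>>().erasedClassOrNull() == List::class)
    println(typeOf<String?>().erasedClassOrNull() == String::class)
}
```

Whether a helper like this could replace every jvmErasure call in TypeUtils is exactly what would need to be checked.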

devcrocod commented 7 months ago

In fact, this is the only thing I noticed that is strongly tied to the platform.

In this case, everything stays the same for JVM, and for native it should be possible to come up with a workaround. With wasmJs, I’m not sure if it’s worth supporting at all.

koperagen commented 7 months ago

Gradle & KSP plugins need to be tested in multiplatform projects when the library is ready

  1. ImportDataSchema annotation usage
  2. dataframes { } Gradle configuration usage

Shouldn't be a big deal, so treat it as a note for future testing purposes. Not a blocker or anything.

devcrocod commented 7 months ago

I took a closer look at multiplatform support and conducted some experiments with it. Initially, I made some erroneous conclusions.

Here are the issues I discovered:

Reflection is used in methods such as convert, update, join, aggregate, and others. It is also used in TypeUtils, and methods from there are called when creating new columns, that is, with practically any operation. There's no simple replacement or expect/actual implementation for jvmErasure in Kotlin/Native, so a full refactoring of this logic is required.

In some cases, I assume the use of reflection is excessive. For type erasure, I see the following options. The simplest is to go through the data in Kotlin/Native and calculate the type, but this carries a very large overhead. Another option is to calculate the type when the data comes in from outside, that is, when creating a dataframe, then keep it and reuse it consistently, which does not always happen now. Calculating a new type during operations on the dataframe requires a resolver, and this means implementing quite complex logic. I also assume that some of the reflection problems will be solved with the help of a compiler plugin.
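The option of calculating the type once, when the data enters the dataframe, and then carrying it along can be sketched roughly as follows. TypedColumn and typedColumnOf are hypothetical names for illustration only, not DataFrame's API, and the sketch assumes the element type is statically known at the call site so that a reified type parameter can capture it.

```kotlin
import kotlin.reflect.KType
import kotlin.reflect.typeOf

// Hypothetical column type for illustration; not DataFrame's actual column class.
class TypedColumn<T>(val values: List<T>, val type: KType) {
    // The result type comes from the reified parameter at the call site,
    // so no element of the new column has to be inspected at runtime.
    inline fun <reified R> map(transform: (T) -> R): TypedColumn<R> =
        TypedColumn(values.map(transform), typeOf<R>())
}

// The type is recorded once, when the data enters the column.
inline fun <reified T> typedColumnOf(vararg values: T): TypedColumn<T> =
    TypedColumn(values.toList(), typeOf<T>())

fun main() {
    val ages = typedColumnOf(31, 42, 27)
    val labels = ages.map { "age=$it" }
    println(ages.type == typeOf<Int>())
    println(labels.type == typeOf<String>())
}
```

This only covers operations whose result type the compiler already knows. Data whose types are discovered at runtime, such as values read from a file, would still need the resolver logic described above.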

As a result, I do not see multiplatform support with Kotlin/Native (iOS) as feasible in the near future: it requires a lot of effort, part of which could presumably be solved by Kotlin itself in the future. Multiplatform support only for JVM and Android seems like a more realistic task but will require:

Jolanrensen commented 6 months ago

Reflekt can do a tiny piece of JVM reflection: finding classes/interfaces in the project, using IR. So finding which supertypes exist etc. should in theory be possible.

Kaverit can also do some type logic.