Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Move Jupyter integration in new dataframe-jupyter module #775

Open koperagen opened 1 month ago

koperagen commented 1 month ago

The new dataframe-jupyter module should be added to the dependencies here: https://github.com/Kotlin/kotlin-jupyter-libraries/blob/master/dataframe.json However, once the kernel in Kotlin Notebook is updated to use this descriptor (for example in 2024.2), %use dataframe(0.13.1) will produce an error.

So we need to either:

  1. roll back the change (though we will likely still have to do it in the future)
  2. depend on dataframe-jupyter from dataframe, but this requires bumping to JVM 11
  3. publish an empty kotlin-jupyter for 0.13.1, 0.12.0 and some other versions, so that it resolves and causes no errors. To do this, we need to create a branch from the commit before the refactoring, create an empty module, then create a few branches from it and set version= in gradle.properties, and finally execute publishDataframe-jupyterPublicationToMavenRepository on TeamCity

koperagen commented 1 month ago

An even bigger problem is that on older kernels %use dataframe(0.14.0) will resolve only "dataframe", without the Jupyter integration. Hence https://github.com/Kotlin/dataframe/pull/776. These issues should also be reopened: https://github.com/Kotlin/dataframe/issues/241 https://github.com/Kotlin/dataframe/issues/700

koperagen commented 1 month ago

> Even bigger problem is that on older kernels %use dataframe(0.14.0) will resolve only "dataframe", without jupyter integration.

There can be a migration period during which both the core module and dataframe-jupyter apply the same integration (while making sure it is applied only once).
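
The "apply it once" idea could be sketched roughly as below. This is only an illustration: OnceGuard and applyOnce are hypothetical names, not part of dataframe or kotlin-jupyter, and the sketch assumes both integrations can see one shared guard instance (for example, one living in the core module).

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical guard; not part of the dataframe or kotlin-jupyter API.
class OnceGuard {
    private val applied = AtomicBoolean(false)

    // Runs `setup` only for the first caller and returns true;
    // every later call is a no-op and returns false.
    fun applyOnce(setup: () -> Unit): Boolean {
        if (!applied.compareAndSet(false, true)) return false
        setup()
        return true
    }
}

fun main() {
    val guard = OnceGuard()
    var registrations = 0
    // First caller, e.g. the integration still shipped in the core module.
    val first = guard.applyOnce { registrations++ }
    // Second caller, e.g. the same integration coming from dataframe-jupyter.
    val second = guard.applyOnce { registrations++ }
    println("first=$first second=$second registrations=$registrations")
    // prints "first=true second=false registrations=1"
}
```

Whichever module loads first performs the setup; the other one sees the flag already set and skips it.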

Jolanrensen commented 1 month ago

We could also tweak the .json descriptor file such that it resolves a different set of dependencies depending on the version.

https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md#descriptor-api-3

It's not the neatest, but in theory it could make sure the dataframe-jupyter dependency is added only when the version of dataframe is 0.14.0+. Something like:

{
  "init": [
    "if ($v >= 0.14.0) USE { dependencies(\"...dataframe-jupyter:$v\") }"
  ]
}

(not sure if we can get the $v, but there's probably a way)

Alternatively, we could just prompt users to update their kernel versions with: https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md#minimal-kernel-version-supported-by-the-library

@ileasile What do you think is the wisest solution?

Jolanrensen commented 2 weeks ago

Nice! Adding something like this works and could actually solve the issue I think :)

  "init": [
    "val (major, minor, patch) = \"$v\".split('.').map { it.filter { it.isDigit() }.toInt() }",
    "if (minor >= 14) { USE { dependencies(\"org.jetbrains.kotlinx:dataframe-jupyter:$v\") }; println(\"adding jupyter!\") } else { println(\"not adding jupyter!\") }"
  ],
  "dependencies": [
    "org.jetbrains.kotlinx:dataframe:$v"
  ],

By default, the dataframe module is added as a dependency, which is fine. However, we can make it so that for version X or higher, the dataframe-jupyter module is pulled in automatically as well :)
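
One thing the check above does not cover is a future major version: for "1.0.0" the minor part is 0, so `minor >= 14` would skip dataframe-jupyter. A slightly more defensive comparison might look like the sketch below. isAtLeast is a hypothetical helper, not something provided by kotlin-jupyter, and whether logic like this fits comfortably into the descriptor's "init" lines is an assumption.

```kotlin
// Hypothetical helper, not part of kotlin-jupyter or dataframe.
// Returns true when `version` is at least `minMajor.minMinor`.
fun isAtLeast(version: String, minMajor: Int, minMinor: Int): Boolean {
    // Keep only the leading digits of each dot-separated part, so that
    // "0.14.0-dev-1234" parses as [0, 14, 0]; non-numeric parts become 0.
    val parts = version.split('.').map { part ->
        part.takeWhile { it.isDigit() }.toIntOrNull() ?: 0
    }
    val major = parts.getOrElse(0) { 0 }
    val minor = parts.getOrElse(1) { 0 }
    return major > minMajor || (major == minMajor && minor >= minMinor)
}

fun main() {
    // Sample version strings chosen for illustration only.
    for (v in listOf("0.13.1", "0.14.0", "0.14.0-dev-1234", "1.0.0")) {
        println("$v -> ${isAtLeast(v, 0, 14)}")
    }
}
```

With this comparison, 0.13.1 stays on the plain dataframe dependency, while 0.14.0, its dev builds, and any later major version would also get dataframe-jupyter.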

@koperagen what was the friend module for btw? It seems to work fine if :dataframe-jupyter just depends on :core