Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Move Jupyter integration into a new dataframe-jupyter module #775

Open koperagen opened 4 months ago

koperagen commented 4 months ago

The new dataframe-jupyter module should be added to the dependencies here: https://github.com/Kotlin/kotlin-jupyter-libraries/blob/master/dataframe.json However, once the kernel in Kotlin Notebook is updated to use this descriptor (for example in 2024.2), %use dataframe(0.13.1) will produce an error.

So we need to either:

  1. roll back the change (though we will likely still have to do it in the future)
  2. depend on dataframe-jupyter from dataframe, which requires bumping the JVM target to 11
  3. publish an empty dataframe-jupyter for 0.13.1, 0.12.0 and some other versions, so that it resolves and causes no errors. To do this we need to create a branch from the commit before the refactoring, create an empty module, then create a few branches from it and set version= in gradle.properties, then execute publishDataframe-jupyterPublicationToMavenRepository on TeamCity

koperagen commented 4 months ago

Even bigger problem is that on older kernels %use dataframe(0.14.0) will resolve only "dataframe", without jupyter integration. So https://github.com/Kotlin/dataframe/pull/776. And these should be reopened: https://github.com/Kotlin/dataframe/issues/241 https://github.com/Kotlin/dataframe/issues/700

koperagen commented 4 months ago

Even bigger problem is that on older kernels %use dataframe(0.14.0) will resolve only "dataframe", without jupyter integration.

There could be a migration period during which both the core module and dataframe-jupyter apply the same integration (making sure it is only applied once).
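A minimal sketch of the "apply it only once" idea, for illustration only. OnceGuard and applyOnce are hypothetical names, not part of dataframe or the kernel API, and the sketch assumes both modules can see the same guard instance (for example, because it lives in the core module and is loaded once):

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical guard; not the real dataframe integration code.
class OnceGuard {
    private val applied = AtomicBoolean(false)

    // Runs [setup] for the first caller only; returns false if it was already applied.
    fun applyOnce(setup: () -> Unit): Boolean {
        if (!applied.compareAndSet(false, true)) return false
        setup()
        return true
    }
}

fun main() {
    val guard = OnceGuard()
    var runs = 0
    guard.applyOnce { runs++ } // e.g. triggered by the core module
    guard.applyOnce { runs++ } // e.g. triggered by dataframe-jupyter, skipped
    println(runs)
}
```

Whichever module loads first would perform the setup; the second call becomes a no-op.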

Jolanrensen commented 4 months ago

We could also tweak the .json descriptor file such that it resolves a different set of dependencies depending on the version.

https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md#descriptor-api-3

It's not the neatest, but in theory it could make sure the dataframe-jupyter dependency is added only when the version of dataframe is 0.14.0+. Something like:

{
  "init": [
    "if ($v >= 0.14.0) USE { dependencies(\"...dataframe-jupyter:$v\") }"
  ]
}

(not sure if we can get the $v, but there's probably a way)

Alternatively, we could just prompt users to update their kernel versions with: https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md#minimal-kernel-version-supported-by-the-library

@ileasile What do you think is the wisest solution?

Jolanrensen commented 3 months ago

Nice! Adding something like this works and could actually solve the issue, I think :)

  "init": [
    "val (major, minor, patch) = \"$v\".split('.').map { it.filter { it.isDigit() }.toInt() }",
    "if (major > 0 || minor >= 14) { USE { dependencies(\"org.jetbrains.kotlinx:dataframe-jupyter:$v\") }; println(\"adding jupyter!\") } else { println(\"not adding jupyter!\") }"
  ],
  "dependencies": [
    "org.jetbrains.kotlinx:dataframe:$v"
  ],

By default, the dataframe module is added as a dependency, which is fine. However, we can make it so that for version X or higher, the descriptor automatically pulls in the dataframe-jupyter module as well :)
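The version check can also be tried outside a descriptor. Below is a rough standalone sketch, assuming 0.14 as the threshold (the version mentioned earlier in this thread) and assuming version strings may carry suffixes such as "-dev-1234". needsJupyterModule is a hypothetical helper name, not an existing API:

```kotlin
// Hypothetical helper; the 0.14 default threshold is taken from this thread.
fun needsJupyterModule(version: String, minMajor: Int = 0, minMinor: Int = 14): Boolean {
    // Keep only the leading digits of each dot-separated part, so "0-dev-1234" becomes 0.
    val parts = version.split('.').map { part ->
        part.takeWhile { it.isDigit() }.toIntOrNull() ?: 0
    }
    val major = parts.getOrElse(0) { 0 }
    val minor = parts.getOrElse(1) { 0 }
    // Compare major first, so a future 1.x release also qualifies.
    return major > minMajor || (major == minMajor && minor >= minMinor)
}

fun main() {
    println(needsJupyterModule("0.13.1"))          // false
    println(needsJupyterModule("0.14.0-dev-1234")) // true
    println(needsJupyterModule("1.0.0"))           // true
}
```

Unlike destructuring into exactly three components, this tolerates versions with fewer than three parts.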

@koperagen what was the friend module for btw? It seems to work fine if :dataframe-jupyter just depends on :core

AndreiKingsley commented 2 weeks ago

Important point - the Jupyter API has moved to java 11, so this new module should also use java 11

Jolanrensen commented 2 weeks ago

Important point - the Jupyter API has moved to java 11, so this new module should also use java 11

indeed, mentioned here: https://github.com/Kotlin/dataframe/issues/700

I'd also like to add that any modules providing additional jupyter logic should likely move their logic to the dataframe-jupyter module, so that they themselves can keep targeting java 8 while their jupyter logic can target java 11.

koperagen commented 1 week ago

With new API in kernel required to implement ktor client integration, we'd need to either move to java 11 or proceed with this task https://github.com/Kotlin/dataframe/issues/771

Jolanrensen commented 1 week ago

With new API in kernel required to implement ktor client integration, we'd need to either move to java 11 or proceed with this task #771

Well it was only a matter of time. Let's continue after the 0.15 release?