Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Kotlin DataFrame compiler plugin #704

Status: Open. koperagen opened this issue 1 month ago.

koperagen commented 1 month ago

A place for discussion and questions about the Kotlin DataFrame compiler plugin.

The idea is to make code like the following compile and to provide coding assistance for it in project files, and later in Kotlin Notebooks as well. In notebooks, this would build on top of the code generation that already happens between cells:

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
data class WikiData(val name: String, val paradigms: List<String>)

fun main() {
    val df = dataFrameOf(
        WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")),
        WikiData("Haskell", listOf("Purely functional")),
        WikiData("C", listOf("imperative")),
    )
    val df1 = df.add("size") {
        paradigms.size // `paradigms` is generated based on the WikiData class structure
    }
    // the `size` property is generated based on the `add` argument
    df1.size.print()
}
```
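To see what the plugin saves you, here is a plain-Kotlin analogy that uses a `List` of data classes in place of a DataFrame. It is only an analogy, not how the plugin works internally, and `WikiDataWithSize` and `addSize` are hypothetical names chosen for illustration. Without a plugin, the result type of an "add a column" step has to be declared and converted by hand, whereas the plugin is meant to infer the new `size` property from the `add` call itself.

```kotlin
// Plain-Kotlin analogy: lists of data classes stand in for DataFrames.
data class WikiData(val name: String, val paradigms: List<String>)

// Hypothetical: without a plugin, the "schema" after adding a column must be declared by hand...
data class WikiDataWithSize(val name: String, val paradigms: List<String>, val size: Int)

// ...and each such step needs its own hand-written conversion.
fun List<WikiData>.addSize(): List<WikiDataWithSize> =
    map { WikiDataWithSize(it.name, it.paradigms, it.paradigms.size) }

fun main() {
    val rows = listOf(
        WikiData("Kotlin", listOf("object-oriented", "functional", "imperative")),
        WikiData("Haskell", listOf("Purely functional")),
        WikiData("C", listOf("imperative")),
    )
    val withSize = rows.addSize()
    // `size` is a real property here only because WikiDataWithSize was declared manually
    println(withSize.map { it.size }) // [3, 1, 1]
}
```

With the plugin, neither the extra class nor the conversion function has to be written.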

The implementation lives here: https://github.com/Kotlin/dataframe/tree/compiler-plugin

A demo project that you can clone and run: https://github.com/koperagen/df-plugin-demo

An issue that describes the required compiler API and provides some information about the use case: https://youtrack.jetbrains.com/issue/KT-65859

Jolanrensen commented 1 week ago

We might need to do some additional research into the maintainability of the implementation, particularly in cases where we have to write the same DataFrame logic in two places.

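Doing operations on DataFrames with the plugin happens in two places: in the library, which executes each operation on the actual data at runtime, and in the compiler plugin, which has to compute the resulting schema at compile time.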

I believe we should try, wherever we can, to share the logic between these two scopes. This can only be done in places where the logic is exclusively dependent on the structure, types, or names of the DataFrame. Sharing the logic will help us (and future contributors) to a) fix bugs more easily and b) keep ensuring consistency between the plugin and the library.
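One way to picture this kind of sharing is to put the schema-only part of an operation in a single function that both sides call. The sketch below is a toy model under that assumption: `ColumnSchema`, `addColumnSchema`, `ToyFrame`, and `inferAddSchema` are hypothetical and are not part of the DataFrame library or the plugin, and column types are plain strings for simplicity.

```kotlin
// Hypothetical toy model; none of these names exist in Kotlin DataFrame or its plugin.
data class ColumnSchema(val name: String, val type: String)

// Shared, schema-only logic: the one place where the rules for "add a column" live.
fun addColumnSchema(schema: List<ColumnSchema>, name: String, type: String): List<ColumnSchema> {
    require(schema.none { it.name == name }) { "Column '$name' already exists" }
    return schema + ColumnSchema(name, type)
}

// Runtime side (library stand-in): stores values, but derives its schema via the shared function.
class ToyFrame(val schema: List<ColumnSchema>, val columns: Map<String, List<Any?>>) {
    fun add(name: String, type: String, values: List<Any?>): ToyFrame =
        ToyFrame(addColumnSchema(schema, name, type), columns + (name to values))
}

// Compile-time side (plugin stand-in): sees only the schema, so it reuses the same function.
fun inferAddSchema(schema: List<ColumnSchema>, name: String, type: String): List<ColumnSchema> =
    addColumnSchema(schema, name, type)

fun main() {
    val wikiSchema = listOf(
        ColumnSchema("name", "String"),
        ColumnSchema("paradigms", "List<String>"),
    )
    val df = ToyFrame(
        wikiSchema,
        mapOf(
            "name" to listOf("Kotlin", "C"),
            "paradigms" to listOf(listOf("functional", "imperative"), listOf("imperative")),
        ),
    )
    val runtimeSchema = df.add("size", "Int", listOf(2, 1)).schema
    val compileTimeSchema = inferAddSchema(wikiSchema, "size", "Int")
    println(runtimeSchema == compileTimeSchema) // true
}
```

Because the runtime `add` derives its result schema through `addColumnSchema`, a fix or a new rule there (such as the duplicate-name check in this toy) applies to both sides at once. As noted above, this only works for logic that depends solely on the structure, types, or names of the DataFrame; computing the actual values stays on the runtime side.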

I see 3 options for us:

Feel free to edit this comment to add more pros and cons to each option or to add more options.

These are just my thoughts for now :) I'm curious to see what you think!