Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Columns Selection DSL Overhaul (Feedback needed!) #397

Closed by Jolanrensen 8 months ago

Jolanrensen commented 1 year ago

Discussed in https://github.com/Kotlin/dataframe/discussions/396

Originally posted by **Jolanrensen** June 9, 2023

Hi everyone! While working on the Columns Selection DSL, which is an integral part of the library, I found many inconsistencies and took it upon myself to fix them. I documented almost every function and reworked all overloads, but I still have doubts about some. So, I collected these concerns in a helpful [Kotlin Notebook](https://github.com/Kotlin/dataframe/blob/9799143664fa86138001d91496495b62347eb616/examples/notebooks/selectionDslOverhaul/selectionDslOverhaul.ipynb) for you to experience and judge!

The notebook imports a proposal version of DataFrame based on the extensive [PR](https://github.com/Kotlin/dataframe/pull/372), so you can experience the changes firsthand. The data used for the example can be found [here](https://github.com/Kotlin/dataframe/blob/9799143664fa86138001d91496495b62347eb616/examples/notebooks/selectionDslOverhaul/lim.json). To view and run the notebook, I recommend the [Kotlin Notebook Plugin for IntelliJ](https://plugins.jetbrains.com/plugin/16340-kotlin-notebook). You can also use Datalore or the Kotlin Jupyter kernel, but then you'd miss out on the new and extensive KDocs, so I wouldn't recommend that.

Please leave your feedback in the corresponding thread below. I'm curious to hear it! Also, feel free to leave any questions if my notebook isn't clear enough in explaining the issues.

Jolanrensen commented 1 year ago

I'll mention some of our helpful contributors below. Feel free to ignore this mention if you are not interested (anymore), but we value your opinion :).

Check out the Discussion if you want :)

@nikitinas @Kopilov @pacher @holgerbrandl @vhuc @Jimexist @Kantis @cmelchior @Adriankhl @alllex @matthewwiese @njacobs5074 @PoslavskySV

zaleslaw commented 1 year ago

My answers to the questions (from a user's point of view)

Split plain DSL calls from SingleColumn functions? Yes, please

KProperty<DataRow<>>.function() or KProperty<>.function()

I prefer KProperty<>.function() for everyday use.

I'm a big fan of this approach:

df.select {
    SomeType::created.function3() // :)
    SomeType::filtering.function3() // :)
    SomeType::name.function3() // :(
}

Question: All, Cols, Children overlap and differences

I don't use this API:

// all same
df.select { all() }
df.select { cols() }
df.select { children() }
// same but debatable name?
df.select { filter { true } }

but I love this:

// all same
df.select { created.all() }
df.select { created.cols() }
df.select { created.children() }
df.select { created.filter { true } }

But it's OK to keep both; the behaviour should be the same.

Question: String/ColumnPath, allow group calls directly on it or only on SingleColumn?

Yes, more useful functions everywhere.

On the other hand, I really don't understand the concept of referring to columns via String names.

If this works, why not be more Pythonic everywhere? :)

Question: cols {} vs filter {}

For filtering columns, I suggest using only one operator, like cols {}. filter confuses me here: it makes me start thinking about filtering data.
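
The ambiguity described here can be made concrete with a toy model. The sketch below is not the DataFrame API: `Table`, `cols`, and `filterRows` are hypothetical names chosen only for illustration. It shows that a predicate over columns and a predicate over rows do different things, which may be why a single name such as `filter` can be read either way.

```kotlin
// Hypothetical toy table: column name -> column values. Not the DataFrame API.
typealias Table = Map<String, List<Any?>>

// Column selection: the predicate receives a column's name and values.
fun Table.cols(predicate: (name: String, values: List<Any?>) -> Boolean): Table =
    filter { (name, values) -> predicate(name, values) }

// Row filtering: the predicate receives one row as a name -> value map.
fun Table.filterRows(predicate: (Map<String, Any?>) -> Boolean): Table {
    val rowCount = values.firstOrNull()?.size ?: 0
    // Indices of the rows for which the predicate holds.
    val kept = (0 until rowCount).filter { i -> predicate(mapValues { (_, column) -> column[i] }) }
    return mapValues { (_, column) -> kept.map { column[it] } }
}

fun main() {
    val table: Table = mapOf(
        "name" to listOf("a", "b", "c"),
        "age" to listOf(10, 20, 30),
    )
    println(table.cols { name, _ -> name.startsWith("a") }.keys)          // [age]
    println(table.filterRows { row -> (row["age"] as Int) > 15 }["name"]) // [b, c]
}
```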

Question: What to do with Except?

Probably minus, or overloading the - operator, is better here: thinking of columns as you would of a List in Kotlin.
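
As a rough illustration of the List analogy, here is a minimal sketch of what a `-` overload could look like. `ColumnSet` is a hypothetical stand-in, not a type from the DataFrame library; the sketch simply delegates to the standard library's `List.minus`.

```kotlin
// Hypothetical stand-in for a set of selected columns; not the DataFrame API.
data class ColumnSet(val names: List<String>) {
    // `columns - "name"` drops a single column, mirroring List<T>.minus(element).
    operator fun minus(name: String): ColumnSet = ColumnSet(names - name)

    // `columns - other` drops every column that also appears in `other`.
    operator fun minus(other: ColumnSet): ColumnSet = ColumnSet(names - other.names.toSet())
}

fun main() {
    val all = ColumnSet(listOf("name", "created", "link"))
    println(all - "created")                         // ColumnSet(names=[name, link])
    println(all - ColumnSet(listOf("name", "link"))) // ColumnSet(names=[created])
}
```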

Question: If SingleColumn.except would keep structures intact, should select {} as well?

I'd like this.

// then what should this do? Select items and link with JUST target inside?
// We could even make a take {} function for this for example

Question: What to do about Select/Cols overlap?

Only this one doesn't look pretty to me. A hierarchy of selects is no big deal, but sometimes, for readability, I'd like to see something like "innerSelect" or "nestedSelect":

section.select { "link"[{ it.name.startsWith("target") }] }

()-option and []-overloads

As I said, the [] overloading is not the best option here, from my point of view, especially when mixed with Strings.
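
For readers unfamiliar with how a call like "link"[{ ... }] can compile at all, here is a minimal sketch using a scoped `operator fun String.get`. `Column` and `SelectScope` are hypothetical stand-ins rather than the DataFrame API, and `nestedSelect` is only the named alternative suggested above, not an existing function.

```kotlin
// Hypothetical stand-ins; not the DataFrame API.
data class Column(val name: String)

class SelectScope(private val groups: Map<String, List<Column>>) {
    // Enables "group"[{ predicate }] inside this scope.
    operator fun String.get(predicate: (Column) -> Boolean): List<Column> =
        groups[this].orEmpty().filter(predicate)

    // Named alternative that does the same thing.
    fun String.nestedSelect(predicate: (Column) -> Boolean): List<Column> =
        this[predicate]
}

fun main() {
    val scope = SelectScope(
        mapOf("link" to listOf(Column("target"), Column("targetUrl"), Column("rel")))
    )
    with(scope) {
        // Bracket form: compact, but a String followed by [] can be hard to read.
        println("link"[{ it.name.startsWith("target") }].map { it.name })             // [target, targetUrl]
        // Named form: same result, with the intent spelled out.
        println("link".nestedSelect { it.name.startsWith("target") }.map { it.name }) // [target, targetUrl]
    }
}
```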

Jolanrensen commented 1 year ago

Please leave your thoughts in the Discussion