Kotlin / dataframe

Structured data processing in Kotlin
https://kotlin.github.io/dataframe/overview.html
Apache License 2.0

Revisit accessor name generation #922

Open Jolanrensen opened 1 month ago

Jolanrensen commented 1 month ago

Brought to attention by https://github.com/Kotlin/dataframe/issues/911

Column names can contain any symbol. This is important to support reading and writing any format. Accessors, however, don't support all symbols due to limitations of the JVM.

Identifiers need to follow the spec:

Source: https://kotlinlang.org/spec/syntax-and-grammar.html#identifiers

To support QuotedSymbol characters, our generator automatically inserts backticks where needed. For disallowed characters, we use the following conversion:

(image: conversion table for disallowed characters)

This conversion makes it so that columns from data will be accessible like:

These conversions are defined to cause as few clashes as possible, but some of the choices are confusing. For instance, "." becomes " " instead of "_".
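To make the discussion concrete, here is a minimal sketch of the two steps described above: replacing disallowed characters, then adding backticks where the result is not a plain identifier. It is an illustration under stated assumptions, not the generator's actual code. The names `replacements`, `isPlainIdentifier` and `toAccessorName` are hypothetical, the only replacement taken from this issue is "." to " ", and the identifier check is a simplification of the spec (it ignores keywords and does not handle backticks or line breaks inside a column name).

```kotlin
// Hypothetical sketch, not the DataFrame generator's real implementation.

// Only the "." -> " " entry comes from the issue text; the rest of the
// conversion table is not reproduced here.
val replacements: Map<Char, String> = mapOf('.' to " ")

// Simplified check for a name that needs no backticks: starts with a letter
// or underscore, and contains only letters, digits and underscores.
// Assumption: keywords such as "class" are ignored for brevity.
fun isPlainIdentifier(name: String): Boolean =
    name.isNotEmpty() &&
        (name[0].isLetter() || name[0] == '_') &&
        name.all { it.isLetterOrDigit() || it == '_' }

// Converts a column name to an accessor name: replace disallowed characters
// first, then wrap the result in backticks only if it is not a plain identifier.
fun toAccessorName(columnName: String): String {
    val converted = columnName
        .map { ch -> replacements[ch] ?: ch.toString() }
        .joinToString("")
    return if (isPlainIdentifier(converted)) converted else "`" + converted + "`"
}
```

With this sketch, `toAccessorName("age")` returns `age` unchanged, `toAccessorName("first name")` returns the name wrapped in backticks, and `toAccessorName("a.b")` returns `a b` wrapped in backticks. The last case also shows why the choice is confusing: a column named "a.b" and a column named "a b" end up with the same accessor name.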

This needs some research and feedback.