Column names can contain any symbol. This is important to support reading and writing any format.
Accessors, however, don't support all symbols due to limitations of the JVM.
Identifiers need to follow the spec:
(Letter | '_') {Letter | '_' | UnicodeDigit} is allowed without backticks, where
    Letter: any Unicode character of categories Lu, Ll, Lt, Lm or Lo
    UnicodeDigit: any Unicode character of category Nd
'`' QuotedSymbol {QuotedSymbol} '`' is allowed with backticks, where
    QuotedSymbol: any character excluding CR, LF and '`' (well, except the last part: we cannot write ` inside a name with backticks)
To support QuotedSymbol characters, our generator automatically inserts backticks where needed.
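As a rough illustration of the rule above, the following minimal sketch checks whether a name matches the unquoted identifier form and wraps it in backticks otherwise. The helpers `isPlainIdentifier` and `quoteIfNeeded` are hypothetical names made up for this example, not the generator's actual code. The sketch also assumes that Kotlin keywords and characters outside the Basic Multilingual Plane can be ignored, which a real generator would have to handle.

```kotlin
// Hypothetical helpers for illustration only; not part of the DataFrame API.

// True if `name` matches (Letter | '_') {Letter | '_' | UnicodeDigit}.
// Char.isLetter() covers categories Lu, Ll, Lt, Lm, Lo; Char.isDigit() covers Nd.
fun isPlainIdentifier(name: String): Boolean {
    if (name.isEmpty()) return false
    val first = name[0]
    if (!(first.isLetter() || first == '_')) return false
    return name.all { it.isLetter() || it == '_' || it.isDigit() }
}

// Wraps the name in backticks only when the unquoted form is not allowed.
// Assumption: keywords such as `val` are not handled here.
fun quoteIfNeeded(name: String): String =
    if (isPlainIdentifier(name)) name else "`" + name + "`"

fun main() {
    println(quoteIfNeeded("colName"))   // colName
    println(quoteIfNeeded("my column")) // `my column`
    println(quoteIfNeeded("1stPlace"))  // `1stPlace`
}
```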
For disallowed characters, we use the following conversion:
With this conversion, columns from the data become accessible like this:
"my::colName" -> df.`my - colName`
"Dwayne `The Rock` Johnson" -> df.`Dwayne 'The Rock' Johnson`
"name.first" -> df.`name first`
These conversions are defined to cause as few clashes as possible, but there are some confusing choices.
For instance, "." becomes " " instead of "_".
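To make the discussion concrete, here is a minimal sketch of such a conversion. The replacement table is an assumption inferred only from the three examples above ("::" to " - ", "`" to "'", "." to " "); the generator's real table is not shown here and may cover more characters. The names `assumedReplacements` and `toAccessorName` are hypothetical.

```kotlin
// Hypothetical replacement table, inferred only from the three examples in the text.
// The real generator may use a different or larger set of rules.
val assumedReplacements: List<Pair<String, String>> = listOf(
    "::" to " - ",
    "`" to "'",
    "." to " ",
)

// Applies each assumed replacement in order to the raw column name.
fun toAccessorName(columnName: String): String =
    assumedReplacements.fold(columnName) { acc, (target, replacement) ->
        acc.replace(target, replacement)
    }

fun main() {
    val names = listOf("my::colName", "Dwayne `The Rock` Johnson", "name.first")
    for (name in names) {
        // Accessor names with spaces or quotes need backticks at the use site.
        println("\"" + name + "\" -> df.`" + toAccessorName(name) + "`")
    }
    // Two distinct column names that end up with the same accessor name.
    println(toAccessorName("name.first") == toAccessorName("name first")) // true
}
```

The last line shows the kind of clash this scheme can still produce: "name.first" and "name first" both map to `name first`. Mapping "." to "_" would move the clash to "name_first" instead, so neither choice removes clashes entirely.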
Brought to attention by https://github.com/Kotlin/dataframe/issues/911
The characters '.', ';', '[', ']', '/', '<', '>', ':' and '\' are never allowed.
Source: https://kotlinlang.org/spec/syntax-and-grammar.html#identifiers
This needs some research and feedback.