Support string interpolation
TODO
Essential
[x] Tokenize string interpolations, including transitions between string literal fragments and interpolated expressions
[x] Tokenize formats for interpolated expressions, introduced with : and terminated with }
[x] Handle arbitrary nesting of interpolations
[x] A newline character within an interpolation should terminate it immediately with an error, and completely unwind tokenization and parsing of all enclosing interpolations, returning the outermost string interpolation parse tree.
[x] Allow escaping of { and } by doubling them
[x] Add a string interpolation parse tree and update all visitors to accept it
[x] Parse string interpolations to correct parse tree
[x] Generate code to concatenate string fragments and interpolated expressions into a string result
[x] Format each interpolated value according to its format string, where provided
Optional
[x] Use DefaultInterpolatedStringHandler to perform the concatenation of string fragments and interpolated expressions
Details
We're not using { and } for anything, and they're already associated with string interpolation in other languages.
We could in principle use { and } as quotes as well as for introducing interpolated expressions, but this is not very readable and so could be error-prone.
I don't want to further overload `, or require a prefix to " in order to differentiate interpolated strings: once they're available they're likely to be the most common way to format output, more so than string.format or overloads on write_line for example.
So, as the expected most common use case, we want string interpolation to be low friction with a clean syntax. The rarer non-interpolation uses of { and } in strings will instead need to be escaped. To make this less painful when it is needed, we should support escaping { and } by doubling them: this is both easier to type and easier to read than escaping with \.
"this is a string {this is an expression} back to the string {another expression "nested string" back to the second expression} back to the original string"
"name is {name}, age is {age}, height is {height} number of limbs is {arms + legs + if head? then 1 else 0 fi}"
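The brace-doubling rule described above can be illustrated with a small Python sketch. The function names `escape_braces` and `unescape_braces` are hypothetical and not part of the language or its compiler; they only show what a literal fragment looks like before and after escaping.

```python
def escape_braces(text: str) -> str:
    """Double every { and } so the text is safe inside an interpolated string."""
    return text.replace("{", "{{").replace("}", "}}")


def unescape_braces(fragment: str) -> str:
    """Collapse doubled braces in a well-formed literal fragment back to single ones."""
    return fragment.replace("{{", "{").replace("}}", "}")


# A literal JSON-like snippet needs its braces doubled to appear in a string.
escaped = escape_braces('{"id": 1}')
restored = unescape_braces(escaped)
```

Here `escaped` holds the doubled form that would be typed in source, and `restored` is the text the tokenizer would place in the string segment.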
The syntax highlighting will need to be updated to support this. Although true recursive syntax highlighting is not possible from a TextMate grammar, we can do three or four levels of nesting, which will likely cover the vast majority of real use.
Some languages allow newlines in interpolated strings and some also disable escapes, but I'm not convinced the benefits outweigh the complexity. I'd prefer a separate syntax to introduce a multi-line string, to avoid incomplete strings causing a cascade of errors and breaking completion and other language extension features in the interpolated expressions.
To parse interpolations we'll want to split the complexity between the tokenizer and the parser. The tokenizer needs to keep track of whether it's in an interpolated string or in an expression within an interpolated string, which it can do just by tracking { and }. The tokenizer can then feed the parser a sequence of tokens: enter string interpolation, interpolated string segment, enter interpolated expression, exit interpolated expression and exit string interpolation. The parser can then take this linear sequence and recursively build the required parse tree from it.
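A minimal Python sketch of the tokenizer side of this split is shown below. It is not the actual implementation. It assumes the expression tokens can be stood in for by raw text, and the `expression text`, `format` and `error` token kinds are hypothetical additions for illustration. On a newline or end of input it emits one error and then closes every open level, which is one possible reading of the "unwind" requirement in the task list.

```python
ENTER_STRING = "enter string interpolation"
SEGMENT = "interpolated string segment"
ENTER_EXPR = "enter interpolated expression"
EXIT_EXPR = "exit interpolated expression"
EXIT_STRING = "exit string interpolation"
EXPR_TEXT = "expression text"  # hypothetical: stands in for real expression tokens
FORMAT = "format"              # hypothetical: text between : and }
ERROR = "error"                # hypothetical


def tokenize_interpolated(src):
    """Tokenize one string literal; src must start at its opening quote."""
    if not src.startswith('"'):
        raise ValueError("expected an opening quote")
    tokens, stack, buf = [], [], []  # stack holds "string" / "expr", innermost last

    def flush():
        # Emit any buffered text as a segment or as expression text.
        if buf:
            kind = SEGMENT if stack[-1] == "string" else EXPR_TEXT
            tokens.append((kind, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(src) and src[i] != "\n":
        ch = src[i]
        if stack and stack[-1] == "string":
            if ch in "{}" and src[i + 1:i + 2] == ch:
                buf.append(ch)  # doubled brace is a literal brace
                i += 2
                continue
            if ch == "{":
                flush()
                tokens.append((ENTER_EXPR, ch))
                stack.append("expr")
            elif ch == '"':
                flush()
                tokens.append((EXIT_STRING, ch))
                stack.pop()
                if not stack:
                    return tokens  # outermost string closed normally
            elif ch == "}":
                flush()
                tokens.append((ERROR, "unescaped } in string"))
            else:
                buf.append(ch)
        elif ch == '"':
            flush()  # opening quote: outermost string or a nested one
            tokens.append((ENTER_STRING, ch))
            stack.append("string")
        elif ch == "}":
            flush()
            tokens.append((EXIT_EXPR, ch))
            stack.pop()
        elif ch == ":":
            flush()  # format runs up to the closing }
            end = i + 1
            while end < len(src) and src[end] not in "}\n":
                end += 1
            tokens.append((FORMAT, src[i + 1:end]))
            i = end
            continue
        else:
            buf.append(ch)
        i += 1

    # Newline or end of input: report once, then unwind every open level.
    flush()
    tokens.append((ERROR, "unterminated string interpolation"))
    while stack:
        kind = EXIT_STRING if stack.pop() == "string" else EXIT_EXPR
        tokens.append((kind, ""))
    return tokens
```

Because every enter token is matched by an exit token even after an error, a recursive-descent parser can consume the sequence without special cases and still return the outermost tree.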
The simple option for code generation is to translate interpolations into a tree of string + object expressions in the parser. However, the C# compiler uses DefaultInterpolatedStringHandler, which is designed for string interpolation and is presumably more efficient. It uses generic methods though, which we can certainly call, but doing so would be quite a bit more work.
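The two code generation options can be compared with a Python sketch. The `Literal`, `Interp` and `Concat` node types and both lowering functions are hypothetical stand-ins for the real parse tree and code generator. The second function only lists, as text, the calls that would be emitted against .NET's DefaultInterpolatedStringHandler (its constructor taking literal length and hole count, `AppendLiteral`, `AppendFormatted` and `ToStringAndClear`); it does not generate IL.

```python
import json
from dataclasses import dataclass
from functools import reduce
from typing import Optional


@dataclass
class Literal:
    text: str


@dataclass
class Interp:
    expr: str                  # source text of the interpolated expression
    fmt: Optional[str] = None  # format string, where provided


@dataclass
class Concat:
    left: object
    right: object


def lower_to_concat(parts):
    """Option 1: fold the parts into a left-nested tree of string + object."""
    if not parts or not isinstance(parts[0], Literal):
        parts = [Literal("")] + list(parts)  # keep a string on the left
    return reduce(Concat, parts)


def show(node):
    """Render a lowered tree as readable pseudo-source."""
    if isinstance(node, Literal):
        return json.dumps(node.text)
    if isinstance(node, Interp):
        if node.fmt is None:
            return node.expr
        return f"format({node.expr}, {json.dumps(node.fmt)})"
    return f"({show(node.left)} + {show(node.right)})"


def lower_to_handler_calls(parts):
    """Option 2: list the handler calls, one per part, in order."""
    literal_len = sum(len(p.text) for p in parts if isinstance(p, Literal))
    holes = sum(isinstance(p, Interp) for p in parts)
    calls = [f"h = DefaultInterpolatedStringHandler({literal_len}, {holes})"]
    for p in parts:
        if isinstance(p, Literal):
            calls.append(f"h.AppendLiteral({json.dumps(p.text)})")
        elif p.fmt is None:
            calls.append(f"h.AppendFormatted({p.expr})")
        else:
            calls.append(f"h.AppendFormatted({p.expr}, {json.dumps(p.fmt)})")
    calls.append("h.ToStringAndClear()")
    return calls


parts = [Literal("name is "), Interp("name"),
         Literal(", age is "), Interp("age", "000")]
tree_text = show(lower_to_concat(parts))
handler_calls = lower_to_handler_calls(parts)
```

The concatenation tree needs no new machinery in the code generator, while the handler form is a flat call sequence whose `AppendFormatted` calls are generic, which is where the extra work would come from.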