degory / ghul

compiler for the ghūl programming language
https://ghul.dev
GNU Affero General Public License v3.0

String interpolation #1071

Closed degory closed 8 months ago

degory commented 9 months ago

Support string interpolation

TODO

Essential

Optional

Details

We're not using { and } for anything, and they're already associated with string interpolation in other languages.

We could in principle use { and } as quotes, as well as for introducing interpolated expressions; however, this is not very readable and so could be error-prone.

I don't want to further overload `, or require a prefix to " in order to differentiate interpolated strings: once they're available they're likely to be the most common way to format output, more so than string.format or overloads on write_line for example.

So, as the expected most common use case, we want string interpolation to be low friction with a clean syntax. The rarer non-interpolation uses of { and } in strings will instead need to be escaped. To make this less painful when it is needed, we should support escaping { and } by doubling them: this is both easier to type and easier to read than escaping with \.

"this is a string {this is an expression} back to the string {another expression "nested string" back to the second expression} back to the original string"
"name is {name}, age is {age}, height is {height} number of limbs is {arms + legs + if head? then 1 else 0 fi}"

The syntax highlighting will need to be updated to support this. Although true recursive syntax highlighting is not possible from a TextMate grammar, we can do three or four levels of nesting, which will likely cover the vast majority of real use.

Some languages allow newlines in interpolated strings and some also disable escapes, but I'm not convinced the benefits outweigh the complexity. I'd prefer a separate syntax to introduce a multi-line string, to avoid incomplete strings causing a cascade of errors and breaking completion and other language extension features in the interpolated expressions.

To parse interpolations, we'll want to split the complexity between the tokenizer and the parser. The tokenizer needs to keep track of whether it's in an interpolated string or in an expression within an interpolated string, which it can do just by tracking { and }. The tokenizer can then feed the parser a sequence of tokens: enter string interpolation, interpolated string segment, enter interpolated expression, exit interpolated expression, exit string interpolation. The parser can then take this linear sequence and recursively build the required parse tree from it.
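The tokenizer/parser split described above can be sketched roughly as follows. This is a minimal illustration in Python, not the ghūl compiler's implementation: the token names (`ENTER_STRING`, `SEGMENT`, `ENTER_EXPR`, `EXIT_EXPR`, `EXIT_STRING`, `CODE`), the mode stack, and the tuple-based tree are all hypothetical. It assumes doubled braces are the only escape, ignores backslash escapes, and passes expression source through as raw `CODE` chunks where a real tokenizer would emit ordinary tokens.

```python
def tokenize(source):
    """Split source into a flat token list, tracking string/expression nesting."""
    tokens = []
    stack = []  # entries are "string" or "expr"; empty means top-level code
    buf = []    # pending characters for the current segment or code chunk

    def flush(kind):
        if buf:
            tokens.append((kind, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(source):
        ch = source[i]
        if stack and stack[-1] == "string":
            if source[i:i + 2] in ("{{", "}}"):
                buf.append(ch)  # doubled brace stands for one literal brace
                i += 2
                continue
            if ch == "{":
                flush("SEGMENT")
                tokens.append(("ENTER_EXPR", ""))
                stack.append("expr")
            elif ch == '"':
                flush("SEGMENT")
                tokens.append(("EXIT_STRING", ""))
                stack.pop()
            else:
                buf.append(ch)  # a lone } is kept as text here; a real lexer might reject it
        else:
            if ch == '"':
                flush("CODE")
                tokens.append(("ENTER_STRING", ""))
                stack.append("string")
            elif ch == "}" and stack:
                flush("CODE")
                tokens.append(("EXIT_EXPR", ""))
                stack.pop()
            else:
                buf.append(ch)
        i += 1
    flush("CODE")
    return tokens


def parse_string(tokens, pos):
    """Parse from an ENTER_STRING token; return (tree, index after EXIT_STRING)."""
    if tokens[pos][0] != "ENTER_STRING":
        raise SyntaxError("expected start of string")
    pos += 1
    parts = []
    while tokens[pos][0] != "EXIT_STRING":
        kind, text = tokens[pos]
        if kind == "SEGMENT":
            parts.append(("text", text))
            pos += 1
        elif kind == "ENTER_EXPR":
            expr, pos = parse_expr(tokens, pos + 1)
            parts.append(("expr", expr))
        else:
            raise SyntaxError(f"unexpected {kind} in string")
    return ("interpolation", parts), pos + 1


def parse_expr(tokens, pos):
    """Parse expression contents up to EXIT_EXPR; nested strings recurse."""
    items = []
    while tokens[pos][0] != "EXIT_EXPR":
        kind, text = tokens[pos]
        if kind == "CODE":
            items.append(("code", text))
            pos += 1
        elif kind == "ENTER_STRING":
            node, pos = parse_string(tokens, pos)
            items.append(node)
        else:
            raise SyntaxError(f"unexpected {kind} in expression")
    return items, pos + 1


if __name__ == "__main__":
    toks = tokenize('"p {f("q {y}")} r"')
    tree, _ = parse_string(toks, 0)
    print(tree)
```

The tokenizer only needs the mode stack to decide how to treat each character, while the recursion lives entirely in the parser, which mirrors the division of labour proposed above.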

The simple option for code generation is to translate interpolations into a tree of string + object expressions in the parser. However, the C# compiler uses DefaultInterpolatedStringHandler, which is designed for string interpolations and is presumably more efficient. It relies on generic methods though, which we can obviously call, but doing so would be quite a bit more work.
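To make the two code generation options concrete, here is a rough Python sketch of both lowerings over a flat list of parts. The part representation, the tuple-based tree, and the function names are hypothetical and only for illustration. The second function lists the call sequence the C# compiler emits for an interpolated string (constructor taking the literal length and the number of formatted holes, then `AppendLiteral`, `AppendFormatted` and `ToStringAndClear`); how ghūl would actually emit those calls is not covered here.

```python
def lower_to_concat(parts):
    """Lower parts to a left-nested tree of string + object additions.

    parts is a list of ("text", literal) or ("expr", source) pairs.
    """
    node = None
    for kind, value in parts:
        operand = ("literal", value) if kind == "text" else ("expr", value)
        if node is None:
            # Seed with an empty string when the first part is an expression,
            # so that the left operand of every + is a string.
            node = operand if kind == "text" else ("+", ("literal", ""), operand)
        else:
            node = ("+", node, operand)
    return node if node is not None else ("literal", "")


def emit(node):
    """Render the tree as source-like text, for inspection only."""
    if node[0] == "literal":
        return '"' + node[1] + '"'
    if node[0] == "expr":
        return "(" + node[1] + ")"
    return emit(node[1]) + " + " + emit(node[2])


def lower_to_handler_calls(parts):
    """Lower parts to the DefaultInterpolatedStringHandler call sequence."""
    literal_length = sum(len(value) for kind, value in parts if kind == "text")
    formatted_count = sum(1 for kind, _ in parts if kind == "expr")
    calls = [("new DefaultInterpolatedStringHandler", literal_length, formatted_count)]
    for kind, value in parts:
        calls.append(("AppendLiteral" if kind == "text" else "AppendFormatted", value))
    calls.append(("ToStringAndClear",))
    return calls


if __name__ == "__main__":
    parts = [("text", "name is "), ("expr", "name"), ("text", ", age is "), ("expr", "age")]
    print(emit(lower_to_concat(parts)))
    for call in lower_to_handler_calls(parts):
        print(call)
```

The concatenation form needs nothing beyond the existing + operator, whereas the handler form requires the compiler to resolve a generic `AppendFormatted` call for each hole, which is where the extra work mentioned above would come from.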