curv3d / curv

a language for making art using mathematics
Apache License 2.0

Extended String Literals #63

Closed doug-moen closed 5 years ago

doug-moen commented 5 years ago

To help debug the GPU compiler, I will extend the curv tool to read and write the *.gpu file format, which represents the output of the GPU compiler. It's a hierarchical JSON-like data structure. However, rather than use JSON to represent the data, I'm going to use Curv.

Why use Curv to represent JSON-like data? For one thing, I don't have a JSON parser in the source code right now, but I do have a Curv parser. The bigger issue is that a .gpu file contains a large block of GLSL source code. In JSON, newlines must be escaped as \n within string literals, so a multi-line GLSL program collapses into a single long line. I need the .gpu file to be human readable and human editable, and Curv syntax will be easier to read and edit.

I plan to use Curv as a data interchange format more in the future, to help with various features, but this is where it starts.

To make Curv syntax more usable for these purposes, I plan to extend the syntax of string literals. There are two subfeatures: multi-line string literals, and compact escape sequences for escaping $ and " characters.

Multi-Line String Literals

It should be possible to indent a multi-line string literal without adding the indentation to the string content.

My solution: A non-initial line of a multi-line string literal begins with optional whitespace, followed by the '|' character, and this prefix is ignored. Note that '"' and '|' are both a single character, so they line up. Example:

my_string =
    "first line
    |second line
    |final line
    ";
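
The prefix rule can be modelled outside of Curv. The following Python sketch is not the Curv implementation; it only illustrates the rule as stated above, applied to the raw text between the two quote characters. The function name `strip_line_prefixes` is hypothetical. It also assumes that a final line containing only whitespace before the closing quote contributes an empty line, and that any other line without a '|' is an error. Neither point is specified in this issue.

```python
def strip_line_prefixes(raw: str) -> str:
    """Apply the proposed '|' prefix rule to the text between the quotes."""
    first, *rest = raw.split("\n")
    out = [first]  # the initial line is taken verbatim
    for i, line in enumerate(rest):
        stripped = line.lstrip(" \t")
        if stripped.startswith("|"):
            # Drop the optional indentation and the '|' marker.
            out.append(stripped[1:])
        elif stripped == "" and i == len(rest) - 1:
            # Assumption: only whitespace precedes the closing quote.
            out.append("")
        else:
            # Assumption: a non-initial line without '|' is rejected.
            raise ValueError(f"line {i + 2} does not start with '|'")
    return "\n".join(out)


# Raw text between the quotes in the my_string example above.
raw = "first line\n    |second line\n    |final line\n    "
assert strip_line_prefixes(raw) == "first line\nsecond line\nfinal line\n"
```

Under these assumptions the indentation never reaches the string content, so the literal can be re-indented freely without changing its value.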

Compact Escape Sequences for $ and "

The escape sequences for $ and " should be compact, and should not grow exponentially if you escape the escape sequence, and escape it again.

C-like languages use \\ and \" to escape the \ and " characters in a string literal. If you repeatedly escape these escape sequences, then you get exponential growth:

\ -> \\ -> \\\\ -> \\\\\\\\
" -> \" -> \\\" -> \\\\\\\"

This problem could occur if we take Curv source code and convert it to a string literal.
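
As an illustration of the growth shown above, here is a small Python sketch that applies C-style escaping repeatedly and records the length after each round. The helper names `c_escape` and `lengths_after_rounds` are made up for this example.

```python
def c_escape(s: str) -> str:
    """Escape backslashes and double quotes the way C-like languages do."""
    # Backslashes must be doubled first, otherwise the backslash added
    # in front of each quote would itself be doubled.
    return s.replace("\\", "\\\\").replace('"', '\\"')


def lengths_after_rounds(s: str, rounds: int) -> list[int]:
    """Return the length of s after 0, 1, ..., rounds escaping passes."""
    result = [len(s)]
    for _ in range(rounds):
        s = c_escape(s)
        result.append(len(s))
    return result


# Each pass doubles the length: 1, 2, 4, 8.
assert lengths_after_rounds("\\", 3) == [1, 2, 4, 8]
assert lengths_after_rounds('"', 3) == [1, 2, 4, 8]
```

Every character produced by one pass needs escaping again in the next pass, which is why the length doubles each time.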

To avoid the exponential growth problem associated with repeated escaping, I'll introduce two new escape sequences: $. stands for a literal $ character, and $= stands for a literal " character.

With repeated escaping, these escape sequences grow linearly, instead of exponentially:

$ -> $. -> $.. -> $...
" -> $= -> $.= -> $..=
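
The linear growth can be checked with a short Python model of the two sequences, where $. stands for $ and $= stands for ". This is a sketch, not Curv's parser: the names `dollar_escape` and `dollar_unescape` are hypothetical, and treating any other character after $ as an error is an assumption made to keep the example small.

```python
def dollar_escape(s: str) -> str:
    """Escape '$' as '$.' and '"' as '$='."""
    # '$' is handled first so the '$' introduced by '$=' is not re-escaped.
    return s.replace("$", "$.").replace('"', "$=")


def dollar_unescape(s: str) -> str:
    """Invert dollar_escape by scanning left to right."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "$":
            nxt = s[i + 1] if i + 1 < len(s) else ""
            if nxt == ".":
                out.append("$")
            elif nxt == "=":
                out.append('"')
            else:
                # Assumption: other uses of '$' are out of scope here.
                raise ValueError(f"unsupported sequence at index {i}")
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


# Repeated escaping adds one character per round.
assert dollar_escape("$") == "$."
assert dollar_escape(dollar_escape("$")) == "$.."
assert dollar_escape(dollar_escape('"')) == "$.="
assert dollar_unescape("$..=") == "$.="
```

Only the leading $ of an escape sequence needs escaping on the next pass, because '.' and '=' are ordinary characters. Each round therefore adds a single '.' instead of doubling the length.
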
doug-moen commented 5 years ago

documented and done