chharvey / counterpoint

A robust programming language.
GNU Affero General Public License v3.0

Rework Tokenizing & Cooking: Template Literals #32

Closed chharvey closed 4 years ago

chharvey commented 4 years ago

This issue supersedes #6 and #8.

Rework template literal tokenizing and cooking with new delimiters. Character escapes are also removed.

Lexical grammar:

TemplateFull   ::= "'''" TemplateChars__EndDelim ? "'''"
TemplateHead   ::= "'''" TemplateChars__EndInterp? "{{"
TemplateMiddle ::= "}}"  TemplateChars__EndInterp? "{{"
TemplateTail   ::= "}}"  TemplateChars__EndDelim ? "'''"

TemplateChars__EndDelim ::=
    [^'{#x03] TemplateChars__EndDelim?   |
    TemplateChars__EndDelim__StartDelim  |
    TemplateChars__EndDelim__StartInterp

TemplateChars__EndDelim__StartDelim ::=
    "'"    [^'{#x03] TemplateChars__EndDelim?                                         |
    "''"   [^'{#x03] TemplateChars__EndDelim?                                         |
    "'{"  ([^'{#x03] TemplateChars__EndDelim? | TemplateChars__EndDelim__StartDelim)? |
    "''{" ([^'{#x03] TemplateChars__EndDelim? | TemplateChars__EndDelim__StartDelim)?

TemplateChars__EndDelim__StartInterp ::=
    "{"   ([^'{#x03] TemplateChars__EndDelim?                                       )? |
    "{'"  ([^'{#x03] TemplateChars__EndDelim? | TemplateChars__EndDelim__StartInterp)  |
    "{''" ([^'{#x03] TemplateChars__EndDelim? | TemplateChars__EndDelim__StartInterp)

TemplateChars__EndInterp ::=
    [^'{#x03] TemplateChars__EndInterp?   |
    TemplateChars__EndInterp__StartDelim  |
    TemplateChars__EndInterp__StartInterp

TemplateChars__EndInterp__StartDelim ::=
    "'"   ([^'{#x03] TemplateChars__EndInterp?                                       )? |
    "''"  ([^'{#x03] TemplateChars__EndInterp?                                       )? |
    "'{"  ([^'{#x03] TemplateChars__EndInterp? | TemplateChars__EndInterp__StartDelim)  |
    "''{" ([^'{#x03] TemplateChars__EndInterp? | TemplateChars__EndInterp__StartDelim)

TemplateChars__EndInterp__StartInterp ::=
    "{"    [^'{#x03] TemplateChars__EndInterp?                                           |
    "{'"  ([^'{#x03] TemplateChars__EndInterp? | TemplateChars__EndInterp__StartInterp)? |
    "{''" ([^'{#x03] TemplateChars__EndInterp? | TemplateChars__EndInterp__StartInterp)?
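As far as the productions above can be read, the character rules never admit three consecutive apostrophes or two consecutive open braces inside a template, so a scanner can stop at the first `'''` or `{{` it meets. The following is a minimal sketch of that idea, not the project's actual lexer; every name in it (`scanTemplateToken`, `TemplateToken`, and the constants) is hypothetical, and only the delimiters and the U+0003 exclusion come from the grammar.

```typescript
// Hypothetical token shape; the four kinds mirror the four lexical productions.
type TemplateTokenKind = "TemplateFull" | "TemplateHead" | "TemplateMiddle" | "TemplateTail";

interface TemplateToken {
  kind: TemplateTokenKind;
  source: string; // token text including its opening and closing delimiters
  end: number;    // offset just past the token
}

const DELIM = "'''";
const INTERP_OPEN = "{{";
const INTERP_CLOSE = "}}";
const ETX = "\u0003"; // excluded by [^'{#x03]; treated here as end of input

// Scans one template token that begins at `start`, which must point at ''' or }}.
function scanTemplateToken(src: string, start: number): TemplateToken {
  const fromDelim = src.slice(start, start + 3) === DELIM;
  const fromInterp = src.slice(start, start + 2) === INTERP_CLOSE;
  if (!fromDelim && !fromInterp) {
    throw new Error("expected ''' or }} at offset " + start);
  }
  let i = start + (fromDelim ? DELIM.length : INTERP_CLOSE.length);
  while (i < src.length && src.charAt(i) !== ETX) {
    if (src.slice(i, i + 3) === DELIM) {
      // Closing delimiter: Full if we opened with ''', Tail if we opened with }}.
      const end = i + DELIM.length;
      return { kind: fromDelim ? "TemplateFull" : "TemplateTail", source: src.slice(start, end), end: end };
    }
    if (src.slice(i, i + 2) === INTERP_OPEN) {
      // Start of an interpolation: Head if we opened with ''', Middle if with }}.
      const end = i + INTERP_OPEN.length;
      return { kind: fromDelim ? "TemplateHead" : "TemplateMiddle", source: src.slice(start, end), end: end };
    }
    i++;
  }
  throw new Error("unterminated template starting at offset " + start);
}
```

Lone apostrophes and braces, such as the content `a'{'b`, pass through untouched because neither two-character nor three-character delimiter matches at those offsets.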

Attribute grammar:

TV(TemplateFull ::= "'''" "'''")
    := []
TV(TemplateFull ::= "'''" TemplateChars__EndDelim "'''")
    := TV(TemplateChars__EndDelim)

TV(TemplateHead ::= "'''" "{{")
    := []
TV(TemplateHead ::= "'''" TemplateChars__EndInterp "{{")
    := TV(TemplateChars__EndInterp)

TV(TemplateMiddle ::= "}}" "{{")
    := []
TV(TemplateMiddle ::= "}}" TemplateChars__EndInterp "{{")
    := TV(TemplateChars__EndInterp)

TV(TemplateTail ::= "}}" "'''")
    := []
TV(TemplateTail ::= "}}" TemplateChars__EndDelim "'''")
    := TV(TemplateChars__EndDelim)

TV(TemplateChars__EndDelim ::= [^'{#x03])
    := UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim ::= [^'{#x03] TemplateChars__EndDelim)
    := UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim ::= TemplateChars__EndDelim__StartDelim)
    := TV(TemplateChars__EndDelim__StartDelim)
TV(TemplateChars__EndDelim ::= TemplateChars__EndDelim__StartInterp)
    := TV(TemplateChars__EndDelim__StartInterp)

TV(TemplateChars__EndDelim__StartDelim ::= "'" [^'{#x03])
    := \x27 followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartDelim ::= "'" [^'{#x03] TemplateChars__EndDelim)
    := \x27 followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartDelim ::= "''" [^'{#x03])
    := [\x27, \x27] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartDelim ::= "''" [^'{#x03] TemplateChars__EndDelim)
    := [\x27, \x27] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartDelim ::= "'{")
    := [\x27, \x7b]
TV(TemplateChars__EndDelim__StartDelim ::= "'{" [^'{#x03])
    := [\x27, \x7b] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartDelim ::= "'{" [^'{#x03] TemplateChars__EndDelim)
    := [\x27, \x7b] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartDelim ::= "'{" TemplateChars__EndDelim__StartDelim)
    := [\x27, \x7b] followed by TV(TemplateChars__EndDelim__StartDelim)
TV(TemplateChars__EndDelim__StartDelim ::= "''{")
    := [\x27, \x27, \x7b]
TV(TemplateChars__EndDelim__StartDelim ::= "''{" [^'{#x03])
    := [\x27, \x27, \x7b] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartDelim ::= "''{" [^'{#x03] TemplateChars__EndDelim)
    := [\x27, \x27, \x7b] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartDelim ::= "''{" TemplateChars__EndDelim__StartDelim)
    := [\x27, \x27, \x7b] followed by TV(TemplateChars__EndDelim__StartDelim)

TV(TemplateChars__EndDelim__StartInterp ::= "{")
    := \x7b /* U+007B LEFT CURLY BRACKET */
TV(TemplateChars__EndDelim__StartInterp ::= "{" [^'{#x03])
    := \x7b followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartInterp ::= "{" [^'{#x03] TemplateChars__EndDelim)
    := \x7b followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartInterp ::= "{'" [^'{#x03])
    := [\x7b, \x27] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartInterp ::= "{'" [^'{#x03] TemplateChars__EndDelim)
    := [\x7b, \x27] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartInterp ::= "{'" TemplateChars__EndDelim__StartInterp)
    := [\x7b, \x27] followed by TV(TemplateChars__EndDelim__StartInterp)
TV(TemplateChars__EndDelim__StartInterp ::= "{''" [^'{#x03])
    := [\x7b, \x27, \x27] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndDelim__StartInterp ::= "{''" [^'{#x03] TemplateChars__EndDelim)
    := [\x7b, \x27, \x27] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndDelim)
TV(TemplateChars__EndDelim__StartInterp ::= "{''" TemplateChars__EndDelim__StartInterp)
    := [\x7b, \x27, \x27] followed by TV(TemplateChars__EndDelim__StartInterp)

TV(TemplateChars__EndInterp ::= [^'{#x03])
    := UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp ::= [^'{#x03] TemplateChars__EndInterp)
    := UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp ::= TemplateChars__EndInterp__StartDelim)
    := TV(TemplateChars__EndInterp__StartDelim)
TV(TemplateChars__EndInterp ::= TemplateChars__EndInterp__StartInterp)
    := TV(TemplateChars__EndInterp__StartInterp)

TV(TemplateChars__EndInterp__StartDelim ::= "'")
    := \x27 /* U+0027 APOSTROPHE */
TV(TemplateChars__EndInterp__StartDelim ::= "'" [^'{#x03])
    := \x27 followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartDelim ::= "'" [^'{#x03] TemplateChars__EndInterp)
    := \x27 followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartDelim ::= "''")
    := [\x27, \x27]
TV(TemplateChars__EndInterp__StartDelim ::= "''" [^'{#x03])
    := [\x27, \x27] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartDelim ::= "''" [^'{#x03] TemplateChars__EndInterp)
    := [\x27, \x27] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartDelim ::= "'{" [^'{#x03])
    := [\x27, \x7b] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartDelim ::= "'{" [^'{#x03] TemplateChars__EndInterp)
    := [\x27, \x7b] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartDelim ::= "'{" TemplateChars__EndInterp__StartDelim)
    := [\x27, \x7b] followed by TV(TemplateChars__EndInterp__StartDelim)
TV(TemplateChars__EndInterp__StartDelim ::= "''{" [^'{#x03])
    := [\x27, \x27, \x7b] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartDelim ::= "''{" [^'{#x03] TemplateChars__EndInterp)
    := [\x27, \x27, \x7b] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartDelim ::= "''{" TemplateChars__EndInterp__StartDelim)
    := [\x27, \x27, \x7b] followed by TV(TemplateChars__EndInterp__StartDelim)

TV(TemplateChars__EndInterp__StartInterp ::= "{" [^'{#x03])
    := \x7b followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartInterp ::= "{" [^'{#x03] TemplateChars__EndInterp)
    := \x7b followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartInterp ::= "{'")
    := [\x7b, \x27]
TV(TemplateChars__EndInterp__StartInterp ::= "{'" [^'{#x03])
    := [\x7b, \x27] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartInterp ::= "{'" [^'{#x03] TemplateChars__EndInterp)
    := [\x7b, \x27] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartInterp ::= "{'" TemplateChars__EndInterp__StartInterp)
    := [\x7b, \x27] followed by TV(TemplateChars__EndInterp__StartInterp)
TV(TemplateChars__EndInterp__StartInterp ::= "{''")
    := [\x7b, \x27, \x27]
TV(TemplateChars__EndInterp__StartInterp ::= "{''" [^'{#x03])
    := [\x7b, \x27, \x27] followed by UTF16Encoding(code point of that character)
TV(TemplateChars__EndInterp__StartInterp ::= "{''" [^'{#x03] TemplateChars__EndInterp)
    := [\x7b, \x27, \x27] followed by UTF16Encoding(code point of that character) followed by TV(TemplateChars__EndInterp)
TV(TemplateChars__EndInterp__StartInterp ::= "{''" TemplateChars__EndInterp__StartInterp)
    := [\x7b, \x27, \x27] followed by TV(TemplateChars__EndInterp__StartInterp)
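Taken together, the TV rules above seem to amount to this: strip the opening and closing delimiters, then emit the UTF-16 code units of every remaining character verbatim. The sketch below illustrates that reading under stated assumptions. `utf16Encoding`, `readCodePoint`, and `templateValue` are hypothetical helper names, and the input is assumed to be the full source text of a single, already validated template token.

```typescript
// Encodes one Unicode code point as one or two UTF-16 code units.
function utf16Encoding(codePoint: number): number[] {
  if (codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  if (codePoint <= 0xffff) {
    return [codePoint];
  }
  const offset = codePoint - 0x10000;
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

// Reads the code point starting at index i, joining a surrogate pair if present.
function readCodePoint(s: string, i: number): number {
  const hi = s.charCodeAt(i);
  if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < s.length) {
    const lo = s.charCodeAt(i + 1);
    if (lo >= 0xdc00 && lo <= 0xdfff) {
      return (hi - 0xd800) * 0x400 + (lo - 0xdc00) + 0x10000;
    }
  }
  return hi;
}

// Computes the template value (a sequence of code units) of one token's source text.
// Assumes the token opens with ''' or }} and closes with ''' or {{.
function templateValue(tokenSource: string): number[] {
  const open = tokenSource.slice(0, 3) === "'''" ? 3 : 2;
  const close = tokenSource.slice(-3) === "'''" ? 3 : 2;
  const content = tokenSource.slice(open, tokenSource.length - close);
  const units: number[] = [];
  let i = 0;
  while (i < content.length) {
    const codePoint = readCodePoint(content, i);
    const encoded = utf16Encoding(codePoint);
    for (let k = 0; k < encoded.length; k++) {
      units.push(encoded[k]);
    }
    i += encoded.length; // a code point above U+FFFF occupies two units in the source string
  }
  return units;
}
```

Because no escape sequences are recognized, a backslash in the content contributes its own code unit (`\x5c`) like any other character.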

Template literals are delimited with triple apostrophes ''' ''' (U+0027). There are no escapable characters; the backslash is rendered verbatim. Interpolated expressions work as before and are enclosed in double braces {{ }}.

To include characters that would otherwise need escaping, we can interpolate a string literal that contains the escape sequence. Each template below is followed by the string it evaluates to.

'''This {{ '\u{24}' }} is a dollar sign.'''

'This $ is a dollar sign.'

'''Escaping three apostrophes {{ '\'\'\'' }} without ending the template.'''

'Escaping three apostrophes ''' without ending the template.'

'''Escaping two open curly braces {{ '{{' }} without ending the template.'''

'Escaping two open curly braces {{ without ending the template.'
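The examples above can be modeled as plain concatenation: the cooked template pieces are interleaved with the values of the interpolated expressions. The sketch below assumes the interpolated string literals have already been evaluated (so `'\u{24}'` has become `$`); `joinTemplate` is a hypothetical helper, not part of the language.

```typescript
// Interleaves cooked template pieces with already-evaluated interpolation values.
// A template with n interpolations has n + 1 pieces (head, middles, tail).
function joinTemplate(cooked: string[], values: string[]): string {
  if (cooked.length !== values.length + 1) {
    throw new Error("expected exactly one more template piece than values");
  }
  let result = cooked[0];
  for (let i = 0; i < values.length; i++) {
    result += values[i] + cooked[i + 1];
  }
  return result;
}

// '''This {{ '\u{24}' }} is a dollar sign.'''
const dollar = joinTemplate(["This ", " is a dollar sign."], ["$"]);

// '''Escaping three apostrophes {{ '\'\'\'' }} without ending the template.'''
const apostrophes = joinTemplate(
  ["Escaping three apostrophes ", " without ending the template."],
  ["'''"]
);

// '''Escaping two open curly braces {{ '{{' }} without ending the template.'''
const braces = joinTemplate(
  ["Escaping two open curly braces ", " without ending the template."],
  ["{{"]
);
```

The delimiter sequences only ever appear in the interpolated values, never in the template pieces themselves, which is why they cannot end the template early.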