Closed MitchTurner closed 1 year ago
The issue is to decide which escape sequences are valid.
For now, the Aiken UPLC parser does not support any:
rule string() -> String
= "\"" s:[^ '"']* "\"" { String::from_iter(s) }
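To make the limitation concrete, here is a minimal std-only sketch of what that rule accepts. It is a hand-written stand-in, not the actual peg-generated parser, and the name parse_string_no_escapes is hypothetical.

```rust
/// Hand-written stand-in for the rule `"\"" s:[^ '"']* "\""`:
/// an opening quote, a run of non-quote characters, a closing quote.
/// Returns the parsed string and the unconsumed remainder of the input.
/// (Hypothetical helper for illustration; not part of Aiken.)
fn parse_string_no_escapes(input: &str) -> Option<(String, &str)> {
    let rest = input.strip_prefix('"')?;
    // The first '"' always ends the literal, whether or not a
    // backslash precedes it, because no escapes are recognized.
    let end = rest.find('"')?;
    Some((rest[..end].to_string(), &rest[end + 1..]))
}
```

With this behavior, the input "a\"b" yields the string a\ and leaves b" unconsumed, so a surrounding grammar would most likely report a syntax error.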
The Plutus Core Spec says in Appendix A.1:
Concrete syntax for strings. Strings are represented as sequences of Unicode characters enclosed in double quotes, and may include standard escape sequences.
However, although some escape sequences are standardized for particular languages, such as C, there are, as far as I know, no universal "standard escape sequences".
PlutusTx conText seems to rely on megaparsec charLiteral:
-- | Parser for string constants. They are wrapped in double quotes.
conText :: Parser T.Text
conText = lexeme . fmap T.pack $ char '\"' *> manyTill Lex.charLiteral (char '\"')
Which implements the Haskell Report grammar rules:
The literal character is parsed according to the grammar rules defined in the Haskell report.
I'm not sure exactly what those rules support, but it seems to be: https://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html
This includes quite a lot of uncommon ones and uses \xHEX for Unicode escape sequences (instead of \uHEX as in C, or the common \u{HEX} as in Rust).
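The practical difference between the two notations can be sketched as follows. This is an illustration only: decode_numeric_escape is a hypothetical helper, it covers just the \xHEX and \u{HEX} forms, and it ignores the other Haskell escapes (decimal, octal, control and ASCII-name escapes).

```rust
/// Decodes one numeric escape, given the text right after the backslash.
/// Returns the decoded character and the number of bytes consumed.
/// (Hypothetical helper for illustration only.)
fn decode_numeric_escape(after_backslash: &str) -> Option<(char, usize)> {
    // Rust style: u{HEX}, explicitly delimited by braces.
    if let Some(rest) = after_backslash.strip_prefix("u{") {
        let close = rest.find('}')?;
        let digits = &rest[..close];
        if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
            return None;
        }
        let code = u32::from_str_radix(digits, 16).ok()?;
        // "u{" is 2 bytes, then the digits, then the closing brace.
        return Some((char::from_u32(code)?, 2 + close + 1));
    }
    // Haskell style: xHEX, consuming as many hex digits as follow.
    if let Some(rest) = after_backslash.strip_prefix('x') {
        let len = rest.chars().take_while(|c| c.is_ascii_hexdigit()).count();
        if len == 0 {
            return None;
        }
        let code = u32::from_str_radix(&rest[..len], 16).ok()?;
        // 'x' is 1 byte; hex digits are ASCII, so count equals byte length.
        return Some((char::from_u32(code)?, 1 + len));
    }
    None
}
```

Because the undelimited form keeps consuming hex digits, \x41B decodes to U+041B rather than to A followed by B, whereas \u{41}B is unambiguous. Haskell provides the empty escape \& to end a numeric escape early.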
It may also make sense to support the same escape sequences in the Aiken UPLC compiler as in the Aiken language.
For now, Aiken seems to support a few single-character escape sequences in its escape lexer, but no Unicode ones:
let escape = just('\\').ignore_then(
just('\\')
.or(just('/'))
.or(just('"'))
.or(just('b').to('\x08'))
.or(just('f').to('\x0C'))
.or(just('n').to('\n'))
.or(just('r').to('\r'))
.or(just('t').to('\t')),
);
I'm also not sure why it supports the weird \/ one, since / does not require escaping.
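If the UPLC parser were to adopt the same set, the decoding step could look roughly like the std-only sketch below. It mirrors the escapes listed in the lexer above but is not Aiken's implementation; the function name unescape and the error messages are assumptions made for illustration.

```rust
/// Decodes the body of a string literal (the text between the quotes),
/// accepting the same single-character escapes as the lexer above.
/// (Hypothetical helper; name and error type are illustrative.)
fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A backslash must be followed by one of the known escape characters.
        let decoded = match chars.next() {
            Some('\\') => '\\',
            Some('/') => '/',
            Some('"') => '"',
            Some('b') => '\x08',
            Some('f') => '\x0C',
            Some('n') => '\n',
            Some('r') => '\r',
            Some('t') => '\t',
            Some(other) => return Err(format!("unknown escape sequence: \\{}", other)),
            None => return Err("dangling backslash at end of input".to_string()),
        };
        out.push(decoded);
    }
    Ok(out)
}
```

Rejecting unknown escapes, rather than passing them through unchanged, is a design choice in this sketch: it leaves room to assign meaning to sequences such as \x or \u later without changing how existing programs parse.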
@SmaugPool
Cool, that makes sense. Thanks for writing this.
The Plutus Core spec says that strings are allowed to be any Unicode string. The parser currently doesn't support that. For example, my proptest quickly found this innocuous string that broke the parser:
Specifically, the quotes in the middle mess it up.
This will probably never come up, but it's good to uphold contracts even in edge cases.
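One possible way to let quotes appear inside a literal, assuming a backslash-escaped quote is the chosen convention (the spec text quoted above does not pin this down), is to find the closing quote while skipping any character that follows a backslash. The sketch below is std-only and the name split_string_literal is hypothetical.

```rust
/// Splits `input` into the raw (still escaped) body of a leading string
/// literal and the remaining input, treating the character after a
/// backslash as part of the body even when it is a quote.
/// (Hypothetical helper for illustration only.)
fn split_string_literal(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('"')?;
    let mut escaped = false;
    for (i, c) in rest.char_indices() {
        match c {
            // The previous character was a backslash: take this one as-is.
            _ if escaped => escaped = false,
            '\\' => escaped = true,
            // An unescaped quote closes the literal.
            '"' => return Some((&rest[..i], &rest[i + 1..])),
            _ => {}
        }
    }
    // Ran out of input before finding a closing quote.
    None
}
```

The returned body is still in escaped form, so a separate decoding pass would be needed to turn it into the final string value.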