SemGuS-git / Semgus-Parser

Library and tool for parsing SemGuS specifications
MIT License
4 stars 1 forks source link

Limitations on characters inside string literals #6

Closed WileyCorning closed 3 years ago

WileyCorning commented 3 years ago

Any particular reason why string literals are restricted to characters from [a-zA-Z0-9.]? I'm currently working with some string-parsing examples that require a whitespace literal, as well as an empty-string literal.

kjcjohnson commented 3 years ago

Yep, this is wrong. I think the SyGuS grammar I lifted it from didn't yet have the String theory implemented.

SMT-LIB says:

String literals. A (literal) is any sequence of characters from or delimited by the double quote character " (34dec). The character " can itself occur within a string literal only if duplicated. In other words, after an initial " that starts a literal, a lexer should treat the sequence "" as an escape sequence denoting a single occurrence of " within the literal.

SMT-LIB string literals are akin to raw strings in certain programming languages. However, they have only one escape sequence: "". This means, for example and in contrast to most programming languages, that within a the character sequences \n, \012, \x0A, and \u0008 are not escape sequences (all denoting the new line character), but regular sequences denoting their individual characters.

Page 22 and 23

I threw in single-quoted strings for convenience, but it might be best to rip those out and just do what the SMT-LIB spec says to do. Not as trivial as just adding new characters to the regex, since we have to handle the "" escape sequence.