Closed WileyCorning closed 3 years ago
Yep, this is wrong. I think the SyGuS grammar I lifted it from didn't yet have the String theory implemented.
SMT-LIB says:
String literals. A
(literal) is any sequence of characters from or delimited by the double quote character "
(34dec). The character"
can itself occur within a string literal only if duplicated. In other words, after an initial"
that starts a literal, a lexer should treat the sequence""
as an escape sequence denoting a single occurrence of"
within the literal.SMT-LIB string literals are akin to raw strings in certain programming languages. However, they have only one escape sequence:
""
. This means, for example and in contrast to most programming languages, that within athe character sequences \n
,\012
,\x0A
, and\u0008
are not escape sequences (all denoting the new line character), but regular sequences denoting their individual characters.
I threw in single-quoted strings for convenience, but it might be best to rip those out and just do what the SMT-LIB spec says to do. Not as trivial as just adding new characters to the regex, since we have to handle the ""
escape sequence.
Any particular reason why string literals are restricted to characters from
[a-zA-Z0-9.]
? I'm currently working with some string-parsing examples that require a whitespace literal, as well as an empty-string literal.