nopepper opened this issue 4 months ago
Thanks for reporting. Marking this as a good first issue.
The fix is likely somewhere close to https://github.com/eth-sri/lmql/blob/main/src/lmql/language/compiler.py#L428, where we compile LLM query strings into multi-line strings in the compiled representation of the program.
I did some investigation and here are my findings:
import lmql

@lmql.query()
def quote_at_begin():
    '''lmql
    "\"x\"=123"
    '''
# This case passes without any errors.

@lmql.query()
def quote_at_end():
    '''lmql
    "123=\"x\""
    '''
# This triggers a SyntaxError:
# SyntaxError: unterminated string literal (detected at line 1)
So only quote_at_end causes the error.
I feel this may be a bug in Python's ast parser, because ast.parse(f""""x"==123""") works but ast.parse(f"""123=="x"""") does not?
A temporary workaround is to check whether the string ends with a quote and, if so, append a space after it so that the parser works, then remove the space immediately after parsing.
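A minimal sketch of that workaround, using only the standard library ast module. The helper name parse_wrapped_qstring is hypothetical (it is not an LMQL function), and the sketch assumes the query string is wrapped in """...""" and that the padding is stripped from the parsed value rather than from the AST:

```python
import ast


def parse_wrapped_qstring(qstring: str) -> str:
    """Hypothetical helper: wrap qstring in triple quotes and parse it.

    A trailing double quote is padded with a space so it cannot merge
    with the closing triple quote; the space is removed after parsing.
    """
    needs_pad = qstring.endswith('"')
    padded = qstring + " " if needs_pad else qstring
    # mode="eval" yields an Expression whose body is the string Constant.
    tree = ast.parse('"""' + padded + '"""', mode="eval")
    value = tree.body.value
    # Drop the padding space again now that parsing has succeeded.
    return value[:-1] if needs_pad else value


print(parse_wrapped_qstring('123="x"'))  # 123="x"
print(parse_wrapped_qstring('"x"=123'))  # "x"=123
```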
I think the behavior of ast.parse is actually correct here. In Python, """"a""" is valid, whereas """a"""" is not valid. This is because after reading """, a parser's scanner will look for the next """ and then terminate the current string terminal. An extra " at the end of such a string will thus be read as an unterminated string literal.
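This can be checked directly with the standard library; is_valid_python is a hypothetical helper name used only for this illustration:

```python
import ast


def is_valid_python(src: str) -> bool:
    """Return True if src parses as Python source, False on SyntaxError."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False


leading = '""""a"""'   # opening """, content "a, closing """
trailing = '"""a""""'  # closing """ comes early; the last " opens an unterminated string

print(is_valid_python(leading))   # True
print(is_valid_python(trailing))  # False
print(ast.literal_eval(leading))  # "a
```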
To fix this issue, I think it should be enough to just add:
if compiled_qstring.endswith("\""):
    compiled_qstring = compiled_qstring[:-1] + "\\\""
This prevents """" (four-quote) sequences in compiled_qstring, as the last quote in a qstring will always be escaped. This may need more testing, though, to check whether it covers all cases correctly.
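One possible way to do that testing is a small round-trip check, sketched here under two assumptions: the compiled qstring is wrapped verbatim in """...""", and it may contain backslash escapes exactly as written in the query source. naive_fix is the snippet above; guarded_fix is a hypothetical variant (not proposed in this thread) that leaves a final quote alone when it is already escaped. The test cases are illustrative only.

```python
import ast


def naive_fix(s: str) -> str:
    # The fix proposed above: escape the final double quote.
    if s.endswith("\""):
        s = s[:-1] + "\\\""
    return s


def guarded_fix(s: str) -> str:
    # Hypothetical variant: skip the final quote if it is already escaped,
    # i.e. preceded by an odd number of backslashes.
    if not s.endswith("\""):
        return s
    body = s[:-1]
    backslashes = len(body) - len(body.rstrip("\\"))
    if backslashes % 2 == 1:
        return s
    return body + "\\\""


def parses_to(s: str, fix, expected: str) -> bool:
    # Wrap the fixed qstring in triple quotes and check what Python reads back.
    try:
        return ast.literal_eval('"""' + fix(s) + '"""') == expected
    except SyntaxError:
        return False


# (compiled qstring, value it should parse to)
CASES = [
    ('123="x"', '123="x"'),      # plain trailing quote (the reported bug)
    ('"x"=123', '"x"=123'),      # leading quote, already works
    ('123=\\"x\\"', '123="x"'),  # trailing quote already backslash-escaped
]

for fix in (naive_fix, guarded_fix):
    print(fix.__name__, [parses_to(s, fix, expected) for s, expected in CASES])
# naive_fix [True, True, False]
# guarded_fix [True, True, True]
```

Under these assumptions, the proposed fix turns an already escaped trailing \" into \\", which Python reads as an escaped backslash followed by a bare quote, so the """" sequence reappears; counting the preceding backslashes avoids that case.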
For example:
Fails with the error:
There is an ugly workaround for now: