Quotes aren't always parsed properly

nopepper commented 4 months ago

For example:

import lmql

@lmql.query()
def quote():
    '''lmql
    "\"[VAL]\""
    return VAL
    '''

Fails with the error:

File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\IPython\core\interactiveshell.py:3553 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[16], line 1
    @lmql.query()

  File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\lmql\api\queries.py:108 in wrapper
    return query(fct, input_variables=input_variables, is_async=is_async, calling_frame=calling_frame, **extra_args)

  File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\lmql\api\queries.py:130 in query
    module = load(temp_lmql_file, output_writer=silent)

  File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\lmql\api\queries.py:22 in load
    module = compiler.compile(filepath)

  File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\lmql\language\compiler.py:924 in compile
    transformations.transform(q)

  File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\lmql\language\compiler.py:789 in transform
    t = T(query).transform()

  File f:\workspace\lmql-pydantic\.venv\Lib\site-packages\lmql\language\compiler.py:346 in transform
    self.query.prompt = [self.visit(p) for p in self.query.prompt]
...
  File <unknown>:1
    f""""[VAL]""""
                 ^
SyntaxError: unterminated string literal (detected at line 1)

There is an ugly workaround for now:

@lmql.query()
def quote():
    '''lmql
    q = "\""
    "\"[VAL]{q}"
    return VAL
    '''
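A plain-Python sketch of why this workaround helps, assuming the compiled form of the workaround query looks like the f""""[VAL]"""" line in the traceback but ending in {q}: the f-string then ends with a closing brace instead of a quote, so no four-quote sequence appears, and the quote is supplied at runtime through q.

```python
import ast

# Assumed compiled form of the workaround query (hypothetical, by analogy
# with the traceback): the final quote comes from the placeholder {q}.
source = 'f""""[VAL]{q}"""'

# This parses, because the literal ends with "}" followed by the closing """.
ast.parse(source)

# Evaluating it (with a trusted, fixed source string) restores the quote.
value = eval(source, {}, {"q": '"'})
print(value)  # "[VAL]"
```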

lbeurerkellner commented 4 months ago

Thanks for reporting. Marking this as a good first issue.

The fix is likely somewhere close to https://github.com/eth-sri/lmql/blob/main/src/lmql/language/compiler.py#L428, where we compile LLM query strings into multi-line strings in the compiled representation of the program.

Saibo-creator commented 4 months ago

I did some investigation and here are my findings:

import lmql

@lmql.query()
def quote_at_begin():
    '''lmql
    "\"x\"=123"
    '''
# This case passes without any errors.

@lmql.query()
def quote_at_end():
    '''lmql
    "123=\"x\""
    '''
# This triggers a SyntaxError:
# SyntaxError: unterminated string literal (detected at line 1)

So only quote_at_end causes an error. Could this be a bug in Python's ast parser? ast.parse(f""""x"==123""") works, but ast.parse(f"""123=="x"""") does not.
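The asymmetry can be reproduced without LMQL by parsing triple-quoted f-string sources directly. This is a sketch that assumes the compiler embeds each qstring between f""" and """, as the f""""[VAL]"""" line in the traceback suggests; parses is a hypothetical helper.

```python
import ast

def parses(source: str) -> bool:
    # True if `source` is syntactically valid Python, False otherwise.
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

# Assumed compiled forms of the two test queries.
at_begin = 'f""""x"=123"""'  # qstring starts with a quote
at_end = 'f"""123="x""""'    # qstring ends with a quote

print(parses(at_begin), parses(at_end))  # True False
```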

A temporary workaround is to check whether the string ends with a quote and, if so, append a space so that the parser succeeds, then remove the space immediately after parsing.
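A minimal sketch of this pad-and-strip idea. It uses ast.literal_eval on a plain triple-quoted string as a stand-in for the compiler's actual parsing step, and parse_qstring is a hypothetical name, not part of LMQL.

```python
import ast

def parse_qstring(body: str) -> str:
    """Hypothetical helper: read `body` as the contents of a triple-quoted literal."""
    # A trailing quote would merge with the closing delimiter into """",
    # so pad the body with a space before parsing.
    padded = body.endswith('"')
    source = '"""' + body + (" " if padded else "") + '"""'
    value = ast.literal_eval(source)
    # Remove the padding again immediately after parsing.
    return value[:-1] if padded else value

print(parse_qstring('123="x"'))
print(parse_qstring('"x"=123'))
```

The sketch assumes the body contains no backslashes and no """ of its own; either would change how the literal is read.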

lbeurerkellner commented 3 months ago

I think the behavior of ast.parse is actually correct here. In Python, """"a""" is valid, whereas """a"""" is not. After reading the opening """, the scanner looks for the next """ and terminates the string literal there. An extra " after that closing delimiter therefore starts a new string literal that is never closed, hence the "unterminated string literal" error.
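The scanner behavior described above can be checked directly with the standard ast module. Giving the stray fourth quote its own closing quote makes it visible that it opens a new literal, because Python concatenates adjacent string literals.

```python
import ast

# Four quotes at the start: the opening """ is followed by a literal " in the content.
assert ast.literal_eval('""""a"""') == '"a'

# Four quotes at the end: the first """ after "a" closes the string, so the
# fourth " opens a new literal; here it is closed, and the two are concatenated.
assert ast.literal_eval('"""a"""" b"') == 'a b'

# Left unclosed, that new literal is what triggers the SyntaxError.
try:
    ast.parse('"""a""""')
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```

The exact error message differs between Python versions; older releases report it as "EOL while scanning string literal".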

To fix this issue, I think it should be enough to just add:

if compiled_qstring.endswith("\""):
    compiled_qstring = compiled_qstring[:-1] + "\\\""

This prevents a """" (four-quote) sequence in compiled_qstring, because the last quote in a qstring is always escaped. It may need more testing to confirm that it covers all cases correctly, though.
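A standalone sketch for testing the proposed escape outside the compiler. It assumes qstrings are embedded between f""" and """ (as in the traceback); escape_trailing_quote and to_fstring_source are hypothetical names that mirror the snippet above, not LMQL functions.

```python
import ast

def escape_trailing_quote(compiled_qstring: str) -> str:
    # Mirrors the proposed fix: replace a final " with \" so the closing
    # """ delimiter is never preceded by an unescaped quote.
    if compiled_qstring.endswith('"'):
        compiled_qstring = compiled_qstring[:-1] + '\\"'
    return compiled_qstring

def to_fstring_source(qstring: str) -> str:
    # Hypothetical stand-in for how the compiler embeds a qstring.
    return 'f"""' + escape_trailing_quote(qstring) + '"""'

for qstring in ['"[VAL]"', '"x"=123', '123="x"', 'no quotes']:
    source = to_fstring_source(qstring)
    ast.parse(source)  # raises SyntaxError if the fix missed this case
    # \" evaluates to a plain ", so the prompt text itself is unchanged
    # (eval is only used here on fixed, trusted test strings).
    assert eval(source) == qstring
```

One case that may be worth adding to that testing: a qstring whose final quote is already escaped, such as a\". The fix turns it into a\\", and f"""a\\"""" fails to parse again, because \\ is an escaped backslash rather than an escaped quote.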

eth-sri / lmql

Quotes aren't always parsed properly #324