eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0
3.65k stars 197 forks source link

Spaces not being stripped in LMQL docstrings in certain scenarios #156

Open chendo opened 1 year ago

chendo commented 1 year ago

Revision: v0.7b2

It appears LMQL docstrings are not stripping trailing spaces when docstrings aren't at the root level for a particular scope.

argmax
    """This should be the start.
    There should be no spaces after this.
    """
    for i in range(2):
        """This should have no spaces in front.
        There should be no spaces after this.
        """
        for j in range(2):
            """This should also have no spaces in front.
            There should be no spaces after this.
            """

Complies to:

import lmql
import lmql.lib
@lmql.compiled_query(output_variables=[])
async def query():
   yield lmql.runtime_support.context_call("set_decoder", 'argmax', )
   yield lmql.runtime_support.context_call("set_model", lmql.model('llama.cpp:/Users/chendo/Code/ML/Models/TheBloke/WizardLM-13B-V1.2-GGML/wizardlm-13b-v1.2.ggmlv3.q6_K.bin', endpoint='localhost:8080', n_ctx=2048))
   # where
   yield lmql.runtime_support.context_call("set_where_clause", None)
   # prompt
   (yield lmql.runtime_support.interrupt_call('query', f'This should be the start.\nThere should be no spaces after this.\n', {**globals(), **locals()}))
   for i in (await 
   lmql.runtime_support.call(range, 2)):
       (yield lmql.runtime_support.interrupt_call('query', f'This should have no spaces in front.\n    There should be no spaces after this.\n    ', {**globals(), **locals()}))
       for j in (await 
       lmql.runtime_support.call(range, 2)):
           (yield lmql.runtime_support.interrupt_call('query', f'This should also have no spaces in front.\n        There should be no spaces after this.\n        ', {**globals(), **locals()}))
   yield ('result', (yield lmql.runtime_support.context_call("get_return_value", ())))

Note the trailing spaces in docstrings defined in for loops. I would expect these trailing spaces to not be there.

lbeurerkellner commented 1 year ago

Good catch, thank you. I agree, that we should strip them consistently.

lbeurerkellner commented 1 year ago

Re-labeling this as Enhancement. I found the problem with the behaviour above is that it indeed is standard python behaviour, which we inherit by being a superset. I still agree we should provide slightly different behaviour though.

One way is to always dedent with textwrap, e.g. when you do something like

if True:
           """\
           This should also have no spaces in front.
           There should be no spaces after this.
           """

we can automatically remove the indendentation. Note however, the additional NL and \ backslash at the beginning of the multi-line string. Without this, the dedentation does not work, as the first line in the original code snippet would always be counted as an indentation of 0.

Another conflict that comes up, is the use case of actually wanting that indentation, e.g. when you write query code that templates indentation-sensitive source code, e.g. for Python code generation. In that case, we also want to offer a way to explicitly insert indentation, which should not be stripped away.

I am open for ideas here, still thinking about the best way to do it.

lbeurerkellner commented 7 months ago

This may be a good first issue work on, however, it will require some language-level design thinking. The goal should really be to have something intuitive and minimally-breaking with respect to previous versions.