eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Token Healing Feature #213

Open ambroser53 opened 10 months ago

ambroser53 commented 10 months ago

Hi, I'm trying to decide between LMQL and Guidance for a project I'm working on (I'm sure you get this a lot), and it seems like LMQL is far better documented, maintained, and feature-rich. The only feature I see in Guidance that LMQL lacks is "token healing". Is this something in the works for LMQL, or is there some disastrous side effect to its usage that warrants its exclusion?

lbeurerkellner commented 10 months ago

Hi there, we provide an extensive comparison to Guidance here: https://docs.lmql.ai/en/latest/python/comparison.html.

Regarding token healing, we definitely plan to add it, but we cannot give a good estimate of when that will happen. In practice, I find token healing convenient because it removes some of the tokenisation gotchas around prompt construction; however, with some experience, you can typically avoid these issues simply by formulating your query programs accordingly.
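
For example, a minimal sketch of such a reformulation (not taken from the docs): instead of ending the prompt with a trailing space before a variable, end it at the punctuation and let the model produce the space itself:

# skewed: the prompt ends in a space, so the model must start ANSWER mid-token
"A: [ANSWER]" where STOPS_AT(ANSWER, ".")

# better: end the prompt at the colon; the model emits " ..." on its own
"A:[ANSWER]" where STOPS_AT(ANSWER, ".")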

Nonetheless, I think token healing is an interesting idea. In general, there are no major obstacles to implementing it in LMQL, apart from some implications for API-based models. For instance, OpenAI API requests can only contain one logit mask per request. With token healing, this implies that each continuation of your prompt requires at least two requests: one constrained request to heal the boundary with your existing prompt, and another request to continue from the healed prefix. Similar issues arise for local models when caching comes into play, or when you generally want to reduce the number of requests sent to the inference process.
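
Sketched in Python, that two-request flow would look roughly like this (a simplified illustration; tokenizer and generate_with_mask are hypothetical stand-ins, not LMQL or OpenAI APIs):

def heal_and_continue(prompt, tokenizer, generate_with_mask):
    # Simplified token-healing sketch with hypothetical helpers.
    ids = tokenizer.encode(prompt)
    removed = tokenizer.decode(ids[-1:])  # text of the last prompt token
    ids = ids[:-1]                        # back up by one token

    # Request 1 (constrained): only allow tokens whose string form
    # extends the removed text, which "heals" the token boundary.
    allowed = [t for t in range(tokenizer.vocab_size)
               if tokenizer.decode([t]).startswith(removed)]
    healed = generate_with_mask(ids, allowed_tokens=allowed, max_tokens=1)

    # Request 2 (unconstrained): continue from the healed prefix.
    return generate_with_mask(ids + healed, allowed_tokens=None)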

It would be interesting to hear your thoughts or use cases for token healing. Maybe this can also help us reprioritise its implementation. Let me know :)

andy-zhou commented 7 months ago

Hey @lbeurerkellner

Wanted to chime in here. I think token healing is really important for DX (developer experience). Take the first example in your documentation:

"Greet LMQL:[GREETINGS]\n" where stops_at(GREETINGS, ".") and not "\n" in GREETINGS

if "Hi there" in GREETINGS:
    "Can you reformulate your greeting in the speech of \
     victorian-era English: [VIC_GREETINGS]\n" where stops_at(VIC_GREETINGS, ".")

"Analyse what part of this response makes it typically victorian:\n"

for i in range(4):
    "-[THOUGHT]\n" where stops_at(THOUGHT, ".")

"To summarize:[SUMMARY]"

The generation of [VIC_GREETINGS] starts immediately after a trailing space, which skews the output:

from huggingface_hub import hf_hub_download
import lmql

model_path = hf_hub_download('TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF', 'mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf')
model = lmql.model("local:llama.cpp:"+model_path, tokenizer='mistralai/Mixtral-8x7B-Instruct-v0.1', verbose=True)
@lmql.query(model=model)
async def greetings_split_token():
  '''lmql
  "Write a greeting to LMQL:[GREETINGS]\n" where stops_at(GREETINGS, ".") and not "\n" in GREETINGS

  "Can you reformulate your greeting in the speech of \
  victorian-era English: [VIC_GREETINGS]\n" where stops_at(VIC_GREETINGS, ".")

  "Analyse what part of this response makes it typically victorian:\n"

  for i in range(4):
      "-[THOUGHT]\n" where stops_at(THOUGHT, ".")

  "To summarize:[SUMMARY]"
  '''

split_tokens = await greetings_split_token()
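# note: top-level `await` works in a notebook; in a plain script, wrap this in asyncio.run(...)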

# split_tokens.variables['VIC_GREETINGS'] == '\n\nHail, LMQL! A pleasure it is to make your acquaintance once more.'
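
For this particular case, a quick workaround (an untested sketch of the reformulation idea above) is to drop the space before the variable so the model emits it itself:

"victorian-era English:[VIC_GREETINGS]\n" where stops_at(VIC_GREETINGS, ".")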

The front-page example on lmql.ai has trailing space issues as well (note the space before [ANSWER] in "A: [ANSWER]"):

@lmql.query
def meaning_of_life():
    '''lmql
    # top-level strings are prompts
    "Q: What is the answer to life, the \
     universe and everything?"

    # generation via (constrained) variables
    "A: [ANSWER]" where \
        len(ANSWER) < 120 and STOPS_AT(ANSWER, ".")

    # results are directly accessible
    print("LLM returned", ANSWER)

    # use typed variables for guaranteed 
    # output format
    "The answer is [NUM: int]"

    # query programs are just functions 
    return NUM
    '''

# so from Python, you can just do this
meaning_of_life() # 42

These are all toy examples, so the impact is limited, but they show how easy it is for users to make prompt-boundary mistakes.
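
To see why the boundary matters, you can inspect the tokenisation directly (a small check using Hugging Face's transformers, with GPT-2's tokenizer as an arbitrary example):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# A prompt that ends in a space tokenizes the space on its own ('Ġ'),
# forcing the model to start its completion mid-word:
print(tok.tokenize("A: "))       # e.g. ['A', ':', 'Ġ']
print(tok.tokenize("A: Hello"))  # e.g. ['A', ':', 'ĠHello'], " Hello" is one token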

lbeurerkellner commented 7 months ago

Yes, thank you for compiling the examples. I am very aware of the issue, and I agree that token healing should be added. I have been working on some changes that will bring token healing, among other features. It is more of a capacity and time issue right now, so I cannot really give an ETA, but it is definitely happening :)

ibehnam commented 2 months ago

@lbeurerkellner Just wanted to get an update on this after a few months. I know you are all busy, but did you by any chance get to work on this feature? If not, it would be nice to have a short tutorial on how to avoid token boundary issues when using LMQL.