eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0
3.61k stars 194 forks

special tokens/templates best practices #271

Open palatos opened 10 months ago

palatos commented 10 months ago

When using LMQL with local models that require a specific prompt template, where should the template be passed to LMQL? I figure it goes in the query itself, but this leads to issues when the template has special tokens that use square brackets, since they interfere with LMQL's own syntax. For instance, Llama 2 models use the [INST] and [/INST] tokens. I've tried adding them to the query with double square brackets, but I wanted to make sure this is the appropriate method.
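
For context, doubled brackets are the documented LMQL escape for literal [ and ] in prompt strings. A minimal sketch of what that looks like; the model identifier is a placeholder for whatever local Llama 2 chat checkpoint you actually run:

```python
import lmql

@lmql.query
def ask(question):
    '''lmql
    argmax
        # [[ and ]] are escapes for literal [ and ], so the model
        # receives the raw Llama 2 markers "[INST] ... [/INST]"
        "[[INST]] {question} [[/INST]]"
        # single brackets still declare an LMQL template variable
        "[ANSWER]"
    from
        # placeholder identifier; substitute your local checkpoint
        "local:meta-llama/Llama-2-7b-chat-hf"
    where
        STOPS_AT(ANSWER, "</s>")
    '''

print(ask("What is the capital of France?"))
```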

ricardosantos79 commented 10 months ago

(screenshot attached) Hi, I started using LMQL recently (been toying with it for a day or so), and my tests led me to the structure in the screenshot above. It may not be the right way, but perhaps it will help. This is more or less the structure I found to almost work (I say almost because it usually starts speaking gibberish after a few queries). I haven't found a consistent way to provide conversation structure yet.

ricardosantos79 commented 10 months ago

After reading [1] and [2]: if you follow a ReAct scheme, the system tokens can go in {prompt_start} and your query message goes in {question}.

As far as I'm aware, LMQL translates the tokens automatically for the underlying model, so you just need to use the built-in LMQL decorators {:system}, {:user} and {:assistant}, which are translated to ((system)), ((user)) and ((assistant)) respectively; see [3].

This should be a step in the right direction, but a better reply from someone more knowledgeable would be welcome. A minimal sketch of the decorator usage follows the references below.

  1. https://vivien000.github.io/blog/journal/better-steering-LLM-agents-with-LMQL.html
  2. https://github.com/vivien000/react-lmql/blob/main/ReAct%20agents%20in%20LMQL%20and%20Langchain.ipynb
  3. https://lmql.ai/docs/lib/chat/overview.html
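
Here is that sketch of the {:system}/{:user}/{:assistant} tags described above, assuming the OpenAI "chatgpt" backend used in the chat docs [3] (other chat-capable backends should accept the same tags):

```python
import lmql

@lmql.query
def chat_once(question):
    '''lmql
    argmax
        # the {:role} tags mark chat roles; LMQL maps them onto the
        # chat format of the underlying model
        "{:system} You are a concise, helpful assistant."
        "{:user} {question}"
        "{:assistant} [ANSWER]"
    from
        "chatgpt"
    '''

print(chat_once("Name three uses of LMQL constraints."))
```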

Edit: I found that in this example [4] the ReAct scheme is implemented with the tokens, so it should serve as a good baseline example. Hope this is helpful.

  4. https://github.com/eth-sri/lmql/blob/main/src/lmql/lib/actions.py#L317C6-L317C6

4mbroise commented 6 months ago

It's a shame that the framework doesn't handle special tokens (with tags). From what I understand, without them we get sub-optimal results, as the models are trained to respond to these formats.

lbeurerkellner commented 6 months ago

Unfortunately, current chat templates differ widely between models, which makes it hard to support them all under a unified abstraction. However, it is always possible to simply include the chat template in your query code, e.g. [INST] for Llama models.
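
To make that concrete, here is a minimal sketch of writing the Llama 2 chat template out by hand, with the bracketed markers escaped so LMQL passes them through verbatim; the model identifier is again a placeholder:

```python
import lmql

@lmql.query
def llama2_chat(system_prompt, question):
    '''lmql
    argmax
        # hand-written Llama 2 chat template; [[ / ]] escape the
        # [INST] markers so they reach the model verbatim (the
        # tokenizer typically prepends the <s> BOS token itself)
        "[[INST]] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        "{question} [[/INST]] [ANSWER]"
    from
        # placeholder identifier; substitute your local checkpoint
        "local:meta-llama/Llama-2-7b-chat-hf"
    where
        STOPS_AT(ANSWER, "</s>")
    '''

print(llama2_chat("You answer in one sentence.", "Why do chat templates matter?"))
```

The trade-off, as noted above, is that a query written this way is tied to one model family's template.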