Open willkurt opened 3 weeks ago
I had the same issue on a different application, but I figured it was mostly inexperience. I believe I ended up recreating the generator each time, which is a temporary workaround for people who stumble on the issue.
Note that this will be slow and (I think) requires rebuilding the FSM each time.
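The recreate-per-call pattern looks roughly like this. The real outlines objects aren't reproduced here; `StatefulGenerator` is a hypothetical stand-in that mimics the buggy behavior (stale internal state after the first call), just to illustrate why recreating the generator works:

```python
# Hypothetical stub standing in for an outlines choice generator.
# It mimics the reported bug: after its first call it returns nothing,
# because its internal state is never reset between calls.
class StatefulGenerator:
    def __init__(self, choices):
        self.choices = choices
        self.used = False

    def __call__(self, prompt):
        if self.used:
            return None  # mimics the empty result on repeated calls
        self.used = True
        return self.choices[0]

# Buggy pattern: reuse one generator across calls.
gen = StatefulGenerator(["Yes", "No"])
results_reused = [gen("q1"), gen("q2")]  # second call returns None

# Workaround: recreate the generator for every call (slow, since in the
# real library this presumably rebuilds the FSM each time).
results_fresh = [StatefulGenerator(["Yes", "No"])(p) for p in ("q1", "q2")]
```

With the real library the equivalent workaround is to call `outlines.generate.choice(...)` again before every generation instead of holding on to one generator object.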
The `SequenceGeneratorAdapter` should be creating a new logits processor on each run, but it isn't. Should be an easy fix.
Describe the issue as clearly as possible:
When using `outlines.models.llama_cpp` and making repeated calls to an instance of `outlines.generate.choice`, only the first call returns a result. This can be worked around by re-instantiating the generator for every call, but that is not an ideal solution. The model I use in the example code is taken directly from the Cookbook CoT example, but the issue arose with several other models I tried earlier as well.
The example code will produce the following output when I run it:
I am running this on an M2 Mac and an M3 MacBook.
Steps/code to reproduce the bug:
Expected result:
Error message:
No response
Outlines/Python version information:
Version information
Context for the issue:
This issue arose while putting together an Outlines workshop for ODSC. I had originally hoped to use llama_cpp for the workshop, but this (and another soon-to-be-posted bug) were blockers, so I ended up using transformers instead.