This is reproducible on the current main branch of outlines. Non-streaming generation works properly, but when streaming, constrained generation with stop_at fails: the output does not stop at the stop sequence. I haven't tested choice/Action Input etc. yet.
Steps/code to reproduce the bug:
import outlines
from huggingface_hub import snapshot_download

# Download the EXL2-quantized checkpoint.
model_name = "turboderp/Llama-3-8B-Instruct-exl2"
revision = "3.0bpw"
model_directory = snapshot_download(repo_id=model_name, revision=revision, local_dir="checkpoints/llama3")
model = outlines.models.exl2(model_directory, device=0)

prompt = """Can you type Action: None"""
generator = outlines.generate.text(model)

# Non-streaming: stops at the stop sequence as expected.
output = generator(prompt, stop_at="Action: ")
print(output)

# Streaming: keeps generating past the stop sequence.
streamer = generator.stream(prompt, stop_at="Action: ", max_tokens=256)
while True:
    print(next(streamer), flush=True, end="")
Expected result:
Both the non-streaming and the streaming generation stop at "Action: ".
Error message:
or Action:
.
Your friend says, "Let's make the game more interesting!"
You see a new option: Action: Compute
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Cell In[4], line 2
1 while True:
----> 2 print(next(streamer), flush=True, end="")
StopIteration:
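So the streamed output continued past the stop sequence and extra text was generated. (The StopIteration itself only marks the end of the stream, since the loop calls next() unconditionally.)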
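A possible client-side workaround while this is open, as a rough sketch only: wrap the stream and cut it at the stop sequence yourself. The helper name `stream_until` is hypothetical and not part of outlines. It assumes the stream yields plain text chunks, and it drops the stop sequence from the output; whether the stop sequence should be kept is a separate choice.

```python
from typing import Iterable, Iterator


def stream_until(chunks: Iterable[str], stop: str) -> Iterator[str]:
    """Yield text from `chunks`, ending just before the first occurrence of `stop`.

    Hypothetical helper for illustration; not an outlines API.
    """
    pending = ""  # text received but not yet safe to emit
    keep = len(stop) - 1  # longest tail that could still be the start of `stop`
    for chunk in chunks:
        pending += chunk
        cut = pending.find(stop)
        if cut != -1:
            # Stop sequence found: emit what precedes it and finish.
            if cut:
                yield pending[:cut]
            return
        # Hold back a tail that might be a partial match of `stop`.
        safe = len(pending) - keep
        if safe > 0:
            yield pending[:safe]
            pending = pending[safe:]
    # Stream ended without the stop sequence: flush what is left.
    if pending:
        yield pending


# Simulated stream where the stop sequence is split across two chunks.
fake_stream = iter(["You see a new option: Act", "ion: Compute", " more"])
for piece in stream_until(fake_stream, "Action: "):
    print(piece, flush=True, end="")
print()
```

With the real generator, the same wrapper could presumably be applied to `generator.stream(prompt, max_tokens=256)`. Iterating with a `for` loop instead of `while True` / `next()` also avoids the trailing StopIteration.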
Outlines/Python version information:
Python 3.10, outlines main branch
Context for the issue:
No response