dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

Outlines generate succeeds but stream fails to follow the llama 3 stop_at constraint #896

Closed isamu-isozaki closed 4 months ago

isamu-isozaki commented 4 months ago

Describe the issue as clearly as possible:

This is reproducible on the main branch of outlines as of now. When generating normally, everything works properly, but streaming constrained generation with stop_at fails. I haven't tested choice/Action Input etc. yet.

Steps/code to reproduce the bug:

import outlines
from huggingface_hub import snapshot_download

model_name = "turboderp/Llama-3-8B-Instruct-exl2"
revision = "3.0bpw"
model_directory = snapshot_download(repo_id=model_name, revision=revision, local_dir="checkpoints/llama3")

model = outlines.models.exl2(model_directory, device=0)
prompt = """Can you type Action: None"""
generator = outlines.generate.text(model)

# Non-streaming generation stops at "Action: " as expected
output = generator(prompt, stop_at="Action: ")
print(output)

# Streaming generation with the same stop_at keeps going past the stop sequence
streamer = generator.stream(prompt, stop_at="Action: ", max_tokens=256)
while True:
    print(next(streamer), flush=True, end="")

Expected result:

both generations stopping at "Action: "

Error message:

or Action: 
.

Your friend says, "Let's make the game more interesting!"

You see a new option: Action: Compute
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
Cell In[4], line 2
      1 while True:
----> 2     print(next(streamer), flush=True, end="")

StopIteration: 
so an extra bit was generated
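Side note: the StopIteration traceback itself comes from driving the stream with `while True` / `next()`; a plain for loop exhausts the generator cleanly. A minimal sketch, with a dummy generator standing in for `generator.stream`:

```python
def dummy_stream():
    # Stand-in for generator.stream(...): yields a few tokens, then stops.
    yield from ["Action", ":", " "]

# A for loop ends quietly when the generator is exhausted, unlike
# while True: next(streamer), which surfaces StopIteration.
for token in dummy_stream():
    print(token, flush=True, end="")
```

This only silences the traceback noise; the underlying bug (text generated past the stop sequence) is separate.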

Outlines/Python version information:

Python 3.10, outlines main branch

Context for the issue:

No response

isamu-isozaki commented 4 months ago

This issue seems to occur simply because the stop token is not stripped from the streamed output, unlike in the call method.
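The stripping step could look something like the sketch below: buffer the streamed tokens, truncate at the first occurrence of the stop sequence, and hold back any tail that might be a partial match. This is a hypothetical illustration, not outlines' actual implementation; `strip_at_stop` and its behavior are assumptions.

```python
def strip_at_stop(token_stream, stop_at):
    """Yield streamed text, truncating at the first occurrence of stop_at.

    Hypothetical helper sketching the missing stripping step; the name
    and logic are assumptions, not outlines' real streaming code.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        idx = buffer.find(stop_at)
        if idx != -1:
            # Emit everything before the stop sequence, then end the stream.
            yield buffer[:idx]
            return
        # Hold back a tail that could still be a prefix of stop_at.
        safe = len(buffer) - len(stop_at) + 1
        if safe > 0:
            yield buffer[:safe]
            buffer = buffer[safe:]
    if buffer:
        yield buffer

tokens = iter(["He said Act", "ion: Compute"])
print("".join(strip_at_stop(tokens, "Action: ")))  # "He said "
```

Even though "Action: " straddles two tokens here, nothing after it is emitted, which is the behavior the non-streaming call already has.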