dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

Does `outlines==0.1.0` remove streaming for `Transformers` models? #1201

Open ambroser53 opened 1 month ago

ambroser53 commented 1 month ago

Describe the issue as clearly as possible:

I'm a heavy user of `outlines.models.Transformers` and rely on the `stream` method after building a regex generator with `outlines.generate.regex`. However, when testing 0.1.0 for the speed improvements, I noticed that `.stream` now runs the whole generation in one go and only then yields the tokens, all at essentially the same instant.

Steps/code to reproduce the bug:

import time

import outlines
import torch
from outlines.samplers import MultinomialSampler
from transformers import AutoModelForCausalLM, AutoTokenizer

pretrained_ckpt = "meta-llama/Llama-3.2-3B"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
    device_map="cuda:0",
)
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt)
outlines_model = outlines.models.Transformers(model=model, tokenizer=tokenizer)
# "\\-" passes the same backslash-hyphen to the regex without the invalid "\-" string escape.
generator = outlines.generate.regex(outlines_model, "(accepted|rejected) the invitation by saying: \"[A-Za-z,.!?\\- ']+\"")
streamer = generator.stream("Micheal ", max_tokens=100)

resp = ""
time_start = time.time()
for token in streamer:
    resp += token
    print(token)
    print(time.time() - time_start)

print(resp)

Expected result:

On outlines==0.0.46, tokens are yielded as they are generated, so the timestamps increase steadily:

accepted
0.3291473388671875
 the
0.3473653793334961

0.3576169013977051
invitation
0.3677070140838623
 by
0.3798065185546875
 saying
0.3898754119873047
:
0.4002041816711426
 "
0.41028761863708496
Perhaps
0.4254317283630371
 it
0.4405210018157959
 was
0.45534610748291016
 D
0.4698038101196289
olf
0.4842183589935303
 who
0.4987914562225342
 happened
0.5133378505706787
 to
0.5278193950653076
 get
0.542212724685669
 over
0.5566153526306152
 there
0.5710551738739014
,
0.5878012180328369
 so
0.6036765575408936
 it
0.6190624237060547
 may
0.6345548629760742
 be
0.6498222351074219
 coming
0.6652572154998779
 from
0.6807975769042969
 K
0.6953606605529785
la
0.709963321685791
as
0.7246041297912598
."
0.7391231060028076
accepted the invitation by saying: "Perhaps it was Dolf who happened to get over there, so it may be coming from Klaas."

Error message:

With outlines==0.1.0, we get the following output, where all tokens arrive at essentially the same time:

a
0.8557868003845215
cc
0.8558142185211182
ept
0.8558330535888672
ed
0.855849027633667
 the
0.8558666706085205
 invitation
0.8558826446533203
 by
0.8558969497680664
 saying
0.855921745300293
:
0.8559391498565674
 "
0.8559558391571045
I
0.8559701442718506
'll
0.8559842109680176
 bring
0.8559982776641846
 some
0.8560121059417725
 of
0.8560261726379395
 is
0.856043815612793
 unique
0.8560581207275391
 music
0.856072187423706
 style
0.856086015701294
 to
0.8560996055603027
 the
0.8561127185821533
 festival
0.8561265468597412
 and
0.8561403751373291
 will
0.8561539649963379
 be
0.8561675548553467
 spending
0.8561809062957764
 some
0.8561944961547852
 quality
0.8562085628509521
 time
0.856226921081543
 with
0.8562402725219727
 the
0.8562536239624023
 R
0.8562667369842529
aja
0.8562805652618408
 Indian
0.8562946319580078
 Office
0.8563082218170166
 members
0.8563218116760254
 and
0.856334924697876
 introduce
0.8563485145568848
 him
0.8563623428344727
 to
0.8563754558563232
 the
0.8563880920410156
 new
0.8564021587371826
 culture
0.8564155101776123
 and
0.8564286231994629
 music
0.8564414978027344
 sounds
0.8564550876617432
 of
0.8564684391021729
 the
0.8564815521240234
 Punjab
0.856494665145874
."
0.8565073013305664

0.8565213680267334
accepted the invitation by saying: "I'll bring some of is unique music style to the festival and will be spending some quality time with the Raja Indian Office members and introduce him to the new culture and music sounds of the Punjab."
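
To make the difference between the two logs concrete, here is a minimal sketch (not part of the original reproduction) that times each chunk of a stream and flags runs where almost all of the elapsed time is spent waiting for the first chunk. The helper names `timed` and `looks_buffered` and the 0.9 ratio are hypothetical choices for illustration, not anything provided by outlines.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def timed(stream: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (chunk, seconds elapsed since iteration started) for each chunk."""
    start = time.perf_counter()
    for chunk in stream:
        yield chunk, time.perf_counter() - start


def looks_buffered(timestamps: List[float], ratio: float = 0.9) -> bool:
    """Heuristic (assumed threshold): True if the wait for the first chunk
    accounts for at least `ratio` of the total elapsed time."""
    if len(timestamps) < 2:
        return False
    return timestamps[0] >= ratio * timestamps[-1]


# First and last timestamps copied from the two logs in this issue.
streaming_log = [0.3291473388671875, 0.7391231060028076]  # outlines==0.0.46
buffered_log = [0.8557868003845215, 0.8565213680267334]   # outlines==0.1.0

print(looks_buffered(streaming_log))
print(looks_buffered(buffered_log))
```

With a real generator, the timestamps could be collected as `[t for _, t in timed(generator.stream("Micheal ", max_tokens=100))]` and passed to `looks_buffered`.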


Outlines/Python version information:

Outlines versions 0.1.0 and 0.0.46
Python version 3.11.9

Context for the issue:

I understand that this is a stand-in while waiting for https://github.com/huggingface/transformers/issues/30810, but as there does not seem to be a timeline for a resolution on that side, is there a way to work around it? Is there a reason it's waiting for that issue to be resolved? It's not particularly clear.