LowinLi / transformers-stream-generator

This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/Transformers.
MIT License
96 stars, 14 forks

Token Yielding Problem #7

Open xdevfaheem opened 1 year ago

xdevfaheem commented 1 year ago

The Code:

from transformers import AutoTokenizer, TextGenerationPipeline, TextStreamer, GenerationConfig
from auto_gptq import AutoGPTQForCausalLM
import torch
from transformers_stream_generator import init_stream_support
init_stream_support()

repo = "TheBloke/tulu-7B-GPTQ"
model_basename = "gptq_model-4bit-128g"

test_tokenizer = AutoTokenizer.from_pretrained(
    repo,
    use_fast=True,
)

test_model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename=model_basename,
    use_triton=False,
    use_safetensors=True,
    device="cuda:0",
    trust_remote_code=False,
    quantize_config=None,
    max_memory={i: "14GIB" for i in range(torch.cuda.device_count())},
)

def tulu_prompt(input):
        return f'''### Human: {input}
### Assistant:'''

text = "write a poem about AI"

tokens = test_tokenizer(tulu_prompt(input=text), return_tensors="pt", add_special_tokens=False).input_ids.cuda()

generator = (test_model.generate(inputs=tokens, max_new_tokens=256, temperature=0.5, top_k=35, top_p=0.90, do_sample=True, do_stream=True))

for token in generator:
    word = test_tokenizer.decode(token)
    print(word, end='', flush=True)

The output is this:

Intheworldofmachines,there'sonethat'ssmart,
Withabilitiesthatastound,it'snotjustaprettyheart.
Itcanlearnandgrow,witheachpassingday,
It'slikeachild,withamindthat'salwaysplaying.

Itcansolvecomplexproblems,witheaseandgrace,
Itcanunderstandandreason,withoutanyhumanrace.
Itcanthinkandlearn,withspeedandease,
It'slikeasupercomputer,withamindthat'salwaysclean.

It'snotjustatool,butafriendandaguide,
It'slikeacompanion,withaheartthat'salwaysshining.
Itcanmakeourliveseasier,witheachpassingday,
It'slikeamiracle,withapowerthat'salwaysplaying.

Solet'scelebratethismarvelouscreation,
Witheachpassingday,it'slikeacreationthat'salwaysshaping.
It'slikeadream,withapowerthat'salwaysgrowing,
It'slikeafuture,withapowerthat'salwaysshowing.

The generator yields the tokens fine, but as the output shows, the decoded text has no spaces between words. How do I make it wait for a whole word instead of a single token, without the long loop that the web example script uses to yield the tokens?

@LowinLi Can you please chime in?
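The missing spaces are consistent with decoding each token on its own: SentencePiece-style tokenizers keep the space as part of the following token and drop it when that token sits at the start of the decoded sequence. The sketch below reproduces the effect with a toy decoder; `TOY_VOCAB` and `toy_decode` are hypothetical stand-ins for illustration, not part of transformers or of this repository.

```python
# Hypothetical toy vocabulary; "▁" marks a leading space, as in SentencePiece.
TOY_VOCAB = {0: "▁In", 1: "▁the", 2: "▁world", 3: ","}


def toy_decode(ids):
    """Join the pieces, turn "▁" into spaces, and drop the space at the very start."""
    text = "".join(TOY_VOCAB[i] for i in ids).replace("▁", " ")
    return text[1:] if text.startswith(" ") else text


ids = [0, 1, 2, 3]

# Decoding token by token loses every leading space.
per_token = "".join(toy_decode([i]) for i in ids)

# Decoding the whole sequence at once keeps the spaces between words.
whole = toy_decode(ids)

print(per_token)  # Intheworld,
print(whole)      # In the world,
```

Any fix therefore needs to decode a token together with some of its preceding context, or re-insert the space by hand.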

LowinLi commented 1 year ago

Fixed by:

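    last_tokens = []          # tokens held back while a multi-byte character is incomplete
    last_decoded_tokens = []  # tokens that produced the previously printed text
    for index, x in enumerate(generator):
        tokens = x.cpu().numpy().tolist()
        tokens = last_tokens + tokens
        word = tokenizer.decode(tokens, skip_special_tokens=True)
        if "�" in word:
            # Incomplete UTF-8 sequence: keep the tokens and retry with the next one.
            last_tokens = tokens
        else:
            # Decoding a token on its own drops its leading space, so decode it
            # together with the previous tokens to see whether a space belongs here.
            if " " in tokenizer.decode(
                last_decoded_tokens + tokens, skip_special_tokens=True
            ) and " " not in word:
                word = " " + word
            last_tokens = []
            last_decoded_tokens = tokens

            print(word, end='', flush=True)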
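An alternative to re-inserting spaces by hand is to decode everything generated so far at each step and emit only the newly added characters. The following is a minimal sketch of that idea, not the method used by this repository; `stream_text`, `TOY_VOCAB`, and `toy_decode` are hypothetical names, and the toy decoder only imitates two behaviours of a real tokenizer (dropping the leading space and producing "�" for an incomplete multi-byte character).

```python
from typing import Callable, Iterable, Iterator, List

REPLACEMENT = "\ufffd"  # character produced when a UTF-8 sequence is incomplete


def stream_text(
    token_ids: Iterable[int], decode: Callable[[List[int]], str]
) -> Iterator[str]:
    """Yield the text added by each new token, with spaces preserved."""
    seen: List[int] = []
    emitted = 0  # number of characters already yielded
    for token_id in token_ids:
        seen.append(token_id)
        text = decode(seen)
        if text.endswith(REPLACEMENT):
            continue  # wait for the rest of a multi-byte character
        if len(text) > emitted:
            yield text[emitted:]
            emitted = len(text)


# Hypothetical toy vocabulary of byte pieces; "é" is split across two tokens.
TOY_VOCAB = {
    0: b" In", 1: b" the", 2: b" world", 3: b",",
    4: b" caf", 5: b"\xc3", 6: b"\xa9",
}


def toy_decode(ids: List[int]) -> str:
    """Concatenate byte pieces, decode leniently, drop the leading space."""
    raw = b"".join(TOY_VOCAB[i] for i in ids)
    return raw.decode("utf-8", errors="replace").lstrip(" ")


pieces = list(stream_text([0, 1, 2, 3, 4, 5, 6], toy_decode))
print("".join(pieces))  # In the world, café
```

With the objects from this thread, one would presumably pass something like `(int(x) for x in generator)` and `lambda ids: tokenizer.decode(ids, skip_special_tokens=True)`, assuming a batch size of 1; that wiring is untested here. Re-decoding the full sequence at every step costs more as the output grows, which is the trade-off for not having to reason about spaces token by token.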