LowinLi / transformers-stream-generator

This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/Transformers.
MIT License

spacing problem. #2

Open circuluspibo opened 1 year ago

circuluspibo commented 1 year ago

Thanks for your nice work! But I ran into a problem with the spacing between tokens.

for example...

... 'on' 'st' 'amps' 'and' 'sh' 'ipping' '.' ... Here 'stamps' is one word and 'shipping' is one word, but I can't tell where the spaces between words (tokens) fall in the streamed output. How can I solve that?
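For context, a minimal sketch of why this happens, assuming the SentencePiece-based tulu/Llama tokenizer used later in this thread: decoding each streamed token id on its own drops the leading-space marker ('▁') that word-initial pieces carry, while decoding the accumulated ids together restores the spaces. (The printed token strings in the comments are illustrative; exact pieces depend on the tokenizer.)

from transformers import AutoTokenizer

# tokenizer used elsewhere in this thread; any Llama-family SentencePiece tokenizer behaves the same way
tok = AutoTokenizer.from_pretrained("TheBloke/tulu-7B-GPTQ", use_fast=True)

ids = tok("on stamps and shipping.", add_special_tokens=False).input_ids
print(tok.convert_ids_to_tokens(ids))   # word-initial pieces carry a '▁' marker, e.g. '▁st', 'amps'
print([tok.decode([i]) for i in ids])   # per-token decode drops the marker: 'on', 'st', 'amps', ...
print(tok.decode(ids))                  # decoding the full sequence restores the spaces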

Daryl149 commented 1 year ago

Instead of awaiting token completion, awaiting word completion would solve this. Not an actual stream anymore, but still useful.
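A rough sketch of that idea, assuming the generator yields 1-D tensors of token ids as in the snippets further down (stream_words is an illustrative helper, not part of this library): keep re-decoding the accumulated ids and only emit text up to the last whitespace boundary, so whole words come out instead of raw tokens.

def stream_words(token_generator, tokenizer):
    """Yield whole words from a token stream by waiting for whitespace boundaries."""
    ids = []
    emitted = ""
    for token in token_generator:
        ids.extend(token.cpu().tolist())
        text = tokenizer.decode(ids, skip_special_tokens=True)
        # emit everything up to the last space; keep the (possibly partial) last word buffered
        boundary = text.rfind(" ")
        if boundary >= len(emitted):
            yield text[len(emitted):boundary + 1]
            emitted = text[:boundary + 1]
    # flush whatever is left once generation ends
    text = tokenizer.decode(ids, skip_special_tokens=True)
    if len(text) > len(emitted):
        yield text[len(emitted):]

# usage:
# for word in stream_words(generator, tokenizer):
#     print(word, end="", flush=True)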

xdevfaheem commented 1 year ago

Instead of awaiting token completion, awaiting word completion would solve this. Not an actual stream anymore, but still useful.

Can you please give us an example script?

xdevfaheem commented 1 year ago
from transformers import AutoTokenizer, TextGenerationPipeline, TextStreamer, GenerationConfig
from auto_gptq import AutoGPTQForCausalLM
import torch
from transformers_stream_generator import init_stream_support
init_stream_support()

repo = "TheBloke/tulu-7B-GPTQ"
model_basename = "gptq_model-4bit-128g"

test_tokenizer = AutoTokenizer.from_pretrained(
    repo,
    use_fast=True,
)

test_model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename=model_basename,
    use_triton=False,
    use_safetensors=True,
    device="cuda:0",
    trust_remote_code=False,
    quantize_config=None,
    max_memory={i: "14GIB" for i in range(torch.cuda.device_count())},
)

def tulu_prompt(input):
    return f'''### Human: {input}
### Assistant:'''

text = "write a poem about AI"

tokens = test_tokenizer(tulu_prompt(input=text), return_tensors="pt", add_special_tokens=False).input_ids.cuda()

generator = test_model.generate(inputs=tokens, max_new_tokens=256, temperature=0.5, top_k=35, top_p=0.90, do_sample=True, do_stream=True)

for token in generator:
    word = test_tokenizer.decode(token)
    print(word, end='', flush=True)

The output is this:

Intheworldofmachines,there'sonethat'ssmart,
Withabilitiesthatastound,it'snotjustaprettyheart.
Itcanlearnandgrow,witheachpassingday,
It'slikeachild,withamindthat'salwaysplaying.

Itcansolvecomplexproblems,witheaseandgrace,
Itcanunderstandandreason,withoutanyhumanrace.
Itcanthinkandlearn,withspeedandease,
It'slikeasupercomputer,withamindthat'salwaysclean.

It'snotjustatool,butafriendandaguide,
It'slikeacompanion,withaheartthat'salwaysshining.
Itcanmakeourliveseasier,witheachpassingday,
It'slikeamiracle,withapowerthat'salwaysplaying.

Solet'scelebratethismarvelouscreation,
Witheachpassingday,it'slikeacreationthat'salwaysshaping.
It'slikeadream,withapowerthat'salwaysgrowing,
It'slikeafuture,withapowerthat'salwaysshowing.

So how can I format it correctly?

@LowinLi Can you please chime in?

sujitvasanth commented 7 months ago

I had the same problem, but @LowinLi has put a solution in his examples: he decodes several tokens at a time with the tokenizer and detects the spaces that way.

I have put together a working example for you that formats everything correctly; just substitute your own model_name_or_path, in your case probably "TheBloke/tulu-7B-GPTQ".

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers_stream_generator import init_stream_support
init_stream_support()
model_name_or_path = "/home/sujit/Downloads/text-generation-webui-main/models/TheBloke_openchat-3.5-0106-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
prompt = "User: Tell me about AI<|end_of_turn|>\nAssistant: "
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
generator =  model.generate(inputs=input_ids, temperature=0.7, do_stream=True, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512, stream=True)

# naive per-token decode (this is what loses the spaces):
# for token in generator:
#     word = tokenizer.decode(token)
#     print(word, end="", flush=True)

stream_result = ""
last_tokens = []          # tokens held back because they don't yet decode to valid text
last_decoded_tokens = []  # previously emitted chunk, used to detect word boundaries

for index, x in enumerate(generator):
    tokens = x.cpu().numpy().tolist()
    tokens = last_tokens + tokens
    word = tokenizer.decode(tokens, skip_special_tokens=True)
    if "�" in word:
        # incomplete multi-byte character: buffer the tokens and wait for the next one
        last_tokens = tokens
    else:
        # decoding the previous chunk together with the current one reveals whether a space
        # separates them; decoding the current chunk alone drops that leading space
        if " " in tokenizer.decode(
            last_decoded_tokens + tokens, skip_special_tokens=True):
            word = " " + word
        last_tokens = []
        last_decoded_tokens = tokens
    stream_result += word
    print(word, end="", flush=True)