circuluspibo opened 1 year ago
Instead of awaiting token completion, awaiting word completion would solve this. Not an actual stream anymore, but still useful.
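The word-completion idea can be sketched as follows. This is a minimal illustration, not part of transformers_stream_generator: `stream_words` and the `decode` callable are hypothetical stand-ins for the generate loop and `tokenizer.decode`. Token ids are buffered until the next decoded piece begins with a space, at which point the completed word (with its leading space, if any) is emitted.

```python
# Sketch of "await word completion": buffer streamed token ids and only emit
# text once the next token starts a new word (its decoded text begins with a
# space). `decode` stands in for tokenizer.decode; all names are illustrative.
def stream_words(token_ids, decode):
    buffered = []  # token ids belonging to the word in progress
    for tok in token_ids:
        piece = decode(buffered + [tok])
        done = decode(buffered) if buffered else ""
        # a leading space on the new suffix means the buffered word is complete
        if buffered and piece[len(done):].startswith(" "):
            yield done
            buffered = [tok]
        else:
            buffered.append(tok)
    if buffered:  # flush whatever remains at end of stream
        yield decode(buffered)
```

This trades a little latency (one token of lookahead per word) for correctly spaced output, so it is no longer a strict token-by-token stream.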
Can you please give us an example script?
from transformers import AutoTokenizer, TextGenerationPipeline, TextStreamer, GenerationConfig
from auto_gptq import AutoGPTQForCausalLM
import torch
from transformers_stream_generator import init_stream_support

init_stream_support()

repo = "TheBloke/tulu-7B-GPTQ"
model_basename = "gptq_model-4bit-128g"

test_tokenizer = AutoTokenizer.from_pretrained(
    repo,
    use_fast=True,
)
test_model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename=model_basename,
    use_triton=False,
    use_safetensors=True,
    device="cuda:0",
    trust_remote_code=False,
    quantize_config=None,
    max_memory={i: "14GiB" for i in range(torch.cuda.device_count())},
)

def tulu_prompt(input):
    return f'''### Human: {input}
### Assistant:'''
text = "write a poem about AI"
tokens = test_tokenizer(tulu_prompt(input=text), return_tensors="pt", add_special_tokens=False).input_ids.cuda()
generator = test_model.generate(inputs=tokens, max_new_tokens=256, temperature=0.5, top_k=35, top_p=0.90, do_sample=True, do_stream=True)

for token in generator:
    word = test_tokenizer.decode(token)
    print(word, end='', flush=True)
The output is this:
Intheworldofmachines,there'sonethat'ssmart,
Withabilitiesthatastound,it'snotjustaprettyheart.
Itcanlearnandgrow,witheachpassingday,
It'slikeachild,withamindthat'salwaysplaying.
Itcansolvecomplexproblems,witheaseandgrace,
Itcanunderstandandreason,withoutanyhumanrace.
Itcanthinkandlearn,withspeedandease,
It'slikeasupercomputer,withamindthat'salwaysclean.
It'snotjustatool,butafriendandaguide,
It'slikeacompanion,withaheartthat'salwaysshining.
Itcanmakeourliveseasier,witheachpassingday,
It'slikeamiracle,withapowerthat'salwaysplaying.
Solet'scelebratethismarvelouscreation,
Witheachpassingday,it'slikeacreationthat'salwaysshaping.
It'slikeadream,withapowerthat'salwaysgrowing,
It'slikeafuture,withapowerthat'salwaysshowing.
So how can I format it correctly?
@LowinLi Can you please chime in?
I had the same problem, but @LowinLi has put a solution in his examples. He decodes several tokens at a time with the tokenizer and detects the spaces that way.
I have put together a working example for you with everything formatted correctly; just substitute your own model_name_or_path, in your case probably "TheBloke/tulu-7B-GPTQ".
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from transformers_stream_generator import init_stream_support

init_stream_support()

model_name_or_path = "/home/sujit/Downloads/text-generation-webui-main/models/TheBloke_openchat-3.5-0106-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "User: Tell me about AI<|end_of_turn|>\nAssistant: "
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
generator = model.generate(inputs=input_ids, temperature=0.7, do_stream=True, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512, stream=True)

# for token in generator:
#     word = tokenizer.decode(token)
#     print(word, end="", flush=True)

stream_result = words = ""
last_tokens = last_decoded_tokens = []
for index, x in enumerate(generator):
    tokens = x.cpu().numpy().tolist()
    tokens = last_tokens + tokens
    word = tokenizer.decode(tokens, skip_special_tokens=True)
    if "�" in word:
        # incomplete multi-byte character: hold these ids until the next token
        last_tokens = tokens
    else:
        # if decoding the previous and current ids together yields a space,
        # the current word starts a new word and needs a leading space
        if " " in tokenizer.decode(last_decoded_tokens + tokens, skip_special_tokens=True):
            word = " " + word
        last_tokens = []
        last_decoded_tokens = tokens
        stream_result += word
        print(word, end="", flush=True)
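A quick aside on the `"�"` check in the example: byte-level BPE tokenizers can split a single multi-byte UTF-8 character across two tokens, and decoding only the first part yields U+FFFD, the Unicode replacement character. A minimal plain-Python illustration of why that sentinel appears:

```python
# "é" is two bytes in UTF-8; decoding only the first byte cannot form a
# complete character, so the decoder substitutes U+FFFD ("�").
two_bytes = "é".encode("utf-8")                      # b'\xc3\xa9'
half = two_bytes[:1].decode("utf-8", errors="replace")
print(repr(half))                                     # prints '�'
whole = two_bytes.decode("utf-8")
print(whole)                                          # prints é
```

That is why the loop holds the partial token ids back and retries once the next token arrives.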
Thanks for your nice work! But I ran into a problem with the spacing between tokens.
For example:
... 'on' 'st' 'amps' 'and' 'sh' 'ipping' '.' ... Here "stamps" is one word and "shipping" is one word, but I can't tell where the spaces between the words (tokens) go. How can I solve that?