Closed: holazzer closed this issue 5 months ago
Hi! I have tracked down this problem. When Meta-Llama-3-8B-Instruct generates, the eos_token is a different special token, `<|eot_id|>` (id 128009). However, the tokenizer does not load this special token correctly.

tokenizer_config.json
The tokenizer actually obtained at runtime
The example on HuggingFace

After manually adding 128009 to the terminator list, the model stops generating naturally. Below is the McDonald's example.
```python
messages = [
    {"role": "system", "content": "You are an expert at planning marketing events outdoors for small to medium size diners and restaurants. "},
    {"role": "user", "content": "Help a local McDonald restaurant plan a promotion event for the anniversary of Big Mac."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pd",
)
terminators = [
    tokenizer.eos_token_id,
    # tokenizer.convert_tokens_to_ids("<|eot_id|>")  # would work if the special token were loaded
    128009,  # id of <|eot_id|>, added by hand
]
outputs = model.generate(
    **input_ids,
    max_new_tokens=1024,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
out = tokenizer.batch_decode(outputs[0])
```
This plan should help create a fun and engaging event that will drive sales, increase brand loyalty, and generate buzz around the anniversary of the Big Mac.<|reserved_special_token_5|>
Example from the HF model card:
```python
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
```
Arrrr, me hearty! Me name be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas o' the Interwebs! Me and me trusty crew o' code be here to swab the decks o' yer queries and answer yer questions with a pirate's flair! So hoist the colors, me hearty, and let's set sail fer a swashbucklin' good time!<|reserved_special_token_5|>
The trailing `<|reserved_special_token_5|>` should decode as `<|eot_id|>`.
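As a user-side workaround (my own sketch, not part of PaddleNLP), the generated ids can be truncated at 128009 before decoding, so the stray token never reaches the text:

```python
# Workaround sketch: even while the tokenizer does not know <|eot_id|>,
# the raw output ids can be cut at 128009 before decoding.
EOT_ID = 128009  # id of <|eot_id|> in the Llama-3 vocabulary

def trim_at_eot(token_ids, eot_id=EOT_ID):
    """Return the prefix of token_ids before the first eot_id occurrence."""
    try:
        return token_ids[:token_ids.index(eot_id)]
    except ValueError:
        return token_ids  # <|eot_id|> was never generated; keep everything

print(trim_at_eot([791, 220, EOT_ID, 30]))  # [791, 220]
```

Feeding the trimmed list back into `tokenizer.batch_decode` then yields clean text without the placeholder token.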
I am not familiar with how paddlenlp loads multiple config files, so please find a way to fix this. Thanks 🙏.
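For context on where the id lives, here is a minimal, hypothetical slice of a Llama-3-style tokenizer_config.json and the merge step a loader needs to perform. The field names mirror the HuggingFace format; this is a sketch of the idea, not PaddleNLP's actual loading code:

```python
import json

# Hypothetical minimal slice of tokenizer_config.json; the real file for
# Meta-Llama-3-8B-Instruct lists <|eot_id|> among its added special tokens.
config_text = '''
{
    "added_tokens_decoder": {
        "128009": {"content": "<|eot_id|>", "special": true}
    }
}
'''

config = json.loads(config_text)

# A loader has to fold every entry flagged "special" into its special-token
# set; skipping this step leaves <|eot_id|> unknown to the tokenizer.
special_tokens = {
    entry["content"]: int(token_id)
    for token_id, entry in config["added_tokens_decoder"].items()
    if entry.get("special")
}

print(special_tokens["<|eot_id|>"])  # 128009
```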
OK, we will take a look.