chailt closed this issue 12 months ago
@AllentDan may check the chat template first
Then how should I modify the chat template, or what parameters should I add?
Just adapt the chat template in lmdeploy to what you want. Modify llama2's `decorate_prompt` function so that its output matches the prompt assembled by Chinese-LLaMA-Alpaca-2/scripts/inference/inference_hf.py.
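For reference, here is a standalone sketch of the llama2-style prompt format that `inference_hf.py` assembles. The exact marker strings and the bilingual system prompt below are taken from memory of the Chinese-LLaMA-Alpaca-2 repo, so verify them against your checkout before copying into `decorate_prompt`:

```python
# Standalone sketch of the single-turn prompt format used by
# Chinese-LLaMA-Alpaca-2/scripts/inference/inference_hf.py (llama2-style).
# Verify the exact strings against your copy of the script.
B_INST, E_INST = '[INST]', '[/INST]'
B_SYS, E_SYS = '<<SYS>>\n', '\n<</SYS>>\n\n'
DEFAULT_SYSTEM = 'You are a helpful assistant. 你是一个乐于助人的助手。'

def build_prompt(instruction, system=DEFAULT_SYSTEM):
    """Wrap one instruction the way inference_hf.py does."""
    return f'{B_INST} {B_SYS}{system}{E_SYS}{instruction} {E_INST}'

print(build_prompt('你好'))
```

Whatever lmdeploy's llama2 template emits should match this string byte for byte; a single missing space or newline around the markers is enough to cause degenerate output.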
hi, I have the same problem. I'm trying to deploy this model: https://huggingface.co/upstage/SOLAR-0-70b-16bit. I added a new class in model.py as shown below:
```python
@MODELS.register_module(name='qwen-7b')
class Qwen7BChat(BaseModel):
    """Chat template for Qwen-7B-Chat."""

    def __init__(self,
                 session_len=8192,
                 top_p=0.5,
                 top_k=40,
                 temperature=1.0,
                 im_start='<|im_start|>',
                 im_end='<|im_end|>',
                 system='You are a helpful assistant.',
                 **kwargs):
        super().__init__(**kwargs)
        self.session_len = session_len
        self.top_p = top_p
        self.top_k = top_k
        self.temperature = temperature
        self.im_start = im_start
        self.im_end = im_end
        self.system = system

    def decorate_prompt(self, prompt, sequence_start=True):
        assert self.capability == 'chat', \
            f'{type(self).__name__} has no capability of {self.capability}'
        if sequence_start:
            return f'{self.im_start}system\n{self.system}{self.im_end}' \
                   f'\n{self.im_start}user\n{prompt}{self.im_end}' \
                   f'\n{self.im_start}assistant\n'
        return f'\n{self.im_start}user\n{prompt}{self.im_end}' \
               f'\n{self.im_start}assistant\n'
```
I also changed every "model_name" in the config files under llama2/solar-70b-w4/workspace to "solar". After making these changes, the results are still the same: just repetition of one word.
I've asked about this many times on WeChat. Could you please provide a reliable and effective solution to the problem? @AllentDan @lvhan028
@AllentDan Please add chat templates for chinese-llama-alpaca and solar. Then check if the repetition issue still exists
just wonder if there's any update regarding this issue. thanks! @lvhan028 @AllentDan
I was busy with other stuff these days. Once there is any news, I will share it with you.
@chailt Hi, LMDeploy currently only supports `"rope_scaling": null` for Hugging Face models. Linear `rope_scaling` will be supported later. #536
@ghbtest There is nothing special about SOLAR. You were using the wrong template. This is the right template for solar:
```python
@MODELS.register_module(name='solar')
class SOLAR(BaseModel):
    """Chat template of SOLAR model."""

    def __init__(self,
                 b_sys='### System:\n',
                 e_sys='\n\n',
                 boh='### User:\n',
                 eoh='\n\n',
                 boa='### Assistant:\n',
                 eoa='\n\n',
                 system='',
                 session_len=2048,
                 **kwargs):
        super().__init__(**kwargs)
        self.b_sys = b_sys
        self.e_sys = e_sys
        self.boh = boh
        self.eoh = eoh
        self.boa = boa
        self.eoa = eoa
        self.system = system
        self.session_len = session_len

    def decorate_prompt(self, prompt, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            prompt (str): user's input prompt
            sequence_start (bool): indicator for the first round chat of a
                session sequence

        Returns:
            str: the concatenated prompt
        """
        assert self.capability == 'chat', \
            f'{type(self).__name__} has no capability of {self.capability}'
        if sequence_start:
            return f'{self.b_sys}{self.system}{self.e_sys}' \
                   f'{self.boh}{prompt}{self.eoh}{self.boa}'
        return f'{self.boh}{prompt}{self.eoh}{self.boa}'

    def messages2prompt(self, messages, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            messages (str | List): user's input prompt

        Returns:
            str: the concatenated prompt
        """
        if isinstance(messages, str):
            return self.get_prompt(messages, sequence_start)
        system, users, assistants = self._translate_messages(messages)
        system = self.system if not system else system
        ret = f'{self.b_sys}{system}{self.e_sys}'
        for i, (user, assistant) in enumerate(zip(users, assistants)):
            ret += f'{self.boh}{user}{self.eoh}{self.boa}'
            if assistant:
                ret += f'{assistant}{self.eoa}'
        return ret
```
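As a quick standalone sanity check (using only the default marker strings from the template above, no lmdeploy imports), the first-round prompt for "Hello" would be assembled as:

```python
# Re-derive the first-round SOLAR prompt from the template's defaults,
# without needing BaseModel or the MODELS registry.
b_sys, e_sys = '### System:\n', '\n\n'
boh, eoh, boa = '### User:\n', '\n\n', '### Assistant:\n'
system = ''  # SOLAR's default system prompt is empty

prompt = 'Hello'
first_round = f'{b_sys}{system}{e_sys}{boh}{prompt}{eoh}{boa}'
print(repr(first_round))
```

If the model still repeats with this exact byte layout, the template is unlikely to be the culprit.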
@AllentDan thanks! I'll try it later. Are there any other changes I need to make? Do I need to change "model_name" in the config files under llama2/solar-70b-w4/workspace to "solar"?
```shell
python3 -m lmdeploy.serve.turbomind.deploy solar SOLAR-0-70b-16bit --tp 8
python3 lmdeploy/turbomind/chat.py ./workspace --tp 8
```
The name `solar` is registered in model.py if you put the above code in it.
> @chailt Hi, LMDeploy currently only supports `"rope_scaling": null` for Hugging Face models. Linear `rope_scaling` will be supported later. #536
@AllentDan @lvhan028 Has there been any update on the linear `rope_scaling` mechanism? Many models on the market now use it to support long contexts.
The extrapolation method turbomind currently implements follows Qwen's. In workspace/triton_models/weights/config.ini, the following settings need to be modified:

```ini
max_position_embeddings = xxxx
use_dynamic_ntk = 1
use_logn_attn = 1
```

For the value of max_position_embeddings, refer to https://huggingface.co/docs/transformers/main/en/model_doc/llama#transformers.LlamaConfig.max_position_embeddings
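For intuition, the Qwen-style dynamic NTK trick enlarges the RoPE base once the sequence grows past `max_position_embeddings`, stretching the rotary frequencies instead of letting them wrap. A rough sketch of the idea (simplified from Qwen's published implementation; this is not LMDeploy's exact kernel code, and the default `rotary_dim` here is just an illustrative value):

```python
import math

def dynamic_ntk_base(seq_len, max_position_embeddings=4096,
                     base=10000.0, rotary_dim=128):
    """Grow the RoPE base once the sequence exceeds the trained length,
    following the Qwen-style dynamic NTK schedule."""
    if seq_len <= max_position_embeddings:
        return base  # within the trained window: nothing to do
    # How many doublings past the trained length we are, plus one.
    context_value = math.log(seq_len / max_position_embeddings, 2) + 1
    ntk_alpha = max(2 ** math.ceil(context_value) - 1, 1)
    # Standard NTK-aware base rescaling.
    return base * ntk_alpha ** (rotary_dim / (rotary_dim - 2))

print(dynamic_ntk_base(4096))   # unchanged below the trained length
print(dynamic_ntk_base(16384))  # larger base when extrapolating 4x
```

`use_logn_attn` additionally scales query vectors by log(n) for positions past the trained length, which is the other half of Qwen's long-context recipe.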
@chailt For the chinese-llama-alpaca-2 model, when converting with deploy.py, don't use vicuna as the model_name; use llama2.
@lvhan028 Thanks for the reply! With Qwen-style dynamic extrapolation, if max_position_embeddings = 4096, how far can the context be extrapolated?
As I recall, it can extrapolate to about 4x.
Checklist
- [ ] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
Describe the bug
Answers come out as repeated, garbled text. The base model is this llama2 13B model with 16k context: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2
Reproduction
```shell
python -m lmdeploy.serve.turbomind.deploy vicuna /home/ubuntu/checkpoint-1400/ --tp 2
python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 8011 --instance_num 32 --tp 2
```
Error traceback
No response
I ran into the same problem. Have you managed to solve it?
What is the chat prompt template of the Chinese-LLaMA-Alpaca-2 model?

When converting the model, pass llama2 as --model-name, not vicuna.
```shell
/root/miniconda3/envs/myenv/bin/python3 -m lmdeploy serve api_server InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --cache-max-entry-count 0.1 --model-format awq --server-port 8083 --chat-template chat_template.json
```
Out of 10 answers, 9 are normal and fast, but one of them repeats itself:
```
ChatCompletion(id='5', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='该图片显示了一张营业执照,详细信息如下:\n\n- 名称:布恩农牧科技有限公司\n- 类型:有限责任公司(自然人投资或控股)\n- 住所:营南县县岭泉镇刘子村\n- 法定代表人:于某\n- 注册资本:伍仟万元整\n- 成立日期:2011年08月01日\n- 营业期限:2011年08月01日至2041年08月01日\n- 经营范围:销售:饲料及饲料原料;食品添加剂;饲料添加剂;饲料配方颗粒;饲料添加剂;动物性饲料添加剂;饲料添加剂;预包装饲料;农业信息技术开发、技术服务、技术咨询、技术推广、技术交流、技术转让、技术咨询、技术推广、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术', role='assistant', function_call=None, tool_calls=None))], created=1721526408, model='InternVL2-2B-AWQ', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5187, prompt_tokens=3389, total_tokens=8576))
```
Try setting repetition_penalty=1.02.
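For context on why such a small value helps: a CTRL-style repetition penalty scales down the logits of tokens that have already been generated, so even 1.02 is enough to break exact repetition loops. A minimal sketch of the mechanism (illustrative only, not LMDeploy's actual kernel):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.02):
    """Penalize already-generated token ids: positive logits are divided
    by the penalty, negative ones multiplied (the CTRL convention)."""
    out = list(logits)
    for tid in set(generated_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, generated_ids=[0, 1]))
```

Because the penalty compounds every time a token reappears, a repeating phrase steadily loses probability mass until the loop breaks, while values this close to 1.0 barely affect normal text.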
https://github.com/InternLM/InternLM/issues/758#issuecomment-2210313772 The InternLM2 model occasionally produces repetitive answers. InternVL2-2B is based on InternLM2, so this is suspected to be related.