InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Repeated responses #521

Closed · chailt closed this 12 months ago

chailt commented 1 year ago

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.

Describe the bug

The responses are repetitive and garbled. The base model is this llama2 13B variant with 16k context length: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2 [screenshot]

Reproduction

python -m lmdeploy.serve.turbomind.deploy vicuna /home/ubuntu/checkpoint-1400/ --tp 2
python3 -m lmdeploy.serve.openai.api_server ./workspace 0.0.0.0 8011 --instance_num 32 --tp 2

Error traceback

No response

lvhan028 commented 1 year ago

@AllentDan, please check the chat template first.

AllentDan commented 1 year ago

The chat templates don't match: https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/blob/09eadc6cf4c06edc007674bea8eb2788fa3d2a77/scripts/inference/inference_hf.py#L4

https://github.com/InternLM/lmdeploy/blob/b58a9dffb12bdc36ee9b3f251185179602468de9/lmdeploy/model.py#L394

chailt commented 1 year ago

Then how should I modify the chat template, or what parameters should I add?

AllentDan commented 1 year ago

Just adapt the chat template in lmdeploy to what you want: modify the llama2 template's decorate_prompt function so that its output matches the prompt assembled by Chinese-LLaMA-Alpaca-2/scripts/inference/inference_hf.py. A sketch follows below.
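
For illustration, a minimal sketch of such a class, registered in lmdeploy/model.py the same way as the existing templates (the class name is made up here, and the template strings are assumptions to be verified against inference_hf.py):

@MODELS.register_module(name='chinese-alpaca-2')
class ChineseAlpaca2(BaseModel):
    """Sketch of a chat template for Chinese-LLaMA-Alpaca-2 (llama2-style)."""

    def __init__(self,
                 system='You are a helpful assistant. 你是一个乐于助人的助手。',
                 session_len=4096,
                 **kwargs):
        super().__init__(**kwargs)
        self.system = system
        self.session_len = session_len

    def decorate_prompt(self, prompt, sequence_start=True):
        assert self.capability == 'chat', \
            f'{type(self).__name__} has no capability of {self.capability}'
        if sequence_start:
            # First round: wrap the system prompt in <<SYS>> inside [INST]
            return f'[INST] <<SYS>>\n{self.system}\n<</SYS>>\n\n' \
                   f'{prompt} [/INST]'
        # Later rounds: user turn only
        return f'[INST] {prompt} [/INST]'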

ghbtest commented 1 year ago

Hi, I have the same problem. I'm trying to deploy this model: https://huggingface.co/upstage/SOLAR-0-70b-16bit. I added a new class in model.py as shown below:

@MODELS.register_module(name='qwen-7b')
class Qwen7BChat(BaseModel):
    """Chat template for Qwen-7B-Chat."""

    def __init__(self,
                 session_len=8192,
                 top_p=0.5,
                 top_k=40,
                 temperature=1.0,
                 im_start='<|im_start|>',
                 im_end='<|im_end|>',
                 system='You are a helpful assistant.',
                 **kwargs):
        super().__init__(**kwargs)
        self.session_len = session_len
        self.top_p = top_p
        self.top_k = top_k
        self.temperature = temperature

        self.im_start = im_start
        self.im_end = im_end
        self.system = system

    def decorate_prompt(self, prompt, sequence_start=True):
        assert self.capability == 'chat', \
            f'{type(self).__name__} has no capability of {self.capability}'
        if sequence_start:
            return f'{self.im_start}system\n{self.system}{self.im_end}' \
                   f'\n{self.im_start}user\n{prompt}{self.im_end}' \
                   f'\n{self.im_start}assistant\n'

        return f'\n{self.im_start}user\n{prompt}{self.im_end}' \
               f'\n{self.im_start}assistant\n'

I also changed every "model_name" in the config files under llama2/solar-70b-w4/workspace to "solar". After these changes the result is still the same: just repetition of one word. I've asked about this many times on WeChat. Could you please provide a reliable and effective solution? @AllentDan @lvhan028

lvhan028 commented 1 year ago

@AllentDan Please add chat templates for chinese-llama-alpaca and solar, then check whether the repetition issue still exists.

ghbtest commented 1 year ago

Just wondering if there's any update regarding this issue. Thanks! @lvhan028 @AllentDan

AllentDan commented 1 year ago

I was busy with other stuff these days. Once there is any news, I will share it with you.

AllentDan commented 1 year ago

@chailt Hi, currently LMDeploy only supports "rope_scaling": null for huggingface models. Linear rope_scaling will be supported later. #536
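
A quick way to check what a given checkpoint uses (a minimal sketch; the path is the one from the reproduction command above):

import json

# Inspect the HF config of the checkpoint from the report
with open('/home/ubuntu/checkpoint-1400/config.json') as f:
    cfg = json.load(f)
print(cfg.get('rope_scaling'))  # must be None (i.e. null) for current LMDeploy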

AllentDan commented 1 year ago

@ghbtest There is nothing special about SOLAR. You were using the wrong template. This is the right template for solar:

@MODELS.register_module(name='solar')
class SOLAR(BaseModel):
    """Chat template of SOLAR model."""

    def __init__(
            self,
            b_sys='### System:\n',
            e_sys='\n\n',
            boh='### User:\n',
            eoh='\n\n',
            boa='### Assistant:\n',
            eoa='\n\n',
            system='',
            session_len=2048,
            **kwargs):
        super().__init__(**kwargs)
        self.b_sys = b_sys
        self.e_sys = e_sys
        self.boh = boh
        self.eoh = eoh
        self.boa = boa
        self.eoa = eoa
        self.system = system
        self.session_len = session_len

    def decorate_prompt(self, prompt, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            prompt (str): user's input prompt
            sequence_start (bool): indicator for the first round chat of a
               session sequence
        Returns:
            str: the concatenated prompt
        """
        assert self.capability == 'chat', \
            f'{type(self).__name__} has no capability of {self.capability}'
        if sequence_start:
            return f'{self.b_sys}{self.system}{self.e_sys}' \
                   f'{self.boh}{prompt}{self.eoh}{self.boa}'

        return f'{self.boh}{prompt}{self.eoh}{self.boa}'

    def messages2prompt(self, messages, sequence_start=True):
        """Return the prompt that is concatenated with other elements in the
        chat template.

        Args:
            messages (str | List): user's input prompt
        Returns:
            str: the concatenated prompt
        """
        if isinstance(messages, str):
            return self.get_prompt(messages, sequence_start)
        system, users, assistants = self._translate_messages(messages)
        system = self.system if not system else system
        ret = f'{self.b_sys}{system}{self.e_sys}'
        for i, (user, assistant) in enumerate(zip(users, assistants)):
            ret += f'{self.boh}{user}{self.eoh}{self.boa}'
            if assistant:
                ret += f'{assistant}{self.eoa}'
        return ret

ghbtest commented 1 year ago

@AllentDan Thanks! I'll try it later. Are there any other changes I need to make? Do I need to change "model_name" in the config files under llama2/solar-70b-w4/workspace to "solar"?

AllentDan commented 1 year ago

python3 -m lmdeploy.serve.turbomind.deploy solar SOLAR-0-70b-16bit --tp 8
python3 lmdeploy/turbomind/chat.py ./workspace --tp 8

The name solar is registered in model.py if you put the above code in it.

Vincent131499 commented 12 months ago

> @chailt Hi, currently LMDeploy only supports "rope_scaling": null for huggingface models. Linear rope_scaling will be supported later. #536

@AllentDan @lvhan028 Has the linear rope_scaling mechanism been supported yet? Many models out there now use it to support long contexts.

lvhan028 commented 12 months ago

The extrapolation approach currently implemented in turbomind follows qwen's. In workspace/triton_models/weights/config.ini, the following settings need to be changed:

max_position_embeddings = xxxx
use_dynamic_ntk = 1
use_logn_attn = 1

For the value of max_position_embeddings, please refer to https://huggingface.co/docs/transformers/main/en/model_doc/llama#transformers.LlamaConfig.max_position_embeddings
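
For a rough picture of what use_dynamic_ntk enables, here is a sketch modeled on HuggingFace's LlamaDynamicNTKScalingRotaryEmbedding (the turbomind implementation may differ in details):

import numpy as np

def dynamic_ntk_inv_freq(seq_len, max_position_embeddings=4096,
                         dim=128, base=10000.0, scaling_factor=1.0):
    """Rescale the RoPE base once the sequence outgrows the training length."""
    if seq_len > max_position_embeddings:
        base = base * (
            (scaling_factor * seq_len / max_position_embeddings)
            - (scaling_factor - 1)
        ) ** (dim / (dim - 2))
    # Standard RoPE inverse frequencies, computed from the (rescaled) base
    return 1.0 / base ** (np.arange(0, dim, 2) / dim)

# Longer sequences get a larger base, i.e. slower-rotating frequencies
print(dynamic_ntk_inv_freq(16384)[:3])

use_logn_attn is qwen's companion trick: queries at positions beyond the training length are scaled by log(position) / log(training_length) to keep attention entropy stable.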

lvhan028 commented 12 months ago

@chailt For the chinese-llama-alpaca-2 model, when converting with deploy.py, don't use vicuna as the model_name; use llama2.
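
Applied to the reproduction command from this issue, the conversion step would then be:

python -m lmdeploy.serve.turbomind.deploy llama2 /home/ubuntu/checkpoint-1400/ --tp 2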

Vincent131499 commented 12 months ago

@lvhan028 Thanks for the reply! With qwen's dynamic extrapolation approach, if max_position_embeddings = 4096, how long a context can it extrapolate to?

lvhan028 commented 12 months ago

As I recall, it can extrapolate to about 4x.
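
(For reference: with max_position_embeddings = 4096, a 4x extrapolation is 4096 × 4 = 16384 tokens, which matches the 16k context length of the model in the original report.)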

zhangjiawei5911 commented 11 months ago

> [Quotes the original bug report from the top of the issue.]

I've run into the same problem. Have you managed to solve it?

lvhan028 commented 11 months ago

What is the chat prompt template of the Chinese-LLaMA-Alpaca-2 model?

zhangjiawei5911 commented 11 months ago

> What is the chat prompt template of the Chinese-LLaMA-Alpaca-2 model?

https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/blob/main/scripts/inference/inference_hf.py#325: I didn't apply a template there; I fed the query in directly.

lvhan028 commented 11 months ago

When converting the model, set --model-name to llama2, not vicuna.

sunjunlishi commented 2 months ago

/root/miniconda3/envs/myenv/bin/python3 -m lmdeploy serve api_server InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --cache-max-entry-count 0.1 --model-format awq --server-port 8083 --chat-template chat_template.json

Out of 10 responses, 9 are normal and fast, but one of them repeats.

ChatCompletion(id='5', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='该图片显示了一张营业执照,详细信息如下:\n\n- 名称:布恩农牧科技有限公司\n- 类型:有限责任公司(自然人投资或控股)\n- 住所:营南县县岭泉镇刘子村\n- 法定代表人:于某\n- 注册资本:伍仟万元整\n- 成立日期:2011年08月01日\n- 营业期限:2011年08月01日至2041年08月01日\n- 经营范围:销售:饲料及饲料原料;食品添加剂;饲料添加剂;饲料配方颗粒;饲料添加剂;动物性饲料添加剂;饲料添加剂;预包装饲料;农业信息技术开发、技术服务、技术咨询、技术推广、技术交流、技术转让、技术咨询、技术推广、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术交流、技术咨询、技术', role='assistant', function_call=None, tool_calls=None))], created=1721526408, model='InternVL2-2B-AWQ', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=5187, prompt_tokens=3389, total_tokens=8576))

AllentDan commented 2 months ago

Try setting repetition_penalty=1.02.
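
Via the OpenAI-compatible endpoint that could look like this (a sketch; the URL and model name come from the command above, and repetition_penalty is assumed to be accepted as an extra sampling field, which the server's /docs page can confirm):

from openai import OpenAI

client = OpenAI(base_url='http://0.0.0.0:8083/v1', api_key='none')
resp = client.chat.completions.create(
    model='InternVL2-2B-AWQ',
    messages=[{'role': 'user', 'content': '请识别这张营业执照的内容'}],
    extra_body={'repetition_penalty': 1.02},  # mild penalty against loops
)
print(resp.choices[0].message.content)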

lvhan028 commented 2 months ago

https://github.com/InternLM/InternLM/issues/758#issuecomment-2210313772 The InternLM2 model occasionally produces repetitive answers. InternVL2-2B is built on InternLM2, so this is suspected to be related.