InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Why did the model not understand the previous conversations? #1551

Open Iven2132 opened 7 months ago

Iven2132 commented 7 months ago


Describe the bug

Why did the model not understand the previous conversations? I gave OpenGVLab/InternVL-Chat-V1-5 a conversation history containing some wrong answers, and when I asked "Do you think that in all the previous conversations we had, your answers were correct?" it generated a wrong response, saying that no previous conversations were given.

What's the problem?

Reproduction

Message:

[{'role': 'system', 'content': 'Your name is Alan'},
 {'role': 'user', 'content': [
     {'type': 'text', 'text': 'What is this place?'},
     {'type': 'image_url', 'image_url': {'url': 'https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg'}}]},
 {'role': 'assistant', 'content': [{'type': 'text', 'text': "It's London"}]},
 {'role': 'user', 'content': [
     {'type': 'text', 'text': 'What is this place?'},
     {'type': 'image_url', 'image_url': {'url': 'https://cdn.britannica.com/68/170868-050-8DDE8263/Golden-Gate-Bridge-San-Francisco.jpg'}}]},
 {'role': 'assistant', 'content': [{'type': 'text', 'text': "It's Delhi"}]},
 {'role': 'user', 'content': [
     {'type': 'text', 'text': 'Do you think that in all the previous conversations we had, your answers were correct? If not, where were these images taken?'},
     {'type': 'image_url', 'image_url': {'url': 'https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg'}}]}]

Code:

import modal

MODEL_NAME = "OpenGVLab/InternVL-Chat-V1-5"

class Model:
    @modal.enter()
    def start_engine(self):
        from lmdeploy import serve, ChatTemplateConfig
        print(MODEL_NAME)
        # Launch the api_server in the background with the InternVL chat template.
        self.server = serve(MODEL_NAME,
                            chat_template_config=ChatTemplateConfig(
                                model_name='internvl-internlm2'),
                            server_name='0.0.0.0',
                            server_port=23333)

    @modal.method()
    async def generate(self, messages):
        from lmdeploy import client
        handle = client(api_server_url='http://0.0.0.0:23333')
        model_name = handle.available_models[0]
        print(model_name)
        # chat_completions_v1 returns a generator; with streaming disabled it
        # yields a single completed response, so return the first item.
        outputs = handle.chat_completions_v1(
            model=model_name, messages=messages)
        for out in outputs:
            return out

I'm building the package from GitHub

irexyc commented 7 months ago

They have demo code (using InternVL-Chat) for multi-round, multi-image conversation, but the demo only passes images as inputs in the first round of the conversation.

# multi-round multi-image conversation
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

question = " Describe the two pictures in detail" # Describe the two pictures in detail
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
print(question, response)
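
For comparison, the demo's next round keeps the same pixel_values and passes the history back instead of attaching new images (a sketch following the demo's pattern; the follow-up question is illustrative):

question = "What are the similarities between the two pictures?"  # illustrative follow-up
# Passing history=history continues the conversation without adding images.
response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
print(question, response)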

I don't know whether internvl-chat supports interleaved text-and-image chat, so I opened an issue asking about this. You can wait for their response.

When I compared the preprocessing code with the InternVL demo, I found a bug in LMDeploy: this line should be return f'<img>{IMAGE_TOKEN * num_images}</img>\n' + prompt
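
For context, a rough sketch of the kind of prompt-assembly step being discussed; the helper name and the IMAGE_TOKEN value are assumptions, and only the corrected return expression comes from the line linked above:

# Hypothetical sketch of the image-token prepending step; the helper name
# and IMAGE_TOKEN value are assumptions, the return expression is the fix
# proposed above.
IMAGE_TOKEN = '<IMAGE_TOKEN>'

def prepend_image_tokens(prompt: str, num_images: int) -> str:
    if num_images == 0:
        return prompt
    # Repeat the image placeholder once per image inside a single
    # <img>...</img> pair, instead of assuming exactly one image.
    return f'<img>{IMAGE_TOKEN * num_images}</img>\n' + prompt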

Iven2132 commented 6 months ago

> They have demo code (using InternVL-Chat) for multi-round, multi-image conversation, but the demo only passes images as inputs in the first round of the conversation.

Yeah, I'm waiting. One more thing I forgot to tell you: the system prompt is also not working.

Iven2132 commented 6 months ago

Hi @irexyc, does lmdeploy support InternVL-Chat-V1-5? I don't know why it doesn't even understand the system message: I set the system message "Your name is Jerry", ask "What's your name?", and it says my name is AI.

lvhan028 commented 6 months ago

Have you tried this case with Hugging Face transformers to perform the inference?

Iven2132 commented 6 months ago

> Have you tried this case with Hugging Face transformers to perform the inference?

@lvhan028 Yes, it works with transformers. Can you please try it? I think it's an lmdeploy issue.
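
For reference, a check on the transformers side might look like the sketch below, based on the InternVL-Chat-V1-5 model card; the text-only call with pixel_values=None and the system_message attribute are assumptions about the model's remote code:

# Minimal sketch of the same system-prompt test with Hugging Face
# transformers, following the InternVL-Chat-V1-5 model card. The text-only
# chat call (pixel_values=None) and the system_message attribute are
# assumptions about the model's remote code.
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL-Chat-V1-5'
model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
generation_config = dict(num_beams=1, max_new_tokens=64, do_sample=False)

model.system_message = 'Your name is Jerry'  # assumed hook for the system prompt
response = model.chat(tokenizer, None, "What's your name?", generation_config)
print(response)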

irexyc commented 6 months ago

The prompt looks fine with a system message in messages.

# server
lmdeploy serve api_server /nvme/shared/InternVL-Chat-V1-5 --log-level INFO
# server log
# 2024-05-15 07:54:24,168 - lmdeploy - INFO - prompt="<|im_start|>system\nYour name is Alan<|im_end|>\n<|im_start|>user\nwhat's your name?<|im_end|>\n<|im_start|>assistant\n", gen_config=EngineGenerationConfig(n=1, max_new_tokens=32744, top_p=1.0, top_k=40, temperature=0.7, repetition_penalty=1.0, ignore_eos=False, random_seed=4579452733963401247, stop_words=[92542, 92540], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None), prompt_token_id=[1, 92543, 9081, 364, 7910, 963, 505, 25716, 92542, 364, 92543, 1008, 364, 12706, 725, 829, 963, 345, 92542, 364, 92543, 525, 11353, 364], adapter_name=None.

# client
from lmdeploy import client
handle = client(api_server_url='http://0.0.0.0:23333')
model_name = handle.available_models[0]
messages = [
    {'role': 'system', 'content': 'Your name is Alan'}, 
    {'role': 'user', 'content': [
        {'type': 'text', 'text': "what's your name?"}
    ]}
]
outputs = handle.chat_completions_v1(
    model=model_name, messages=messages)
for out in outputs:
    print(out)

# client result
# {'id': '1', 'object': 'chat.completion', 'created': 1715759664, 'model': 'internvl-internlm2', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': 'My name is Alan.'}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 24, 'total_tokens': 30, 'completion_tokens': 6}}
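
To double-check that the logged prompt_token_id round-trips to the rendered template, the IDs from the server log can be decoded with the model's tokenizer (a minimal sketch; loading the tokenizer with trust_remote_code is an assumption about the setup):

# Decode the prompt_token_id list from the server log above and confirm it
# matches the rendered chat template string.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained('OpenGVLab/InternVL-Chat-V1-5', trust_remote_code=True)
ids = [1, 92543, 9081, 364, 7910, 963, 505, 25716, 92542, 364,
       92543, 1008, 364, 12706, 725, 829, 963, 345, 92542, 364,
       92543, 525, 11353, 364]
print(tok.decode(ids))
# expected to start with: <|im_start|>system\nYour name is Alan<|im_end|>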

Could you provide the code and the results you got using transformers?