InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] MiniCPMV-2.6 HF inference results differ from lmdeploy results #2377

Open zhjunqin opened 2 months ago

zhjunqin commented 2 months ago

Checklist

Describe the bug

The MiniCPMV-2.6 inference results from HF and from lmdeploy do not match. The reproduction uses top_k = 1 and the results still differ.
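With top_k = 1 the sampler has a single candidate left, so temperature and top_p should no longer influence the choice and decoding is effectively greedy. A minimal sketch of that reasoning follows; `sample_top_k` is a hypothetical helper written for illustration, not the sampler used by either lmdeploy or transformers.

```python
import math
import random

def sample_top_k(logits, top_k, temperature, rng):
    """Sample one token id from the top_k highest logits after temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in candidates]
    # Numerically stable softmax weights over the surviving candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]

logits = [1.2, 3.4, 0.5, 3.3]  # made-up values; index 1 is the argmax
rng = random.Random(0)
picks = {sample_top_k(logits, top_k=1, temperature=0.1, rng=rng) for _ in range(100)}
print(picks)  # {1}
```

Under this assumption any remaining mismatch between the two backends comes from the logits themselves, not from random sampling.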

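Steps to reproduce

Prepare the data

Download "https://support.huaweicloud.com/api-ocr/zh-cn_image_0000001698774808.png" with wget into the current directory as zh-cn_image_0000001698774808.png

LMDeploy reproduction

  1. Use the image openmmlab/lmdeploy:v0.6.0a0-cu12
  2. Call it with the following code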
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

# Serve one request at a time and set the k/v cache memory ratio to 0.4.
backend_config = TurbomindEngineConfig(max_batch_size=1, cache_max_entry_count=0.4)

pipe = pipeline('/data/models/openbmb/MiniCPM-V-2_6/', log_level='INFO', backend_config=backend_config)

image_path = "zh-cn_image_0000001698774808.png"

# Prompt kept verbatim. In English: "Please recognize the content of the image
# in detail and return it in markdown format".
prompt = "请详细识别图中的内容并以 markdown 格式返回"
# One user message with a text part and an image part referenced by local path.
messages = [
    dict(role='user', content=[
        dict(type='text', text=prompt),
        dict(type='image_url', image_url=dict(url=image_path)),
    ])
]

# top_k=1 keeps only the most likely token at each step.
gen_config = GenerationConfig(top_p=1, top_k=1, temperature=0.1, repetition_penalty=1.05, max_new_tokens=4096)
out = pipe(messages, gen_config=gen_config)
print(out.text)

Model output (verbatim, in Chinese)

这张图片展示了一份门诊检验报告单,具体内容如下:

**标题:**
门诊检验报告单

**副标题:**
血常规(5分类)

**状态说明:**
标本状态:正常

**临床诊断:**
1. 慢性扁桃体炎

**检验项目列表及结果:**
- 中性细胞百分率 (NEL%):77.1%
  - 参考范围:40-75%
- 淋巴细胞百分率 (LYM%):8.8%
  - 参考范围:20-50%
- 单核细胞百分率 (MONO%):7.1%
  - 参考范围:3.0-10.0%
- 红细胞计数 (RBC):6.66
  - 参考范围:4.3-5.8%

**签名区域:**
送检医生:
检验者:
审核者:

这份报告详细列出了患者的血常规检查结果,并根据参考范围对各项指标进行了评估。

HF reproduction

Code:

import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('/data/models/openbmb/MiniCPM-V-2_6/', trust_remote_code=True,
    attn_implementation='sdpa', torch_dtype=torch.bfloat16) # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('/data/models/openbmb/MiniCPM-V-2_6/', trust_remote_code=True)

def chat_llm(image_path, prompt):
    image = Image.open(image_path).convert('RGB')
    message = [{'role': 'user', 'content': [image, prompt]}]
    res = model.chat(
        image=None,
        msgs=message,
        tokenizer=tokenizer,
        temperature=0.1,
        top_p = 1,
        top_k = 1,
        do_sample=True,
        repetition_penalty=1.05,
    )
    print(res)

print("==============")
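# Prompt kept verbatim. In English: "Please recognize the content of the image
# in detail and return it in markdown format".
prompt = "请详细识别图中的内容并以 markdown 格式返回"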
chat_llm("zh-cn_image_0000001698774808.png", prompt)

Model output (verbatim, in Chinese):

这张图片展示了一份门诊检验报告单,具体内容如下:

**标题:门诊检验报告单**

**副标题:血常规(5分类)**

**标本状态:正常**

**临床诊断:1.慢性扁桃体炎**

| 检验项目 | 结果 | 参考范围 | 单位 |
|---------|------|----------|------|
| 中性细胞百分率 (NEL%) | 77.1 | 40-75 | % |
| 淋巴细胞百分率 (LYM%) | 8.8 | 20-50 | % |
| 单核细胞百分率 (MONO%) | 7.1 | 3.0-10.0 | % |
| 红细胞计数 (RBC) | 6.66 | 4.3-5.8 | % |

**送检医生:**
[空白]

**检验者:**
[空白]

**审核者:**
[空白]

Reproduction

As shown above.

Environment

As shown above.

Error traceback

No response

zhjunqin commented 2 months ago

Comparison

LMDeploy log

prompt='<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image_id>0</image_id><image><IMAGE_TOKEN></image><slice><IMAGE_TOKEN></slice><slice><IMAGE_TOKEN></slice>\n<slice><IMAGE_TOKEN></slice><slice><IMAGE_TOKEN></slice>\n请详细识别图中的内容并以 markdown 格式返回<|im_end|>\n<|im_start|>assistant\n', 
gen_config=EngineGenerationConfig(n=1, max_new_tokens=8192, top_p=1.0, top_k=1, temperature=0.1, repetition_penalty=1.05, ignore_eos=False, random_seed=4988746044838101047, stop_words=[151645], bad_words=None, min_new_tokens=None, skip_special_tokens=True, logprobs=None, logits_processors=None),
prompt_token_id=[151644, 8948, 198, 2610, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 151658, 15, 151659, 151646, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151647, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 198, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 151656, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151657, 198, 14880, 100700, 102450, 28029, 101047, 43815, 62926, 23031, 50494, 51461, 120, 28330, 31526, 151645, 198, 151644, 77091, 198]

HF debug information

input_ids: tensor([[151644,   8948,    198,   2610,    525,    264,  10950,  17847,     13,
         151645,    198, 151644,    872,    198, 151658,     15, 151659, 151646,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 151647, 151656, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 151657, 151656, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 151657,    198,
         151656, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 151657, 151656, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244, 128244,
         128244, 128244, 128244, 128244, 128244, 151657,    198,  14880, 100700,
         102450,  28029, 101047,  43815,  62926,  23031,  50494,  51461,    120,
          28330,  31526, 151645,    198, 151644,  77091,    198]],
       device='cuda:0', dtype=torch.int32)

Comparison result

Apart from the positions where HF has the ID 128244 (LMDeploy has 0 there), all other IDs are identical.
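This kind of check can be automated instead of compared by eye. The sketch below masks each backend's image placeholder before comparing; the helper names are hypothetical, the ID lists are shortened excerpts of the logs above, and it assumes the placeholder ID never occurs as a genuine text token in the prompt.

```python
def normalize(ids, placeholder):
    """Replace a backend-specific image placeholder id with None."""
    return [None if tok == placeholder else tok for tok in ids]

def same_layout(lmdeploy_ids, hf_ids, lmdeploy_placeholder=0, hf_placeholder=128244):
    """True when both sequences match once their placeholders are masked out."""
    return normalize(lmdeploy_ids, lmdeploy_placeholder) == normalize(hf_ids, hf_placeholder)

# Shortened excerpts: the real logs contain long runs of placeholder ids.
lmdeploy_ids = [151658, 15, 151659, 151646, 0, 0, 0, 151647, 198]
hf_ids = [151658, 15, 151659, 151646, 128244, 128244, 128244, 151647, 198]

print(same_layout(lmdeploy_ids, hf_ids))  # True
```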

lvhan028 commented 2 months ago

This model is supported in lmdeploy v0.6.0a0. You may want to upgrade to the latest version.

zhjunqin commented 2 months ago

I just tested with the image openmmlab/lmdeploy:v0.6.0a0-cu12, and the problem still exists.

lvhan028 commented 2 months ago

We use 0 as the placeholder for the image embedding.

irexyc commented 2 months ago

@zhjunqin

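Do you want the results to be exactly identical? That is probably not achievable. When I added support for MiniCPM-V-2.6, I took the input embeddings from the HF side and fed them to lmdeploy, and even with sampling disabled the results still differed. We attribute this to numerical error from the different kernels.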

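Besides the differences in kernel implementation, lmdeploy runs vision inference in float16, whereas the HF reproduction you posted uses bfloat16, which also introduces some difference. But as mentioned above, even with identical inputs (embeddings), the two sides still produce somewhat different results.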
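To get a feel for the size of the dtype effect alone, the sketch below rounds the same value to float16 and to bfloat16 using only the standard library. The helpers `to_float16` and `to_bfloat16` are illustrative and are not part of lmdeploy or transformers; the bfloat16 emulation handles ordinary finite values only.

```python
import struct

def to_float16(x):
    """Round a Python float to IEEE float16 and convert it back."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_bfloat16(x):
    """Round a Python float to bfloat16 (upper 16 bits of float32) and convert it back."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    # Round to nearest, ties to even, before dropping the low 16 bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', bits))[0]

x = 0.1
print(to_float16(x))   # 0.0999755859375
print(to_bfloat16(x))  # 0.10009765625
```

float16 keeps 10 mantissa bits and bfloat16 keeps 7, so the same activation is rounded to different values, and these small gaps accumulate through the layers.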

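You compared the input_ids: 0 and 128244 are the different placeholders used by the two sides, and both are replaced by image features at the embedding stage. If this number is the only difference, that shows the input_ids on the two sides are aligned.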
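A toy illustration of why the placeholder value itself does not matter: the embedding at every placeholder position is overwritten by an image feature. The function, the tiny embedding table, and the token ids 7 and 8 below are all hypothetical and do not reflect either library's internals.

```python
def embed_with_image(input_ids, text_table, image_features, placeholder):
    """Look up text embeddings, substituting image features at placeholder positions."""
    features = iter(image_features)
    embedded = []
    for tok in input_ids:
        if tok == placeholder:
            # The placeholder's own embedding is never used.
            embedded.append(next(features))
        else:
            embedded.append(text_table[tok])
    return embedded

# Made-up 2-dimensional embeddings; the two placeholder rows differ on purpose.
text_table = {7: [0.1, 0.2], 8: [0.3, 0.4], 0: [9.9, 9.9], 128244: [-9.9, -9.9]}
image_features = [[1.0, 1.0], [2.0, 2.0]]

a = embed_with_image([7, 0, 0, 8], text_table, image_features, placeholder=0)
b = embed_with_image([7, 128244, 128244, 8], text_table, image_features, placeholder=128244)
print(a == b)  # True
```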

zhjunqin commented 2 months ago

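Yes, I expected to get the same deterministic result.

Right, what I wanted to show by comparing the input_ids is also that the input_ids on the two sides are the same.

Based on your analysis, the two sides inherently cannot be aligned, is that right? I also noticed that the longer the model's output, the larger the difference between the two becomes toward the end.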

irexyc commented 2 months ago

@zhjunqin

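That's right. With greedy decoding, the beginning of the output usually stays the same, but once a difference appears at some position, the input effectively differs from that point on, so the later tokens are even less likely to match.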
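The cascade can be reproduced with a toy example. The lookup table below is a hypothetical stand-in for a model whose next-token logits depend only on the last token, and `eps` mimics a tiny kernel rounding difference applied at a single step; none of this is taken from lmdeploy or transformers.

```python
# Hypothetical toy "model": next-token logits depend only on the last token.
TABLE = {
    0: [0.0, 1.0, 0.999],  # tokens 1 and 2 are nearly tied here
    1: [2.0, 0.0, 1.0],
    2: [0.0, 0.5, 3.0],
}

def greedy_decode(start, steps, noisy_step=None, eps=0.002):
    """Greedy decoding; optionally nudge the logit of token 2 at one step."""
    seq = [start]
    for step in range(steps):
        logits = list(TABLE[seq[-1]])
        if step == noisy_step:
            logits[2] += eps  # stand-in for a small numerical difference
        seq.append(max(range(len(logits)), key=logits.__getitem__))
    return seq[1:]

def first_divergence(a, b):
    """Index of the first differing position, or None if the sequences match."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))

ref = greedy_decode(0, 6)
alt = greedy_decode(0, 6, noisy_step=2)
print(ref)                         # [1, 0, 1, 0, 1, 0]
print(alt)                         # [1, 0, 2, 2, 2, 2]
print(first_divergence(ref, alt))  # 2
```

A single flipped token at step 2 changes the context for every later step, which matches the observation that longer outputs drift further apart.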