OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0
12.52k stars, 880 forks

Incorrect descriptions generated #88

Closed ZHANG-SH97 closed 5 months ago

ZHANG-SH97 commented 5 months ago

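Problem description

When using MiniCPM-V2 for image captioning, the model outputs content unrelated to the image, and the problem is reproducible. Below are my local test environment, inference code, and the generated description.

Inference environment

These are the versions of some of the packages installed in my local test environment: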

torch==2.3.0
torchvision==0.18.0
timm==0.9.16
transformers==4.37.2
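
A pinned package list can differ from what the interpreter actually imports (the list above gives torch==2.3.0, while the environment dump later in this thread reports PyTorch 2.1.2+cu121). A short check along the following lines could confirm the versions in the running environment. This is only a sketch: the helper name `installed_versions` is illustrative and does not come from the original report.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from the list in the report.
    wanted = ["torch", "torchvision", "timm", "transformers"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this inside the same interpreter that performs inference avoids mixing up versions from different virtual environments.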

Inference code and parameter settings

These are the parameters used for inference:

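import torch
from transformers import AutoModel, AutoTokenizer

model_path = "my-model-path"
device = "cuda"  # assumed; the original snippet used self.device from a class that is not shown

# Load the weights in bfloat16, then cast to float16 on the target device (as in the original report).
model = AutoModel.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16
)
model = model.to(device=device, dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()

# image and msgs are prepared elsewhere and are not shown in the report.
res, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=False,  # sampling disabled
    temperature=1.0,
    repetition_penalty=1.1,
    length_penalty=0.8,
)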

Test image

test_img

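Generated description

With the environment and inference code above, the generated description was:

This picture shows a tranquil lakeside scene surrounded by lush green vegetation. The lake water is crystal clear, reflecting the blue sky above. In the foreground, a small wooden boardwalk extends into the lake, adding a touch of tranquility and natural beauty to the whole scene.

The generated description clearly has nothing to do with the image. In addition, in a test on 600+ images, roughly 6% of them produced similar descriptions that were unrelated to the image.

What is going wrong here?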
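
One way to put a number on the failure rate reported above is to flag captions that are nearly identical across different images, since a stock description repeated for unrelated inputs is the symptom described. The following is a minimal sketch using the standard-library `difflib`; the function name `flag_repeated_captions`, the 0.9 threshold, and the sample captions are all illustrative assumptions, not data from the report.

```python
from difflib import SequenceMatcher


def flag_repeated_captions(captions, threshold=0.9):
    """Return sorted indices of captions that closely match a caption of another image."""
    flagged = set()
    for i in range(len(captions)):
        for j in range(i + 1, len(captions)):
            ratio = SequenceMatcher(None, captions[i], captions[j]).ratio()
            if ratio >= threshold:
                flagged.update((i, j))
    return sorted(flagged)


# Placeholder captions, one per (hypothetical) input image.
captions = [
    "A quiet lakeside scene surrounded by lush green plants.",
    "Two cats sleeping on a red sofa.",
    "A quiet lakeside scene surrounded by lush green plants.",
    "A city street at night with neon signs.",
]
flagged = flag_repeated_captions(captions)
print(flagged, f"{len(flagged) / len(captions):.0%} of captions flagged")
```

The pairwise comparison is quadratic in the number of captions, which is acceptable for a few hundred images. Flagged captions still need a manual look, because genuinely similar images can legitimately produce similar descriptions.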

raind-cd commented 5 months ago

I ran into the same problem. Across a wide variety of images, the following description appears at random with high probability.

image

iceflame89 commented 5 months ago

Thanks for the feedback. Could you share your operating system, hardware, and the exact prompt so that we can reproduce this?

ZHANG-SH97 commented 5 months ago

"Thanks for the feedback. Could you share your operating system, hardware, and the exact prompt so that we can reproduce this?"

The test hardware information is as follows:

PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.12 (main, Jun 1 2022, 11:38:51) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A800 80GB PCIe
GPU 1: NVIDIA A800 80GB PCIe
GPU 2: NVIDIA A800 80GB PCIe
GPU 3: NVIDIA A800 80GB PCIe
GPU 4: NVIDIA A800 80GB PCIe
GPU 5: NVIDIA A800 80GB PCIe
GPU 6: NVIDIA A800 80GB PCIe
GPU 7: NVIDIA A800 80GB PCIe

Nvidia driver version: 535.129.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 57 bits virtual
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 28
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
Stepping: 6
CPU MHz: 800.000
CPU max MHz: 3100.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Virtualization: VT-x
L1d cache: 2.6 MiB
L1i cache: 1.8 MiB
L2 cache: 70 MiB
L3 cache: 84 MiB

The test prompt was: "描述画面内容" ("Describe the content of the image")

Cuiunbo commented 5 months ago

The training data for version 2.0 contained incorrect description data. We have fixed this in version 2.5, so you can try version 2.5~