OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0
12.14k stars 849 forks

[Q] Apple-counting problem #190

Closed zhudongwork closed 4 months ago

zhudongwork commented 4 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

MiniCPM 2.5 uses the LLaVA-UHD technique, but in testing it still cannot answer the apple-counting question.

image

Expected Behavior

No response

Steps To Reproduce

微信图片_20240531161611

Environment

Official online demo

Anything else?

No response

Cuiunbo commented 4 months ago

Hello, that is an interesting question. LLaVA-UHD solves two problems:

  1. Feeding in high-definition images at their original size.
  2. Avoiding overlap between the parts of the input image.

However, our model still has some basic hallucination, as the objhall benchmark scores in the table show. Gemini Pro and GPT-4V also still exhibit some of these hallucinations. It is a problem worth addressing, and we will keep looking for a solution, so stay tuned!
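To make the "no overlapping parts" point concrete, here is a minimal sketch of cutting an image into a grid of non-overlapping slices. It is only an illustration and not LLaVA-UHD's or MiniCPM-V's actual partitioning logic: the function name `slice_boxes` and the 448-pixel `max_slice_side` default are assumptions chosen for the example.

```python
import math


def slice_boxes(width, height, max_slice_side=448):
    """Return (left, top, right, bottom) boxes that tile the image without overlap.

    Hypothetical helper for illustration; max_slice_side=448 is an assumed value.
    """
    # Use the fewest columns/rows such that no slice exceeds max_slice_side.
    cols = max(1, math.ceil(width / max_slice_side))
    rows = max(1, math.ceil(height / max_slice_side))

    # Evenly spaced cut points: neighbouring boxes share an edge, never pixels.
    xs = [round(i * width / cols) for i in range(cols + 1)]
    ys = [round(j * height / rows) for j in range(rows + 1)]

    return [
        (xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(rows)
        for c in range(cols)
    ]


boxes = slice_boxes(1344, 896)
for box in boxes:
    print(box)
```

Each box uses the (left, upper, right, lower) convention, so it could be passed to something like Pillow's `Image.crop` to obtain the slices. Note that slicing of this kind only determines what the model sees. It does not by itself prevent miscounting, which is the hallucination issue described above.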

zhudongwork commented 4 months ago

Thanks, looking forward to your masterpiece~
