OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
Apache License 2.0
7.84k stars 543 forks

💡 [REQUEST] - MiniCPM-V fine-tuning for multi-image input during a multi-turn conversation #233

Open rookie-joe opened 4 weeks ago

rookie-joe commented 4 weeks ago

Start Date

No response

Implementation PR

No response

Reference Issues

for multi-image input during a multi-turn conversation

Summary

Support fine-tuning MiniCPM-V on multi-image input during a multi-turn conversation.

Basic Example

[
  {
    "id": "0",
    "image": {
      "image1": "path/to/image_0.jpg",
      "image2": "path/to/image_1.jpg"
    },
    "conversations": [
      {"role": "user", "content": "<image1>\nHow many desserts are on the white plate?"},
      {"role": "assistant", "content": "There are three desserts on the white plate."},
      {"role": "user", "content": "<image2>\nWhat type of desserts are they?"},
      {"role": "assistant", "content": "The desserts are cakes with bananas and pecans on top. They share similarities with donuts, but the presence of bananas and pecans differentiates them."},
      {"role": "user", "content": "What is the setting of the image?"},
      {"role": "assistant", "content": "The image is set on a table top with a plate containing the three desserts."}
    ]
  }
]
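To make the requested format concrete, here is a minimal Python sketch that checks one such record for consistency: every `<imageN>` tag used in a turn should have an entry in `image`, every listed image should be referenced, and roles should alternate starting with the user. The format itself is only the proposal from this issue, not something the repository is known to accept, and the names `check_record`, `PLACEHOLDER`, and `sample` are hypothetical.

```python
import re

# Assumption: placeholders look like <image1>, <image2>, as in the example above.
PLACEHOLDER = re.compile(r"<(image\d+)>")


def check_record(record):
    """Return a list of problems found in one proposed multi-image record."""
    images = record.get("image", {})
    if not isinstance(images, dict):
        return ["'image' should map placeholder names to file paths"]

    problems = []
    used = set()
    for i, turn in enumerate(record.get("conversations", [])):
        # Turns are expected to alternate: user, assistant, user, ...
        expected_role = "user" if i % 2 == 0 else "assistant"
        if turn.get("role") != expected_role:
            problems.append(f"turn {i}: expected role {expected_role!r}")
        # Every placeholder in the text needs a matching image entry.
        for name in PLACEHOLDER.findall(turn.get("content", "")):
            if name not in images:
                problems.append(f"turn {i}: <{name}> has no entry in 'image'")
            used.add(name)

    # Images that no turn refers to are probably a data error.
    for name in sorted(set(images) - used):
        problems.append(f"image {name!r} is never referenced")
    return problems


# Shortened version of the record proposed in this issue.
sample = {
    "id": "0",
    "image": {
        "image1": "path/to/image_0.jpg",
        "image2": "path/to/image_1.jpg",
    },
    "conversations": [
        {"role": "user", "content": "<image1>\nHow many desserts are on the white plate?"},
        {"role": "assistant", "content": "There are three desserts on the white plate."},
        {"role": "user", "content": "<image2>\nWhat type of desserts are they?"},
        {"role": "assistant", "content": "The desserts are cakes with bananas and pecans on top."},
    ],
}

print(check_record(sample))
```

A check like this only validates the data file; it does not make the fine-tuning code consume more than one image per sample.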

Drawbacks

Multi-image input during a multi-turn conversation is important, since a multi-turn conversation can involve both text and images.

Unresolved questions

How to fine-tune on multi-image input during a multi-turn conversation.

JinQiangWang2021 commented 3 weeks ago

@rookie-joe Hi, have you solved this? I am also fine-tuning this model, but I ran into a problem when loading the data.

rookie-joe commented 3 weeks ago

Not yet, but I noticed they show a multi-image, multi-turn example for inference. I will look into it myself if they do not respond by the end of the week.

Uooga commented 1 week ago

Hello, have you solved this question?

LDLINGLINGLING commented 1 day ago

Currently we do not support pre-training with multiple images.