Closed: jpWang closed this issue 11 months ago
Hi, thanks for your great work!
I just want to know: does the released fine-tuning code of MiniGPT-v2 contain all of the stage 1, 2, and 3 training procedures, or only stage-3 training? And will the training code for stages 1 and 2 be released?
Thanks.
We currently only release the third-stage fine-tuning code.
What tool was used to obtain the bounding boxes in multitask_conversation.json? And does the training code of MiniGPT-v2 only need to train for one stage?
In multitask_conversation.json, for the refcoco task we simply use refcoco/g/+ with their ground-truth annotations on COCO images. For the grounded image captioning and detection tasks, we first fine-tune our second-stage model on our filtered Flickr dataset for grounded captioning and detection, and then use it to generate the annotations for the COCO images selected in multitask_conversation.json.
And what is the format of the fine-tuning dataset? Thanks!
You can follow dataset/README_MINIGPTv2_FINETUNE.md to prepare the data. The data format for each task can be found in the individual files under minigpt4/datasets/datasets.
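When checking your prepared annotation files against what the dataset classes under minigpt4/datasets/datasets expect, a quick way to compare is to list the field names of the first record. A minimal sketch (the file path in the comment is illustrative, not a confirmed filename):

```python
import json


def record_keys(records):
    """Return the sorted field names of the first record in a parsed
    annotation list, for comparing against the keys that each dataset
    class in minigpt4/datasets/datasets reads."""
    if not records:
        return []
    return sorted(records[0].keys())


# Example usage (path is illustrative):
# with open("multitask_conversation.json") as f:
#     print(record_keys(json.load(f)))
```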
What tool was used to obtain the bounding boxes? For example, the Flickr dataset and the bounding boxes in multitask_conversation.json. If I need to annotate an image, what tools do I need? I found that the bounding boxes produced with the labelImg tool are inconsistent with those in the Flickr data and multitask_conversation.json: those are all integers and look too small.
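One possible explanation for the small integer values (an assumption based on the coordinate convention described in the MiniGPT-v2 paper, not confirmed in this thread) is that the released annotations store boxes on a normalized 0-100 grid rather than in raw pixels, whereas labelImg outputs pixel coordinates. A minimal rescaling sketch:

```python
def to_normalized_box(box, img_w, img_h, grid=100):
    """Rescale a pixel-space box (x1, y1, x2, y2) onto a 0..grid integer
    grid. This would explain why the released annotations are small
    integers instead of pixel coordinates (assumed convention)."""
    x1, y1, x2, y2 = box
    return (
        int(x1 / img_w * grid),
        int(y1 / img_h * grid),
        int(x2 / img_w * grid),
        int(y2 / img_h * grid),
    )


# On a 640x480 image, the pixel box (64, 48, 320, 240) maps to:
# to_normalized_box((64, 48, 320, 240), 640, 480)  ->  (10, 10, 50, 50)
```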