Closed: jpWang closed this issue 11 months ago
Hi, thanks for your great work!
I just want to know: does the released fine-tuning code of MiniGPT-v2 contain all of the stage 1, 2, and 3 training procedures, or only stage-3 training? And will the training code for stages 1 and 2 be released?
Thanks.
We currently only release the third-stage fine-tuning code.
What tool was used to obtain the bounding boxes in multitask_conversation.json? And does the training code of MiniGPT-v2 only need to train for one stage?
In multitask_conversation.json, for the refcoco task we simply use refcoco/g/+ with their ground-truth annotations on COCO images. For the grounded image captioning and detection tasks, we first fine-tune our second-stage model on our filtered Flickr dataset for grounded captioning and detection, and then use it to generate the annotations for the COCO images selected in multitask_conversation.json.
And what is the format of the fine-tuning dataset? Thanks!
You can follow dataset/README_MINIGPTv2_FINETUNE.md to prepare the data. The data format for each task can be found in the individual files under minigpt4/datasets/datasets.
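When checking your prepared annotation files against what the dataset classes under minigpt4/datasets/datasets expect, a quick way to compare is to list the field names of the first record. A minimal sketch (the file path in the comment is illustrative, not a confirmed filename):

```python
import json


def record_keys(records):
    """Return the sorted field names of the first record in a parsed
    annotation list, for comparing against the keys that each dataset
    class in minigpt4/datasets/datasets reads."""
    if not records:
        return []
    return sorted(records[0].keys())


# Example usage (path is illustrative):
# with open("multitask_conversation.json") as f:
#     print(record_keys(json.load(f)))
```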
What tool was used to obtain the bounding boxes? For example, the Flickr dataset and the bounding boxes in multitask_conversation.json. If I need to annotate an image, what tools do I need? I found that the bounding boxes produced with the labelImg tool are inconsistent with those in the Flickr data and multitask_conversation.json: those are all integers and look too small.
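One possible explanation for the small integer values (an assumption based on the coordinate convention described in the MiniGPT-v2 paper, not confirmed in this thread) is that the released annotations store boxes on a normalized 0-100 grid rather than in raw pixels, whereas labelImg outputs pixel coordinates. A minimal rescaling sketch:

```python
def to_normalized_box(box, img_w, img_h, grid=100):
    """Rescale a pixel-space box (x1, y1, x2, y2) onto a 0..grid integer
    grid. This would explain why the released annotations are small
    integers instead of pixel coordinates (assumed convention)."""
    x1, y1, x2, y2 = box
    return (
        int(x1 / img_w * grid),
        int(y1 / img_h * grid),
        int(x2 / img_w * grid),
        int(y2 / img_h * grid),
    )


# On a 640x480 image, the pixel box (64, 48, 320, 240) maps to:
# to_normalized_box((64, 48, 320, 240), 640, 480)  ->  (10, 10, 50, 50)
```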