Fine-tuning InternVL-Chat-V1.2 on a custom dataset: is there a problem with the code?

yanzaaaasa commented 5 months ago

I load the data with this config:

{ "sharegpt4v_instruct_gpt4-vision_cap100k": { "root": "playground/data/", "annotation": "playground/sharegpt4v_instruct_gpt4-vision_cap100k.jsonl", "data_augment": false, "repeat_time": 1, "length": 102025 } }

In the annotation file that was provided, the VG dataset entries have the following format:

{"id": "VG_100K_2/57", "image": "vg/VG_100K_2/57.jpg", "conversations": [{"from": "human", "value": "\nPlease provide a short description for this region: [0.04, 0.35, 0.27, 0.43]."}, {"from": "gpt", "value": "Red and white sign."}, {"from": "human", "value": "Please provide a short description for this region: [0.85, 0.54, 0.9, 0.62]."}, {"from": "gpt", "value": "Woman wearing a skirt."}, {"from": "human", "value": "Please provide a short description for this region: [0.63, 0.38, 0.7, 0.48]."}, {"from": "gpt", "value": "Window in the building."}, {"from": "human", "value": "Please provide a short description for this region: [0.5, 0.5, 0.69, 0.8]."}, {"from": "gpt", "value": "He is crossing the street."}, {"from": "human", "value": "Please provide a short description for this region: [0.54, 0.52, 0.65, 0.75]."}, {"from": "gpt", "value": "A man in a suit."}, {"from": "human", "value": "Please provide a short description for this region: [0.47, 0.55, 0.73, 0.65]."}, {"from": "gpt", "value": "A car driving on the street."}, {"from": "human", "value": "Please provide the bounding box coordinate of the region this sentence describes: two women walking in the sidewalk."}, {"from": "gpt", "value": "[0.28, 0.49, 0.39, 0.67]"}, {"from": "human", "value": "Please provide a short description for this region: [0.17, 0.6, 0.21, 0.67]."}, {"from": "gpt", "value": "A black short pole on the sidewalk."}]}
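For anyone who wants to sanity-check their own annotation file before fine-tuning, here is a minimal sketch. It parses the meta config and a shortened copy of the VG record from this comment, then checks that the turns alternate human/gpt and resolves the image path against `root`. The helper name `check_record` is hypothetical, and the alternation rule is an assumption read off the sample above, not something confirmed from the InternVL data loader.

```python
import json
import os

# Meta file content as posted in this thread.
META_JSON = r'''{ "sharegpt4v_instruct_gpt4-vision_cap100k": { "root": "playground/data/", "annotation": "playground/sharegpt4v_instruct_gpt4-vision_cap100k.jsonl", "data_augment": false, "repeat_time": 1, "length": 102025 } }'''

# First two turns of the VG sample from this thread (shortened for brevity).
RECORD_LINE = r'''{"id": "VG_100K_2/57", "image": "vg/VG_100K_2/57.jpg", "conversations": [{"from": "human", "value": "\nPlease provide a short description for this region: [0.04, 0.35, 0.27, 0.43]."}, {"from": "gpt", "value": "Red and white sign."}]}'''


def check_record(record, root):
    """Hypothetical helper: return (problems, image_path) for one record.

    Assumes turns come in human/gpt pairs, as in the sample above.
    """
    problems = []
    turns = record.get("conversations", [])
    if not turns or len(turns) % 2 != 0:
        problems.append("conversations should be non-empty human/gpt pairs")
    for i, turn in enumerate(turns):
        expected = "human" if i % 2 == 0 else "gpt"
        got = turn.get("from")
        if got != expected:
            problems.append(f"turn {i}: expected {expected!r}, got {got!r}")
    # The image path in a record is relative to the dataset root.
    image_path = os.path.join(root, record.get("image", ""))
    return problems, image_path


meta = json.loads(META_JSON)
for name, cfg in meta.items():
    record = json.loads(RECORD_LINE)
    problems, image_path = check_record(record, cfg["root"])
    print(name, image_path, problems or "ok")
```

In practice you would read the file named by `annotation` line by line and run the same check on each record.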

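Also, judging from the RefCOCO evaluation code, the prompt that is provided is Please provide the bounding box coordinate of the region this sentence describes: second brown banana from the right and the inference result that comes back is giraffe in middle [[331, 160, 703, 751]], which clearly differs from the annotations in the fine-tuning dataset. If I fine-tune the model directly on the data as it is read by the fine-tuning code, will that cause problems?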

yanzaaaasa commented 5 months ago

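The ref and box tags do not render, so I have dropped the special character < from them. Also, judging from the RefCOCO evaluation code, the prompt that is provided is Please provide the bounding box coordinate of the region this sentence describes: ref brown bananas on far right ref and the inference result that comes back is ref bowl of carrots ref box [[231, 336, 947, 1000]] box, which clearly differs from the annotations in the fine-tuning dataset. If I fine-tune the model directly on the data as it is read by the fine-tuning code, will that cause problems?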

Weiyun1025 commented 3 months ago

Hello, the standard format for our region-aware data is <ref>text</ref><box>[[x1, y1, x2, y2]]</box>. However, to improve the model's generalization, this format is not strictly enforced.
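As an illustration of the standard format described here, the sketch below renders a label and a box into that layout and checks whether a string matches it. The function names `format_region` and `is_standard_format` are hypothetical and not part of the InternVL codebase; the sample values are taken from the inference output quoted earlier in this thread.

```python
import re

# One region in the layout <ref>text</ref><box>[[x1, y1, x2, y2]]</box>.
STANDARD_RE = re.compile(
    r"<ref>.+?</ref><box>\[\[\s*\d+\s*,\s*\d+\s*,\s*\d+\s*,\s*\d+\s*\]\]</box>"
)


def format_region(text, box):
    """Hypothetical helper: render a label and a 4-number box."""
    x1, y1, x2, y2 = box
    return f"<ref>{text}</ref><box>[[{x1}, {y1}, {x2}, {y2}]]</box>"


def is_standard_format(s):
    """Hypothetical helper: True if s is exactly one region in the layout.

    Assumes integer coordinates, like the boxes quoted in this thread.
    """
    return STANDARD_RE.fullmatch(s) is not None


sample = format_region("bowl of carrots", [231, 336, 947, 1000])
print(sample)
print(is_standard_format(sample))
print(is_standard_format("bowl of carrots [[231, 336, 947, 1000]]"))
```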

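For example, the VG data uses the format: Please provide a short description for this region: <box>[[x1, y1, x2, y2]]</box>. If you would rather train with the format Please provide a short description for this <ref>region</ref>: <box>[[x1, y1, x2, y2]]</box>, that is perfectly fine too; it will not have much impact on final performance.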
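If your annotation file contains bare boxes such as [0.04, 0.35, 0.27, 0.43], as in the VG record posted at the top of this thread, a small rewrite step can wrap them in the <box>[[x1, y1, x2, y2]]</box> form. The sketch below is one way to do it. Note that `tag_boxes` is a hypothetical helper, and the scale of 1000 is an assumption: the thread never states the coordinate range, and 1000 is only inferred from the integer boxes in the quoted inference output. Verify the range before using this on real data.

```python
import re

NUM = r"(\d+(?:\.\d+)?)"
# A single-bracket box with exactly four numbers; boxes that are already
# double-bracketed ([[...]]) are left alone by the lookbehind/lookahead.
BARE_BOX_RE = re.compile(
    r"(?<!\[)\[\s*" + r"\s*,\s*".join([NUM] * 4) + r"\s*\](?!\])"
)


def tag_boxes(value, scale=1000):
    """Hypothetical helper: wrap bare normalized boxes in <box> tags.

    ASSUMPTION: coordinates are normalized to [0, 1] and the target range
    is 0..scale; scale=1000 is a guess, not confirmed in the thread.
    """
    def repl(match):
        coords = [round(float(g) * scale) for g in match.groups()]
        return "<box>[[{}, {}, {}, {}]]</box>".format(*coords)

    return BARE_BOX_RE.sub(repl, value)


prompt = "Please provide a short description for this region: [0.04, 0.35, 0.27, 0.43]."
print(tag_boxes(prompt))
```

The same function can be applied to the gpt turns that answer with a box. Passing `scale=1` would not keep the original decimals, since the values are rounded to integers.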

qinb commented 2 months ago

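The ref and box tags do not render, so I have dropped the special character < from them. Also, judging from the RefCOCO evaluation code, the prompt that is provided is Please provide the bounding box coordinate of the region this sentence describes: ref brown bananas on far right ref and the inference result that comes back is ref bowl of carrots ref box [[231, 336, 947, 1000]] box, which clearly differs from the annotations in the fine-tuning dataset. If I fine-tune the model directly on the data as it is read by the fine-tuning code, will that cause problems?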

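@czczup @Weiyun1025 @yanzaaaasa I am running into a similar problem. I used the <ref> and <box> format and fine-tuned the InternVL2-1B model, but at inference time the model output no longer contains the <ref> and <box> tags. My ref is a number, so the two run together. How can I fix this?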
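This thread contains no answer to the question above. As a post-processing stopgap only, here is a sketch of a parser that first looks for tagged regions and otherwise falls back to splitting a label from a double-bracketed box. It does not address why the tags are missing. The untagged shape used here (a label immediately followed by [[x1, y1, x2, y2]]) is a guess, since no actual model output was posted, and `split_ref_and_box` is a hypothetical name.

```python
import re

TAGGED_RE = re.compile(r"<ref>(.*?)</ref>\s*<box>\[\[(.*?)\]\]</box>")
# Fallback: any text up to the next double-bracketed box.
# ASSUMPTION: untagged output looks like "3[[231, 336, 947, 1000]]".
UNTAGGED_RE = re.compile(r"(.*?)\s*\[\[(.*?)\]\]")


def split_ref_and_box(output):
    """Hypothetical helper: return a list of (ref, [x1, y1, x2, y2]) pairs."""
    pairs = TAGGED_RE.findall(output)
    if not pairs:
        pairs = UNTAGGED_RE.findall(output)
    return [
        (ref.strip(), [int(v) for v in coords.split(",")])
        for ref, coords in pairs
    ]


print(split_ref_and_box("<ref>3</ref><box>[[231, 336, 947, 1000]]</box>"))
print(split_ref_and_box("3[[231, 336, 947, 1000]]"))
```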

OpenGVLab / InternVL

Fine-tuning InternVL-Chat-V1.2 on a custom dataset: is there a problem with the code? #211