InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
1.91k stars 120 forks source link

grounding数据 #319

Closed sxlyiyiyi closed 3 weeks ago

sxlyiyiyi commented 1 month ago

你好,如果想微调自己的rec和reg数据,box或者point的组织格式应该是什么样的,是类似于这样的[0.133, 0234, 0.453, 0.732]吗

WeiminLee commented 3 weeks ago

参考下面的: { "id": "20688", "image": [ "/data/lwm-data/resized_without_padding_images/9e7e2a0e3102aaca2cfba8ffe1182de4.png" ], "conversations": [ { "from": "user", "value": "In, can you guide me to the location of \"Financial Services Guide (PDF)\" by providing bounding boxes?" }, { "from": "assistant", "value": "The bounding box is [729, 0, 810, 25]" } ] },

sxlyiyiyi commented 3 weeks ago

参考下面的: { "id": "20688", "image": [ "/data/lwm-data/resized_without_padding_images/9e7e2a0e3102aaca2cfba8ffe1182de4.png" ], "conversations": [ { "from": "user", "value": "在,您能通过提供边界框将我引导到“金融服务指南(PDF)”的位置吗?" }, { "from": "assistant", "value": "边界框是 [729, 0, 810, 25]" } ] },

这个是归一化的0-1000之间的值吗

WeiminLee commented 3 weeks ago

参考下面的: { "id": "20688", "image": [ "/data/lwm-data/resized_without_padding_images/9e7e2a0e3102aaca2cfba8ffe1182de4.png" ], "conversations": [ { "from": "user", "value": "在,您能通过提供边界框将我引导到“金融服务指南(PDF)”的位置吗?" }, { "from": "assistant", "value": "边界框是 [729, 0, 810, 25]" } ] },

这个是归一化的0-1000之间的值吗

是的