QwenLM / Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Other
4.38k stars 332 forks source link

请问微调中使用本地数据时配置json格式的文件中的id参数每个样本可以设置为一样的吗? #341

Open whysirier opened 3 months ago

whysirier commented 3 months ago

起始日期 | Start Date

No response

实现PR | Implementation PR

No response

相关Issues | Reference Issues

No response

摘要 | Summary

请问微调中使用本地数据时配置json格式的文件中的id参数每个样本可以设置为一样的吗?

基本示例 | Basic Example

1)id一样: [ { "id": "identity_0", "conversations": [ { "from": "user", "value": "你好" }, { "from": "assistant", "value": "我是VL。" } ] }, { "id": "identity_0", "conversations": [ { "from": "user", "value": "Picture 1: /mnt/data/images/cat217.jpg\n图中有几只猫?" } ] } ] 2)id不一样 [ { "id": "identity_0", "conversations": [ { "from": "user", "value": "你好" }, { "from": "assistant", "value": "我是VL。" } ] }, { "id": "identity_1", "conversations": [ { "from": "user", "value": "Picture 1: /mnt/data/images/cat217.jpg\n图中有几只猫?" } ] } ]

缺陷 | Drawbacks

两种方式对训练效果有影响吗

未解决问题 | Unresolved questions

No response

AIFFFENG commented 3 months ago

应该不影响,之前考虑过这个问题,查找源代码的过程中并未发现该id被使用