OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0
12.78k stars 893 forks source link

[BUG] finetune/dataset.py | TypeError:无法根据规则“same_kind”将数组数据从 dtype('float64') 转换为 dtype('int32') #167

Closed rover5056 closed 6 months ago

rover5056 commented 6 months ago

是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?

当前行为 | Current Behavior

dataset.py

ids = torch.from_numpy(np.hstack(input_ids, dtype=np.int32)) 这里直接转换类型会报错

手动在 input_ids.append(prefix_ids) 之前把 prefix_ids 转一下就可以。。 prefix_ids = np.array(prefix_ids,dtype=np.int32) message_ids = np.array(message_ids,dtype=np.int32) 辛苦看看是不是 bug

期望行为 | Expected Behavior

No response

复现方法 | Steps To Reproduce

No response

运行环境 | Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

备注 | Anything else?

No response

YuzaChongyi commented 6 months ago

之前有一个类似的 https://github.com/OpenBMB/MiniCPM-V/issues/113, 是训练数据出现了空消息,这个时候 input_ids, 中会包含一个 [] 导致该报错,你可以检查一下训练数据是否出现了空的字段。