hangzeli08 closed this issue 1 year ago.
Would you mind sharing the data file you used for training? OwlEval is only used for evaluation, not for training.
I used: (1) OwlEval converted into your required format, (2) llava_instruct_150k.json converted into your required format, and (3) detail_23k.json converted into your required format. If I use only one of these datasets (e.g., just the OwlEval data), the loss is not nan; as soon as I merge two of them, the loss immediately becomes nan.
{"image": ["/data/data3/hangzeli/work_lab/workspace/multimodal/mPLUG-Owl/OwlEval/cases/1.jpg"], "text": "The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman:
detail_23k_instruct_image82.txt
llava_instruct_150k_image2_struct5b.txt
minigpt4_13b_answer_mplug11.txt
These are all data files for which the loss becomes nan. They should end in .jsonl, but GitHub does not support uploading that file format, so I changed the extension to .txt.
Your data is incorrect in two ways. First, you did not include the <image> token as the image-input placeholder.
Second, you should use the following prompt template:
The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.\nHuman: <image>\nHuman: {YOUR PROMPT}\nAI: {YOUR RESPONSE}
You should add a space between "AI:" and your response!
Please refer to https://github.com/X-PLUG/mPLUG-Owl/blob/main/pipeline/data_utils/xgpt3_dataset.py#L141
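For anyone hitting the same problem, here is a minimal sketch of how a correctly formatted training sample might be assembled. The image path, instruction, and answer are placeholders taken from this thread, not real training data; the only points being illustrated are the prompt template above and the space after "AI:".

```python
import json

# Prompt template quoted from the comment above; note the space after "AI:".
TEMPLATE = (
    "The following is a conversation between a curious human and AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
    "\nHuman: <image>"
    "\nHuman: {prompt}"
    "\nAI: {response}"
)

# Placeholder content; substitute your own image path, instruction, and answer.
sample = {
    "image": ["OwlEval/cases/1.jpg"],
    "text": TEMPLATE.format(prompt="Describe the image.",
                            response="The image shows ..."),
}

# One JSON object per line (.jsonl), as mentioned elsewhere in this thread.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```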
It is not a problem related to the code.
Please read the dataset code to understand how your data will be processed before fine-tuning the model. @hangzeli08
OK, I should not have jumped to conclusions so early. Let me give it a try.
It works! The loss is no longer nan; it starts at 1.99 and gradually decreases. It really was the missing space after "AI:". I had actually printed out the data variables in xgpt3_dataset.py, but at the time I did not notice anything wrong. I spent several days debugging this bug; I should not have jumped to conclusions. It finally works. Thank you.
After reading this issue, I have an interesting question: why does the loss become nan when there is no space between "AI:" and the response?
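I don't have an authoritative answer either, but one way to start investigating is to compare how the two variants tokenize. This is only a sketch under the assumption that the data pipeline (the xgpt3_dataset.py code linked above) matches on the literal "AI: " string when deciding which tokens count as labels; the tokenizer path below is a placeholder.

```python
from transformers import AutoTokenizer

# "path/to/your/tokenizer" is a placeholder; use the tokenizer that ships with
# your mPLUG-Owl checkpoint.
tokenizer = AutoTokenizer.from_pretrained("path/to/your/tokenizer")

with_space = "AI: The image shows a cat."
without_space = "AI:The image shows a cat."

# If the two variants tokenize differently around "AI:", code that searches for
# the literal "AI: " marker to mask prompt tokens may mask the wrong span; if
# every label in a sample ends up masked, the averaged loss can degenerate to
# 0/0, i.e. nan.
print(tokenizer.tokenize(with_space))
print(tokenizer.tokenize(without_space))
```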
The OwlEval dataset contains 82 samples. Fine-tuning on just those 82, the loss is not nan. If I duplicate them to get 164 samples, the loss becomes nan. If I start from the 82 samples and add only a few short samples, it immediately becomes nan as well. Are you still going to say this is a length problem? It is clearly unrelated to length. Please fix your model; the problem really does exist, and it is a problem with the model.
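For anyone trying to reproduce this, here is a minimal sketch of how one might capture the batch that first drives the loss to nan. It assumes a standard PyTorch loop with an HF-style model that returns .loss; the function and file names are illustrative and not taken from the mPLUG-Owl training code.

```python
import torch

# Illustrative guard for a generic PyTorch training step: stop at the first nan
# loss and save the offending batch so it can be inspected offline.
def training_step(model, optimizer, batch, step):
    loss = model(**batch).loss  # assumes an HF-style model that returns .loss
    if torch.isnan(loss):
        torch.save(batch, f"nan_batch_step{step}.pt")
        raise RuntimeError(f"loss became nan at step {step}; batch saved for inspection")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```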