Closed binwang777 closed 4 months ago
Hello, I'm sorry. I might have introduced some new bugs while organizing the code, and I am currently checking; I will fix it as soon as possible.
Hi, regarding the problem of zero2 training loss being 0.0, I have fixed it here. After replacing the deepspeed version with 0.9.5, the loss converges normally. However, there are still problems with zero3 support, and the reason seems to be here:
OK, I get it. Thanks for your feedback.
In the case of deepspeed zero3, it seems that resizing position encodings is not feasible. I am looking for a solution; I used to pre-resize and save them in advance, but I'm not too fond of doing it that way.
This bug should have been fixed. I have attempted both pre-training and fine-tuning, and the loss curves are normal. Additionally, the results from testing the newly trained models on the benchmarks are also normal.
Hi, I have trained a model based on vicuna7b using the updated code. This time I only changed the dataset loading and weight loading, but my results are quite different from what you posted. Is there still something wrong in the code? I suspected the problem was caused by flash_attn not being version 0.2.8, so I tested the weights you provided, and the results are as follows: The results fluctuate slightly, but are still reasonable.
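Since two version pins come up in this thread (deepspeed 0.9.5 and flash_attn 0.2.8), here is a minimal sketch for checking what is actually installed before comparing results. The distribution names `deepspeed` and `flash-attn` and the helper names are my assumptions, not something from the repo, and the comparison is an exact string match, so a build suffix such as `+cu117` would be reported as a mismatch.

```python
from importlib import metadata

# Versions mentioned in this thread. The distribution names are assumptions.
EXPECTED = {"deepspeed": "0.9.5", "flash-attn": "0.2.8"}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to an (expected, found) pair."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

Running this in the training environment prints one line per mismatched package and nothing if both pins match.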
I used the code you provided to train llava based on Intervit6B. Following the script you provided, the first stage (pretraining) ran normally. But when training with the fine-tuning script, I saw strange loss behavior.
As shown in the picture. I have not modified any of the llava code.
I debugged EVA_clip_vit and found the problem. It happened in zero2. When replacing it with zero3, the loss was normal. However, Intervit6B has the following problems when using zero3: