ChaoGaoUCR opened this issue 1 year ago
Hi,
According to the error message, one possible reason is that the fine-tuning of the model crashed. Can you check the training loss from your fine-tuning run? If the training loss drops to 0 and the eval loss becomes NaN, the training has most likely crashed.
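As a rough illustration of that check, here is a minimal Python sketch. It assumes the fine-tuning was run with the Hugging Face `transformers` Trainer, which records a `log_history` list (also saved in `trainer_state.json` inside each checkpoint directory) where training entries carry a `loss` key and evaluation entries carry an `eval_loss` key. The function names `looks_crashed` and `check_trainer_state` and the `zero_tol` threshold are hypothetical helpers for illustration, not part of the repository.

```python
import json
import math
from typing import Iterable, Mapping


def looks_crashed(log_history: Iterable[Mapping[str, float]], zero_tol: float = 1e-6) -> bool:
    """Heuristic: the latest training loss is ~0 AND the latest eval loss is NaN.

    Assumes Hugging Face Trainer-style log entries: training steps have a
    "loss" key and evaluation steps have an "eval_loss" key.
    """
    entries = list(log_history)
    train_losses = [e["loss"] for e in entries if "loss" in e]
    eval_losses = [e["eval_loss"] for e in entries if "eval_loss" in e]
    if not train_losses or not eval_losses:
        return False  # not enough logged information to decide
    train_collapsed = abs(train_losses[-1]) <= zero_tol
    eval_is_nan = math.isnan(eval_losses[-1])
    return train_collapsed and eval_is_nan


def check_trainer_state(path: str) -> bool:
    """Load a trainer_state.json file and apply the heuristic to its log_history."""
    with open(path) as f:
        state = json.load(f)  # Python's json accepts the NaN token written by the Trainer
    return looks_crashed(state.get("log_history", []))


if __name__ == "__main__":
    healthy = [{"loss": 1.2}, {"eval_loss": 1.1}, {"loss": 0.9}, {"eval_loss": 0.95}]
    crashed = [{"loss": 1.2}, {"eval_loss": 1.1}, {"loss": 0.0}, {"eval_loss": float("nan")}]
    print(looks_crashed(healthy))  # False
    print(looks_crashed(crashed))  # True
```

Pointing `check_trainer_state` at the `trainer_state.json` of the last checkpoint gives a quick yes/no answer; inspecting the full loss curve is still the more reliable diagnosis.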
Thank you so much for the fast reply. Actually, I deleted the config part and it passed. I am still trying to find out why. I think you are right; the tuned model may have some problems.
Hi,
According to https://github.com/tloen/alpaca-lora/issues/408, this seems to be a CUDA issue. I can't reproduce the error on my side, but I found a workaround: comment out lines 51-53 in evaluate.py.
If you have further questions, please let us know!
Thank you so much! The problem is resolved: I downgraded CUDA to 11.6 and everything works now.
How can I resolve the training crash?
Dear authors, sorry to bother you. I get errors on every dataset I have tried to run the evaluation on.
Could you please take a look at this?
Thanks!