Errors when I run generation

AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"

https://arxiv.org/abs/2304.01933

Apache License 2.0

Errors when I run generation #36

Status: Open. ChaoGaoUCR opened this issue 1 year ago

ChaoGaoUCR commented 1 year ago

Dear Authors, sorry to bother you. I am hitting errors on every dataset I have tried to run the evaluation on.

Could you please take a look at this?

Thanks!

HZQ950419 commented 1 year ago

Hi,

According to the error message, one possible reason is that the fine-tuning of the model crashed. Can you check the training loss from your fine-tuning run? If the training loss goes to 0 and the eval loss goes to nan, the training crashed.
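A minimal sketch of that check, assuming you have collected the logged training and eval losses into two Python lists. The function name `training_crashed` and the tolerance `zero_tol` are hypothetical, chosen for illustration; they are not part of this repository.

```python
import math


def training_crashed(train_losses, eval_losses, zero_tol=1e-8):
    """Return True if the loss history shows the crash pattern:
    training loss reaching (about) 0 and eval loss becoming nan.

    Both arguments are plain lists of floats taken from the training log.
    zero_tol is an assumed tolerance for treating a loss as 0.
    """
    # Any logged training loss that is effectively zero.
    train_hit_zero = any(abs(loss) <= zero_tol for loss in train_losses)
    # Any logged eval loss that is nan.
    eval_is_nan = any(math.isnan(loss) for loss in eval_losses)
    return train_hit_zero and eval_is_nan


if __name__ == "__main__":
    # Crashed run: training loss collapses to 0, eval loss is nan.
    print(training_crashed([1.9, 0.7, 0.0, 0.0], [1.2, float("nan")]))
    # Healthy run: both losses stay finite and non-zero.
    print(training_crashed([1.9, 1.1, 0.8], [1.2, 1.0]))
```

If this returns True for your run, the saved adapter weights are probably unusable and the evaluation errors come from the checkpoint rather than from the evaluation script.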

ChaoGaoUCR commented 1 year ago

Thank you so much for the fast reply. Actually, I deleted the config part and the run passed... I am trying to find out why... I think you are right, the tuned model may have some problems...

HZQ950419 commented 1 year ago

Hi,

According to the issue https://github.com/tloen/alpaca-lora/issues/408, this looks like a CUDA issue. I can't reproduce the error on my side, but I found a workaround: comment out lines 51-53 in evaluate.py.

If you have further questions, please let us know!

ChaoGaoUCR commented 1 year ago

Thank you so much, the problem is resolved! I downgraded CUDA to 11.6 and everything works now.

ZeguanXiao commented 6 months ago

Hi,

According to the error message, one possible reason is that the fine-tuning of the model crashed. Can you check the training loss from your fine-tuning run? If the training loss goes to 0 and the eval loss goes to nan, the training crashed.

How can I resolve this training crash?