Open anmolagarwal999 opened 10 months ago
I am getting the same issue.
I recently discovered that LLaMA 1 was pretrained in fp16, but the Llama 2 family of models was pretrained in bf16. The README in this repo has fp16 set as the default; switching to bf16 fixed this.
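The reason the dtype choice matters is dynamic range: fp16 overflows to infinity above ~65504, while bf16 keeps float32's exponent range and only sacrifices mantissa precision. A minimal stdlib-only sketch of that difference (the helper names here are illustrative, not from the repo):

```python
import struct

def to_bf16(x: float) -> float:
    """Round-trip a float through bfloat16 by keeping only the top
    16 bits of its float32 encoding (truncation, for illustration).
    The value survives with coarser precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def fits_fp16(x: float) -> bool:
    """True if x is representable in IEEE float16. struct raises
    OverflowError for magnitudes beyond fp16's ~65504 ceiling."""
    try:
        struct.pack(">e", x)
        return True
    except OverflowError:
        return False

big = 1e5  # a magnitude fp16 cannot hold but bf16 can
print(fits_fp16(65504.0))  # True: fp16's largest finite value
print(fits_fp16(big))      # False: overflows fp16
print(to_bf16(big))        # 99840.0: bf16 keeps range, loses precision
```

This is why running bf16-pretrained weights in fp16 can silently blow up activations or gradients that bf16 handled fine.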
I followed all the setup instructions given in the README. The command I am using is:
Initially, I got the following error:
I downgraded to transformers version 4.29.2 as suggested here.
Now training runs, but the learning rate is fixed at zero right from the start. Below are the logs:
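One common cause worth checking: with a linear warmup schedule the learning rate at step 0 is exactly 0, and for the first warmup steps it can be small enough that rounded log output displays as 0.0. A minimal sketch of the usual warmup-then-linear-decay rule (function name and defaults are illustrative, not from this repo):

```python
def lr_at_step(step: int, base_lr: float = 2e-5,
               warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Linear warmup followed by linear decay, the shape used by many
    HF-style trainers. At step 0 the LR is exactly 0.0, so early log
    lines showing lr = 0 are not necessarily a bug."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # ramps 0 -> base_lr
    frac = (total_steps - step) / (total_steps - warmup_steps)
    return base_lr * max(0.0, frac)           # decays base_lr -> 0

print(lr_at_step(0))   # 0.0 at the very first step
print(lr_at_step(50))  # halfway through warmup: base_lr / 2
```

If the LR stays at zero well past the configured warmup steps, check whether `total_steps` (or the trainer's equivalent) is being computed as 0 or 1, which collapses the whole schedule to zero.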
Does anyone have any idea what I might be doing wrong?