cooelf / Auto-GUI

Official implementation for "You Only Look at Screens: Multimodal Chain-of-Action Agents" (Findings of ACL 2024)
https://arxiv.org/abs/2309.11436
Apache License 2.0

The loss of the base model during training does not decrease #10

Open shimurenhlq opened 8 months ago

shimurenhlq commented 8 months ago

Hello, I am trying to replicate the base-model results using "declare-lab/flan-alpaca-base" from Hugging Face. I followed the training commands provided in the README; however, the loss does not decrease, and inference fails to produce any meaningful content. Below is a partial excerpt from my trainer_state for reference:

```
{ "epoch": 0.02, "learning_rate": 3.135779241141424e-06, "loss": 17.987, "step": 500 },
{ "epoch": 0.03, "learning_rate": 6.271558482282848e-06, "loss": 17.9571, "step": 1000 },
……
{ "epoch": 9.99, "learning_rate": 1.320328101533231e-07, "loss": 16.2255, "step": 318500 },
{
  "epoch": 10.0,
  "eval_gen_len": 1.0,
  "eval_loss": 17.40145492553711,
  "eval_rouge1": 0.007,
  "eval_rouge2": 0.0,
  "eval_rougeL": 0.0069,
  "eval_rougeLsum": 0.007,
  "eval_runtime": 411.3956,
  "eval_samples_per_second": 21.349,
  "eval_steps_per_second": 0.168,
  "step": 318900
}
```

When I run inference with the trained model, the generated content is entirely useless:

```
'- nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> '
```

What could be causing these problems? Looking forward to your answer, thank you!

Pendulumclock commented 5 months ago

I encountered the same problem as you, and finally found that it was caused by the fast-initialization path (`_fast_init`) used when loading the model. It speeds up model initialization but can introduce bugs; turning it off solved the problem.
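
Below is a minimal sketch of that fix, assuming the model is loaded with Hugging Face Transformers' `from_pretrained`. Note that `_fast_init` is an internal, underscore-prefixed argument rather than stable public API, so treat this as an assumption about the loading path, not the repo's exact code:

```python
from transformers import AutoModelForSeq2SeqLM

# Disable the fast-initialization path when loading the base model.
# `_fast_init` is an internal Transformers flag; False forces the slower,
# full weight-initialization path and avoids the mis-initialized weights
# described above.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "declare-lab/flan-alpaca-base",
    _fast_init=False,
)
```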

lim4349 commented 4 months ago

Check your grad_norm value; if it's NaN or Inf, turn gradient clipping off (see the sketch below). Also try changing your learning rate to 1e-4, 2e-4, or 1e-5. This worked for me.
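
A minimal sketch of both suggestions, assuming the training script uses Hugging Face's `Seq2SeqTrainingArguments` (the `output_dir` value here is hypothetical). In the HF `Trainer`, gradient clipping is applied only when `max_grad_norm` is a positive number, so setting it to `0.0` effectively turns it off:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: lower the learning rate and disable gradient clipping.
training_args = Seq2SeqTrainingArguments(
    output_dir="out",     # hypothetical output path
    learning_rate=1e-4,   # try 1e-4, 2e-4, or 1e-5
    max_grad_norm=0.0,    # Trainer skips clip_grad_norm_ for non-positive values
)
```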

xukefaker commented 2 months ago

> Check your grad_norm value; if it's NaN or Inf, turn gradient clipping off. Also try changing your learning rate to 1e-4, 2e-4, or 1e-5. This worked for me.

Hi, how can I turn off the grad_norm?