RWKV v5/v6 LoRA Trainer for the CUDA and ROCm platforms. RWKV is an RNN with transformer-level LLM performance, and it can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.
Resuming training from a checkpoint: suppose you trained for 200 epochs, merged the LoRA weights into the base model, set epoch_begin to 201, and restarted training. The learning rate then falls off a cliff (the epoch-200 learning rate is far larger than the epoch-201 learning rate). To fix this, change src/trainer.py line 35 to
real_step = trainer.global_step + args.epoch_begin * args.epoch_steps / args.real_bsz
so that the learning rate behaves normally again.
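The cliff most likely comes from real_step overshooting the decay horizon: adding epoch_begin * epoch_steps without dividing by real_bsz mixes sample counts into what should be optimizer-step units. The sketch below only illustrates that effect; the warmup/decay formula and the toy numbers are assumptions for illustration, not the actual schedule in src/trainer.py.

```python
import math

def lr_for(real_step, warmup_steps, total_steps, lr_init, lr_final):
    # Illustrative schedule: linear warmup, then exponential decay to lr_final.
    if real_step < warmup_steps:
        return lr_init * max(1, real_step) / max(1, warmup_steps)
    progress = min(1.0, (real_step - warmup_steps) / max(1, total_steps - warmup_steps))
    return lr_init * math.exp(math.log(lr_final / lr_init) * progress)

# Toy numbers (assumptions): 400 planned epochs, 1000 samples per epoch, batch size 8.
epoch_steps, real_bsz, epoch_begin, global_step = 1000, 8, 201, 0
total_steps = 400 * epoch_steps // real_bsz

buggy_step = global_step + epoch_begin * epoch_steps             # sample units, overshoots total_steps
fixed_step = global_step + epoch_begin * epoch_steps / real_bsz  # optimizer-step units

print(lr_for(buggy_step, 10, total_steps, 6e-4, 1e-5))  # ~1e-5: decay already exhausted
print(lr_for(fixed_step, 10, total_steps, 6e-4, 1e-5))  # ~8e-5: still mid-schedule
```

With the uncorrected step count the schedule is already at lr_final when epoch 201 starts, which matches the observed drop; with the corrected count the learning rate continues from roughly where epoch 200 left off.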