OpenMOSE / RWKV5-LM-LoRA

RWKV v5/v6 LoRA trainer for the CUDA and ROCm platforms. RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.

Resuming training from a checkpoint - learning rate problem (bug) #3

Open xinyinan9527 opened 6 months ago

xinyinan9527 commented 6 months ago

When retraining the model from a checkpoint (for example: after training for 200 epochs, merging the LoRA weights into the base model, setting epoch_begin to 201, and resuming training), the learning rate drops off a cliff: the epoch-200 learning rate is far larger than the epoch-201 learning rate. Changing src/trainer.py line 35 to real_step = trainer.global_step + args.epoch_begin * args.epoch_steps/args.real_bsz makes the learning rate behave normally.
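For context, here is a minimal, self-contained sketch of why the unscaled formula can cause the cliff, assuming the trainer decays the LR exponentially from lr_init to lr_final over the scheduled steps (the constants and helper names below are illustrative assumptions, not the repo's actual API):

```python
# Illustrative sketch only: lr_at(), the constants, and the two helpers
# below are hypothetical; the real logic lives in src/trainer.py line 35.
import math

def lr_at(real_step, lr_init=6e-4, lr_final=1e-5, total_steps=200 * 1008):
    # Exponential interpolation from lr_init to lr_final over total_steps.
    progress = min(max(real_step / total_steps, 0.0), 1.0)
    return math.exp(math.log(lr_init) + progress * (math.log(lr_final) - math.log(lr_init)))

def real_step_original(global_step, epoch_begin, epoch_steps):
    # Original line 35: epoch_begin * epoch_steps is added unscaled,
    # which (per this report) overshoots the resumed position on the
    # decay curve.
    return global_step + epoch_begin * epoch_steps

def real_step_fixed(global_step, epoch_begin, epoch_steps, real_bsz):
    # Reported fix: scale the offset by the effective batch size.
    return global_step + epoch_begin * epoch_steps / real_bsz

if __name__ == "__main__":
    epoch_steps, real_bsz = 1008, 8  # assumed values for demonstration
    # Resuming at epoch 201 with global_step reset to 0:
    print(lr_at(real_step_original(0, 201, epoch_steps)))         # ~1e-5: the cliff
    print(lr_at(real_step_fixed(0, 201, epoch_steps, real_bsz)))  # ~3.6e-4: continues smoothly
```

Under these assumptions, the unscaled offset places the resumed run at (or past) the end of the decay schedule, so the LR collapses to lr_final immediately, whereas the scaled offset resumes partway along the curve.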