RWKV v5/v6 LoRA Trainer for the CUDA and ROCm platforms. RWKV is an RNN with transformer-level LLM performance, and it can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.
Resuming training from a checkpoint: suppose you trained for 200 epochs, merged the LoRA weights into the base model, set epoch_begin to 201, and restarted training. The learning rate then falls off a cliff (the epoch-200 learning rate is far larger than the epoch-201 learning rate). To fix this, change src/trainer.py line 35 to
real_step = trainer.global_step + args.epoch_begin * args.epoch_steps / args.real_bsz
so that the learning rate behaves normally again.
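The cliff most likely comes from real_step overshooting the decay horizon: adding epoch_begin * epoch_steps without dividing by real_bsz mixes sample counts into what should be optimizer-step units. The sketch below only illustrates that effect; the warmup/decay formula and the toy numbers are assumptions for illustration, not the actual schedule in src/trainer.py.

```python
import math

def lr_for(real_step, warmup_steps, total_steps, lr_init, lr_final):
    # Illustrative schedule: linear warmup, then exponential decay to lr_final.
    if real_step < warmup_steps:
        return lr_init * max(1, real_step) / max(1, warmup_steps)
    progress = min(1.0, (real_step - warmup_steps) / max(1, total_steps - warmup_steps))
    return lr_init * math.exp(math.log(lr_final / lr_init) * progress)

# Toy numbers (assumptions): 400 planned epochs, 1000 samples per epoch, batch size 8.
epoch_steps, real_bsz, epoch_begin, global_step = 1000, 8, 201, 0
total_steps = 400 * epoch_steps // real_bsz

buggy_step = global_step + epoch_begin * epoch_steps             # sample units, overshoots total_steps
fixed_step = global_step + epoch_begin * epoch_steps / real_bsz  # optimizer-step units

print(lr_for(buggy_step, 10, total_steps, 6e-4, 1e-5))  # ~1e-5: decay already exhausted
print(lr_for(fixed_step, 10, total_steps, 6e-4, 1e-5))  # ~8e-5: still mid-schedule
```

With the uncorrected step count the schedule is already at lr_final when epoch 201 starts, which matches the observed drop; with the corrected count the learning rate continues from roughly where epoch 200 left off.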