BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance that can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.

How to use state tuning with rwkv6-7B? #246

Open xinyinan9527 opened 1 month ago

xinyinan9527 commented 1 month ago

I followed the official instructions, which should train only time_state, but I get this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
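
For reference, state tuning as described here means freezing every weight and training only the state. A minimal sketch of such a freeze, using a hypothetical stand-in module (the real RWKV-LM parameter names and shapes, e.g. anything containing `time_state`, may differ):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an RWKV model: one frozen weight matrix plus
# a "time_state" parameter; the actual RWKV-LM module layout differs.
class TinyRWKV(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.head = nn.Linear(dim, dim)
        self.time_state = nn.Parameter(torch.zeros(dim))

model = TinyRWKV(8)

# State tuning: train only parameters whose name contains "time_state".
for name, param in model.named_parameters():
    param.requires_grad = "time_state" in name

print([n for n, p in model.named_parameters() if p.requires_grad])
# -> ['time_state']
```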

JL-er commented 4 days ago

Are you using the RWKV-LM project directly, or your own modified version? If it is your own modification, DeepSpeed's checkpoint raises this error when gradients are frozen; you need to use torch.checkpoint instead. See RWKV-PEFT for details.
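
A minimal sketch of the substitution JL-er describes, assuming activation checkpointing wraps each block in the forward pass; the `Block` module here is illustrative, not the actual RWKV-LM code:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    # Hypothetical stand-in for an RWKV block: frozen base weights plus a
    # trainable state parameter, mirroring the state-tuning setup above.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.state = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.proj(x) + self.state

block = Block(8)
block.proj.weight.requires_grad_(False)  # freeze the base weights
block.proj.bias.requires_grad_(False)

x = torch.randn(2, 8)  # the input carries no grad either
# deepspeed.checkpointing.checkpoint(block, x) can fail here, since no
# checkpoint *input* requires grad; the non-reentrant torch variant works:
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
print(block.state.grad is not None)  # True: the gradient reaches the state
```

The key detail is `use_reentrant=False`: the non-reentrant torch checkpoint tracks gradients through parameters captured inside the function, so it works even when none of the checkpointed inputs require grad.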