BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM usage, fast training, "infinite" ctx_len, and free sentence embeddings.
Apache License 2.0

Could you provide the fine-tuning parameters for all of the RWKV v5 models on Hugging Face? #226

Closed lantudou closed 4 months ago

lantudou commented 4 months ago

Excellent work, but the README only mentions how to fine-tune the 7B model: "Change the training command to use --n_layer 32 --n_embd 4096 --vocab_size 65536 --lr_init 1e-5 --lr_final 1e-5 (for 7B)."

Could you provide the fine-tuning parameters for the 3B, 1.5B, 0.4B, and 0.1B models?

BlinkDL commented 4 months ago

--n_layer 32 --n_embd 2560 for 3B
--n_layer 24 --n_embd 2048 for 1.5B
--n_layer 24 --n_embd 1024 for 0.4B
--n_layer 12 --n_embd 768 for 0.1B

For finetuning, when your batch size (bsz) is very small, I suggest 1e-5 for 3B, 1.5e-5 for 1.5B, 2e-5 for 0.4B, and 3e-5 for 0.1B.
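
For reference, here is a minimal sketch of how these parameters might slot into the repo's train.py invocation for the 1.5B model. The --n_layer/--n_embd/--lr flags follow the values given above; the checkpoint filename, data file, ctx_len, and runtime flags below are illustrative placeholders rather than values from this thread, so adjust them to your setup:

```bash
# Hypothetical fine-tuning command for the 1.5B model, using the layer/embd
# geometry and small-bsz learning rate suggested above. The checkpoint name,
# data file, ctx_len, and runtime flags are placeholders.
python train.py \
  --load_model RWKV-5-World-1B5-xxx.pth \
  --proj_dir out \
  --data_file my_data --data_type binidx \
  --vocab_size 65536 --ctx_len 4096 \
  --n_layer 24 --n_embd 2048 \
  --lr_init 1.5e-5 --lr_final 1.5e-5 \
  --micro_bsz 1 \
  --accelerator gpu --devices 1 --precision bf16 \
  --strategy deepspeed_stage_2 --grad_cp 1
```

For the other model sizes, swap in the corresponding --n_layer/--n_embd pair and learning rate from the lists above.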