ArchanaNarayanan843 opened 7 months ago
- 3B: `--n_layer 32 --n_embd 2560`
- 1.5B: `--n_layer 24 --n_embd 2048`
- 0.4B: `--n_layer 24 --n_embd 1024`
- 0.1B: `--n_layer 12 --n_embd 768`
For fine-tuning with a very small batch size, I suggest a learning rate of 1e-5 for 3B, 1.5e-5 for 1.5B, 2e-5 for 0.4B, and 3e-5 for 0.1B.
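The size/width pairs and small-batch learning rates above can be collected into a lookup table; a minimal sketch (the `RWKV5_CONFIGS` table and `train_args` helper are illustrative names, not part of the RWKV-LM codebase, and the `--lr_init`/`--lr_final` flags are assumed to match RWKV-LM's `train.py`):

```python
# Illustrative lookup table for the RWKV-5 sizes quoted above.
# Layer/width pairs and small-batch fine-tuning LRs are from this thread.
RWKV5_CONFIGS = {
    "3B":   {"n_layer": 32, "n_embd": 2560, "lr": 1e-5},
    "1.5B": {"n_layer": 24, "n_embd": 2048, "lr": 1.5e-5},
    "0.4B": {"n_layer": 24, "n_embd": 1024, "lr": 2e-5},
    "0.1B": {"n_layer": 12, "n_embd": 768,  "lr": 3e-5},
}

def train_args(size: str) -> str:
    """Format CLI flags for a given model size (flag names assumed
    to match RWKV-LM's train.py; verify against your checkout)."""
    cfg = RWKV5_CONFIGS[size]
    return (f"--n_layer {cfg['n_layer']} --n_embd {cfg['n_embd']} "
            f"--lr_init {cfg['lr']} --lr_final {cfg['lr']}")

print(train_args("1.5B"))
# e.g. pass these flags when fine-tuning a 1.5B checkpoint
```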
How to train RWKV-5-World-1B5-v2 model