Create RWKV language model from config, not loading from file, without CUDA

BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Apache License 2.0

12.32k stars, 838 forks

Maybe I misunderstood you, but in train.py the default config already creates a new RWKV model rather than loading an existing one.

To train on your own dataset, you can start from this command:

python train.py --load_model "" --wandb "" --proj_dir "out" \
     --data_file "./enwik8" --data_type "utf-8" --vocab_size 0 \
     --ctx_len 512 --epoch_steps 5000 --epoch_count 500 --epoch_begin 0 --epoch_save 5 \
     --micro_bsz 12 --n_layer 6 --n_embd 512 --pre_ffn 0 --head_qk 0 \
     --lr_init 8e-4 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.99 --adam_eps 1e-8 \
     --accelerator gpu --devices 1 --precision bf16 --strategy ddp_find_unused_parameters_false --grad_cp 0
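
As a rough sanity check on those flags, here is a small sketch of how many tokens one "epoch" covers. It assumes an epoch is simply epoch_steps optimizer steps, each consuming micro_bsz samples of ctx_len tokens per device, which is my reading of train.py and not something confirmed in this thread. The function name and the num_nodes parameter are made up for illustration.

```python
def tokens_per_epoch(epoch_steps, micro_bsz, ctx_len, devices=1, num_nodes=1):
    """Estimate tokens consumed in one 'epoch' (assumed: a fixed number of steps)."""
    # Effective batch size across all devices and nodes (assumption, see above).
    real_bsz = num_nodes * devices * micro_bsz
    # Each step draws real_bsz samples, each ctx_len tokens long.
    samples = epoch_steps * real_bsz
    return samples * ctx_len


# Values taken from the command above.
per_epoch = tokens_per_epoch(epoch_steps=5000, micro_bsz=12, ctx_len=512, devices=1)
print(per_epoch)  # 30720000
```

If that reading is right, an "epoch" here is a fixed amount of work (about 30.7M tokens) and not one pass over the data file, so --epoch_save 5 would write a checkpoint roughly every 150M tokens.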

Create RWKV language model from config, not loading from file, without CUDA #89