BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0
12.72k stars, 868 forks

Create RWKV language model from config, not loading from file, without CUDA #89

Open James4Ever0 opened 1 year ago

James4Ever0 commented 1 year ago

The code in RWKV-LM/RWKV-v4neo/src/model.py requires CUDA to create an RWKV model.

I want to change the code by replacing the first embedding layer with a linear layer to fit my needs.

rwkv.model.RWKV only lets me load a model from existing weights.

How can I create a new RWKV model from a config rather than from existing model weights? And how do I change the first layer of the model?

lantudou commented 1 year ago

Maybe I misunderstood you, but by default train.py creates a new RWKV model rather than loading an existing one.

To train on your own dataset, you can start from this command:

python train.py --load_model "" --wandb "" --proj_dir "out" \
     --data_file "./enwik8" --data_type "utf-8" --vocab_size 0 \
     --ctx_len 512 --epoch_steps 5000 --epoch_count 500 --epoch_begin 0 --epoch_save 5 \
     --micro_bsz 12 --n_layer 6 --n_embd 512 --pre_ffn 0 --head_qk 0 \
     --lr_init 8e-4 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.99 --adam_eps 1e-8 \
     --accelerator gpu --devices 1 --precision bf16 --strategy ddp_find_unused_parameters_false --grad_cp 0
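
If you would rather keep these settings in a config object than in a long shell line, here is a minimal sketch that builds the same argument list from a Python dict. The flag names and values are copied from the command above; the helper name build_train_command is hypothetical and not part of RWKV-LM. It only assembles the command. It does not remove the CUDA requirement or change the embedding layer.

```python
import shlex

def build_train_command(config, script="train.py"):
    """Hypothetical helper: turn a dict of settings into an argv list for train.py."""
    argv = ["python", script]
    for name, value in config.items():
        # Every setting becomes "--name value"; empty strings are kept as empty arguments.
        argv += [f"--{name}", str(value)]
    return argv

# Values copied from the command above; load_model="" means no checkpoint is loaded.
config = {
    "load_model": "", "wandb": "", "proj_dir": "out",
    "data_file": "./enwik8", "data_type": "utf-8", "vocab_size": 0,
    "ctx_len": 512, "epoch_steps": 5000, "epoch_count": 500,
    "epoch_begin": 0, "epoch_save": 5,
    "micro_bsz": 12, "n_layer": 6, "n_embd": 512, "pre_ffn": 0, "head_qk": 0,
    "lr_init": "8e-4", "lr_final": "1e-5", "warmup_steps": 0,
    "beta1": 0.9, "beta2": 0.99, "adam_eps": "1e-8",
    "accelerator": "gpu", "devices": 1, "precision": "bf16",
    "strategy": "ddp_find_unused_parameters_false", "grad_cp": 0,
}

argv = build_train_command(config)
# Print a shell-safe version of the command for inspection.
print(shlex.join(argv))
# To actually launch training from the RWKV-v4neo directory:
#   import subprocess; subprocess.run(argv, check=True)
```

Changing n_layer, n_embd, or ctx_len in the dict then gives a differently sized fresh model without touching any weight file.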