BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNN and transformer: great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

neo-4 training from scratch no results after 1000 epochs #110

Closed: bello7777 closed this issue 1 year ago

bello7777 commented 1 year ago

Hey, can you please help?

I'm training my own model from scratch, but after 1000 epochs the output looks like this:

Bob: hello Alice:8/!Uhldr!Nfyhmd!)Boqhk!2:58-!mc!#Ugd!Tsqmfd!Drd!ne!Xhkgdkl!Sdhbg-#!Ugd!Odv!Sdotakhb!)!Nx!37-!2:58-!ansg!ax!Nhkcqdc!Fchd !Cq`cx/!Qtqonqshmf!sn!ad!naidbshud!cdrbqhoBob:

This is the training command; the data file is a plain-text file containing 15 books:

python train.py --load_model "" --wandb "" --proj_dir "out" \
  --data_file "testonfinal.txt" --data_type "utf-8" --vocab_size 50277 \
  --ctx_len 128 --epoch_steps 1000 --epoch_count 1000 --epoch_begin 0 --epoch_save 25 \
  --micro_bsz 16 --n_layer 12 --n_embd 768 --pre_ffn 0 --head_qk 0 \
  --lr_init 6e-4 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.99 --adam_eps 1e-8 \
  --accelerator gpu --devices 1 --precision fp32 \
  --strategy ddp_find_unused_parameters_false --grad_cp 0
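Note the likely mismatch in this command: with `--data_type "utf-8"` the training script builds a char-level vocabulary from the corpus itself, while `--vocab_size 50277` is the size of the GPT-NeoX 20B tokenizer that the pretrained Pile models use. Training and inference must use the same mapping from IDs to characters. A minimal Python sketch (illustrative names only, not the actual RWKV code) of how decoding with a mismatched vocabulary produces neighbor-character gibberish like the output above:

```python
# Hypothetical illustration: a char-level vocab is just the sorted set of
# characters in the corpus, so token IDs only make sense against that list.
corpus = "The quick brown fox jumps over the lazy dog"
train_vocab = sorted(set(corpus))               # vocab used at training time
char2id = {c: i for i, c in enumerate(train_vocab)}

ids = [char2id[c] for c in "The"]               # IDs the model would emit

# Decoding those IDs with a different vocab (here, one shifted by a single
# position, as a stand-in for loading the wrong tokenizer at inference)
# maps every character to a neighbor, turning text into gibberish:
wrong_vocab = train_vocab[1:] + ["?"]
print("".join(wrong_vocab[i] for i in ids))     # not "The" anymore
```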

BlinkDL commented 1 year ago

You are using the wrong tokenizer. Use https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v4/run.py to run it.
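For a char-level run like the one above, that means decoding with the same vocabulary the trainer produced rather than the GPT-NeoX tokenizer. A minimal sketch, assuming training wrote an id-to-char vocab.json into the project directory (the utf-8 data path typically does; the exact path and file format may differ):

```python
import json

# Assumption: training with --data_type "utf-8" saved a char-level vocab,
# e.g. out/vocab.json mapping id -> char. Adjust path/format to your run.
with open("out/vocab.json", "r", encoding="utf-8") as f:
    id2char = {int(k): v for k, v in json.load(f).items()}

def decode(ids):
    # Map each token ID back through the SAME vocab used at training time.
    return "".join(id2char[i] for i in ids)

print(decode([42, 7, 13]))  # hypothetical IDs; gibberish here means the
                            # vocab still does not match the checkpoint
```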