BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.
Apache License 2.0

RWKV-v4neo expected scalar type BFloat16 but found Float #99

Closed — dataangel closed this 1 year ago

dataangel commented 1 year ago

System: Windows 11 + WSL2

Libraries: torch 2.0.0+cu118, CUDA 11.8, deepspeed 0.9.1

python train.py --load_model "" --wandb "" --proj_dir "out" --data_file "./data_text_document" --data_type "binidx" --vocab_size 0 --ctx_len 128 --epoch_steps 1000 --epoch_count 20 --epoch_begin 0 --epoch_save 10 --micro_bsz 16 --n_layer 12 --n_embd 768 --pre_ffn 0 --head_qk 0 --lr_init 6e-4 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.99 --adam_eps 1e-8 --accelerator cpu --devices 1 --precision bf16 --strategy ddp_find_unused_parameters_false --grad_cp 0

Running this command on CPU fails with:

RuntimeError: index_select(): self indexing axis dim should be positive
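For context, both failure modes in this thread (the BFloat16/Float mismatch in the title and the index_select error above) are easy to reproduce in plain PyTorch. This is an illustrative sketch, not RWKV's actual code path; the exact error messages may vary by torch version.

```python
import torch

# 1) dtype mismatch: mixing a bfloat16 tensor with a default float32 tensor
#    in the same op raises a RuntimeError like the one in the issue title.
a = torch.randn(2, 2, dtype=torch.bfloat16)
b = torch.randn(2, 2)  # defaults to float32
try:
    torch.mm(a, b)
except RuntimeError as e:
    print(e)

# 2) zero-sized lookup table: an embedding built with 0 rows (as could
#    happen if a vocab size of 0 is never resolved) fails on any lookup.
emb = torch.nn.Embedding(0, 8)
try:
    emb(torch.tensor([0]))
except (RuntimeError, IndexError) as e:
    print(e)
```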

BlinkDL commented 1 year ago

pip install deepspeed==0.7.0
pip install pytorch-lightning==1.9.2

and use torch 1.13.1+cu117.
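Besides pinning versions as the maintainer suggests, the dtype mismatch may also stem from combining `--accelerator cpu` with `--precision bf16`. An unconfirmed workaround is to keep the CPU path in full precision; the `fp32` value below is an assumption about what train.py accepts, and the remaining flags would stay as in the original command:

```shell
# Unconfirmed sketch: avoid bf16 on the CPU path by training in fp32
# (other flags unchanged from the original command).
python train.py --accelerator cpu --devices 1 --precision fp32 \
  --data_file "./data_text_document" --data_type "binidx" \
  --n_layer 12 --n_embd 768 --ctx_len 128 --micro_bsz 16
```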