BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

Fine-tuning error: RuntimeError: Error(s) in loading state_dict for RWKV, size mismatch for ..... #172

Closed DDYuudachi closed 11 months ago

DDYuudachi commented 11 months ago

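Hello, I get an error when fine-tuning rwkv-4-World and Raven 7B: RuntimeError: Error(s) in loading state_dict for RWKV, size mismatch for ..... See the error message below. The `size mismatch for blocks.0` entries repeat for every block from 0 through 23; I have only included blocks 0 and 1 here. Which part of my configuration is wrong?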

Training script (bash):

    python train.py \
      --load_model "/media/asus/DATA/pc/Rwkv/rwkv-4-world/RWKV-4-World-CHNtuned-7B-v1-20230709-ctx4096.pth" \
      --proj_dir "out" \
      --data_file "/media/asus/DATA/pc/finetune/json2binidx_tool-main/data/sample_text_document" \
      --data_type "binidx" \
      --vocab_size 50277 \
      --ctx_len 1024 \
      --epoch_steps 1000 \
      --epoch_count 1000 \
      --epoch_begin 0 \
      --epoch_save 5 \
      --micro_bsz 2 \
      --accumulate_grad_batches 4 \
      --n_layer 24 \
      --n_embd 1024 \
      --pre_ffn 0 \
      --head_qk 0 \
      --lr_init 1e-4 \
      --lr_final 1e-4 \
      --warmup_steps 0 \
      --beta1 0.9 \
      --beta2 0.999 \
      --adam_eps 1e-8 \
      --accelerator gpu \
      --devices 4 \
      --precision bf16 \
      --strategy deepspeed_stage_2 \
      --grad_cp 0 \
      --lora \
      --lora_r 8 \
      --lora_alpha 16 \
      --lora_dropout 0.01 \
      --lora_parts=att,ffn,time,ln # configure which parts to finetune

Error message:

    File "/media/asus/DATA/pc/finetune/RWKV-LM-LoRA-main/RWKV-v4neo/train.py", line 346, in <module>
      model.load_state_dict(load_dict, strict=(not args.lora))
    File "/home/asus/anaconda3/envs/rwkv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
      raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for RWKV:
      size mismatch for emb.weight: copying a param with shape torch.Size([65536, 4096]) from checkpoint, the shape in current model is torch.Size([50277, 1024]).
      size mismatch for blocks.0.ln1.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.ln1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.ln2.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.ln2.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.ln0.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.ln0.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.att.time_decay: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.att.time_first: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.0.att.time_mix_k: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.0.att.time_mix_v: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.0.att.time_mix_r: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.0.att.key.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.0.att.value.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.0.att.receptance.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.0.att.output.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.0.ffn.time_mix_k: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.0.ffn.time_mix_r: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.0.ffn.key.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
      size mismatch for blocks.0.ffn.receptance.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.0.ffn.value.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
      size mismatch for blocks.1.ln1.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.1.ln1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.1.ln2.weight: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.1.ln2.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.1.att.time_decay: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.1.att.time_first: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([1024]).
      size mismatch for blocks.1.att.time_mix_k: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.1.att.time_mix_v: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.1.att.time_mix_r: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.1.att.key.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.1.att.value.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.1.att.receptance.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.1.att.output.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.1.ffn.time_mix_k: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.1.ffn.time_mix_r: copying a param with shape torch.Size([1, 1, 4096]) from checkpoint, the shape in current model is torch.Size([1, 1, 1024]).
      size mismatch for blocks.1.ffn.key.weight: copying a param with shape torch.Size([16384, 4096]) from checkpoint, the shape in current model is torch.Size([4096, 1024]).
      size mismatch for blocks.1.ffn.receptance.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).
      size mismatch for blocks.1.ffn.value.weight: copying a param with shape torch.Size([4096, 16384]) from checkpoint, the shape in current model is torch.Size([1024, 4096]).
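Editor's note: every mismatch in the error message pairs a checkpoint shape built on 65536 and 4096 with a model shape built on 50277 and 1024, which are the values passed as `--vocab_size` and `--n_embd`. The thread never confirms a fix, so the following is only a minimal sketch of how one might read the expected values off the checkpoint shapes and compare them with the flags. The function names `infer_hparams` and `diff_flags` are hypothetical helpers, not part of RWKV-LM, and the example feeds in a few shapes copied from the log rather than a real file.

```python
import re

def infer_hparams(shapes):
    """Infer vocab_size, n_embd and n_layer from a {param_name: shape} mapping.

    Hypothetical helper: assumes the RWKV-style names seen in the error
    message ("emb.weight", "blocks.<i>....").
    """
    vocab_size, n_embd = shapes["emb.weight"]
    block_ids = {
        int(m.group(1))
        for name in shapes
        if (m := re.match(r"blocks\.(\d+)\.", name))
    }
    return {"vocab_size": vocab_size, "n_embd": n_embd, "n_layer": max(block_ids) + 1}

def diff_flags(configured, inferred):
    """Return {flag: (configured, inferred)} for every flag that disagrees."""
    return {
        k: (v, inferred[k])
        for k, v in configured.items()
        if k in inferred and v != inferred[k]
    }

# Checkpoint-side shapes copied from the error message (excerpt only).
checkpoint_shapes = {
    "emb.weight": (65536, 4096),
    "blocks.0.ln1.weight": (4096,),
    "blocks.1.ln1.weight": (4096,),
}
inferred = infer_hparams(checkpoint_shapes)

# Values taken from the training script in the question. n_layer is left out
# because the excerpt only lists blocks 0 and 1.
configured = {"vocab_size": 50277, "n_embd": 1024}
mismatches = diff_flags(configured, inferred)
print(mismatches)

# With PyTorch installed, the mapping could be built from the real file,
# assuming the .pth is a plain state dict of tensors:
#   import torch
#   sd = torch.load(path_to_checkpoint, map_location="cpu")
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in sd.items()}
```

The `n_layer` value is only meaningful when the full checkpoint is inspected. The log cannot reveal the checkpoint's layer count, because it only reports parameters that exist in the 24-layer model being constructed.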

dp543831577 commented 6 months ago

How was this resolved?

saidi1ai commented 6 months ago

Sorry, I don't know how to solve this problem.
