allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0

Bug while loading t5 base model #53

Open Sahajtomar opened 1 year ago

Sahajtomar commented 1 year ago

I am trying to load the t5-base model per the t5_ppo config. Strangely, this error pops up, although it works fine for t5-small:

    size mismatch for decoder.final_layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32128, 512]) from checkpoint, the shape in current model is torch.Size([32128, 768]).
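For context: 512 is the hidden size (`d_model`) of t5-small and 768 is that of t5-base, so the checkpoint being loaded appears to have t5-small dimensions while the in-memory model was built as t5-base. A minimal stand-alone reproduction of the same `RuntimeError`, using plain PyTorch linear layers as hypothetical stand-ins for the `lm_head`:

```python
import torch.nn as nn

# Hypothetical stand-ins for the lm_head of each model size:
# t5-small projects from d_model=512, t5-base from d_model=768.
small_head = nn.Linear(512, 32128, bias=False)  # checkpoint side
base_head = nn.Linear(768, 32128, bias=False)   # current model side

try:
    # Loading 512-dim weights into a 768-dim module raises the same error.
    base_head.load_state_dict(small_head.state_dict())
except RuntimeError as e:
    print("size mismatch" in str(e))  # → True
```

This suggests checking that the checkpoint or policy referenced by the config was actually produced with t5-base, not t5-small.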
Runingtime commented 1 year ago

I got the same error. Any workarounds?

    RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
    size mismatch for shared.weight: copying a param with shape torch.Size([32100, 768]) from checkpoint, the shape in current model is torch.Size([32128, 768]).
    size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32100, 768]) from checkpoint, the shape in current model is torch.Size([32128, 768]).
    size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([32100, 768]) from checkpoint, the shape in current model is torch.Size([32128, 768]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32100, 768]) from checkpoint, the shape in current model is torch.Size([32128, 768]).
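This one looks like a different mismatch: 32100 is the size of the T5 tokenizer's vocabulary, while the pretrained weights use a padded vocabulary of 32128. That pattern usually means the checkpoint was saved after calling `resize_token_embeddings(len(tokenizer))`, so the in-memory model would need the same resize before loading. As a diagnostic sketch (not RL4LMs API), one can filter a state dict down to shape-compatible parameters to see exactly which ones disagree; note the skipped parameters keep their freshly initialized values, so this only helps with diagnosis, not as a real fix:

```python
import torch.nn as nn

def load_compatible(model, checkpoint_state):
    """Hypothetical helper: load only the parameters whose shapes match
    the model, and return the names that had to be skipped."""
    own = model.state_dict()
    ok = {k: v for k, v in checkpoint_state.items()
          if k in own and v.shape == own[k].shape}
    model.load_state_dict(ok, strict=False)
    return [k for k in checkpoint_state if k not in ok]

# Toy stand-ins: checkpoint embeddings sized to the tokenizer vocab (32100)
# vs. a model built with the padded vocab (32128), as in the traceback above.
ckpt = nn.Embedding(32100, 8).state_dict()
model = nn.Embedding(32128, 8)
print(load_compatible(model, ckpt))  # → ['weight']
```

The skipped names should line up with the four `size mismatch` lines in the traceback (`shared`, both `embed_tokens`, and `lm_head`), all of which are tied to the embedding matrix.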