allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Longformer model with weight(model.encoder.embed_positions.weight) error #186

Open BinchaoPeng opened 3 years ago

BinchaoPeng commented 3 years ago
RuntimeError: Error(s) in loading state_dict for BartModel:
    size mismatch for model.encoder.embed_positions.weight: copying a param with shape torch.Size([16386, 768]) from checkpoint, the shape in current model is torch.Size([1026, 768]).

I am using the longformer-encdec-base-16384 model downloaded from https://github.com/allenai/longformer and loading it with Hugging Face transformers. With transformers version 3.1.0 the code runs, but with 4.4.2 the error above occurs.
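
A minimal sketch of the failing call (the local path is illustrative, and BartModel is assumed only because the traceback above names it):

    # Runs under transformers 3.1.0 but fails under 4.4.2 with the size
    # mismatch quoted above: the checkpoint stores a (16386, 768) position
    # table, while the freshly built model only allocates (1026, 768).
    from transformers import BartModel

    model = BartModel.from_pretrained("./longformer-encdec-base-16384")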

Meanwhile, when I use the model on pairs of sentences, the returned token_type_ids are all zeros, with no ones, even though the model's special_tokens_map.json defines cls_token and sep_token.
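
On the second point, a small sketch of what I mean (AutoTokenizer and the path are assumptions; the checkpoint ships a RoBERTa/BART-style tokenizer, which as far as I know does not use segment ids):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("./longformer-encdec-base-16384")
    enc = tok("first sentence", "second sentence")
    # cls_token and sep_token are inserted to delimit the pair, but the
    # tokenizer does not distinguish segments, so any token_type_ids it
    # returns are all zeros rather than a mix of 0s and 1s.
    print(enc.get("token_type_ids"))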

Finally, I sincerely hope you can reply soon. Thanks!

EmilyAlsentzer commented 3 years ago

I had this issue when doing led = AutoModelForSeq2SeqLM.from_pretrained(hparams['model'], gradient_checkpointing=True, use_cache=False), but it was resolved when I switched to led = LEDForConditionalGeneration.from_pretrained(hparams['model'], gradient_checkpointing=True, use_cache=False).

(Note that I get a different bug when using the latter, which I haven't resolved yet - Can't set attention_probs_dropout_prob with value 0.1 for LEDConfig)
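
For reference, the workaround as a self-contained sketch (the checkpoint path is an illustrative stand-in for hparams['model']; the flags are copied from the call above). It resolves the size mismatch, though as noted it can surface a separate LEDConfig error:

    from transformers import LEDForConditionalGeneration

    # Illustrative stand-in for hparams['model'].
    checkpoint = "./longformer-encdec-base-16384"

    # Loading through the LED-specific class avoids the embed_positions
    # size mismatch seen with AutoModelForSeq2SeqLM / BartModel.
    led = LEDForConditionalGeneration.from_pretrained(
        checkpoint,
        gradient_checkpointing=True,
        use_cache=False,
    )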

edgartanaka commented 3 years ago

Any luck solving this, @EmilyAlsentzer? Running into the same issue.

MorenoLaQuatra commented 2 years ago

Same issue for me. I think the repo should be updated to work with the newer versions of HF transformers.