allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Longformer model with weight(model.encoder.embed_positions.weight) error #186

Open BinchaoPeng opened 3 years ago

BinchaoPeng commented 3 years ago
RuntimeError: Error(s) in loading state_dict for BartModel:
    size mismatch for model.encoder.embed_positions.weight: copying a param with shape torch.Size([16386, 768]) from checkpoint, the shape in current model is torch.Size([1026, 768]).

I am using the longformer-encdec-base-16384 model downloaded from https://github.com/allenai/longformer and loading it with Hugging Face transformers. With transformers version 3.1.0 the code runs, but with 4.4.2 the error above occurs.
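
A minimal sketch of the failing call (the local path is illustrative, and BartModel is assumed only because the traceback above names it):

    # Runs under transformers 3.1.0 but fails under 4.4.2 with the size
    # mismatch quoted above: the checkpoint stores a (16386, 768) position
    # table, while the freshly built model only allocates (1026, 768).
    from transformers import BartModel

    model = BartModel.from_pretrained("./longformer-encdec-base-16384")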

Meanwhile, when I use the model on pairs of sentences, the returned token_type_ids are all zeros, with no ones, even though the model's special_tokens_map.json defines cls_token and sep_token.
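
On the second point, a small sketch of what I mean (AutoTokenizer and the path are assumptions; the checkpoint ships a RoBERTa/BART-style tokenizer, which as far as I know does not use segment ids):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("./longformer-encdec-base-16384")
    enc = tok("first sentence", "second sentence")
    # cls_token and sep_token are inserted to delimit the pair, but the
    # tokenizer does not distinguish segments, so any token_type_ids it
    # returns are all zeros rather than a mix of 0s and 1s.
    print(enc.get("token_type_ids"))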

Finally, I sincerely hope you can reply soon. Thanks!

EmilyAlsentzer commented 3 years ago

I had this issue when doing led = AutoModelForSeq2SeqLM.from_pretrained(hparams['model'], gradient_checkpointing=True, use_cache=False), but it was resolved when I switched to led = LEDForConditionalGeneration.from_pretrained(hparams['model'], gradient_checkpointing=True, use_cache=False).

(Note that I get a different bug when using the latter, which I haven't resolved yet - Can't set attention_probs_dropout_prob with value 0.1 for LEDConfig)
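
For reference, the workaround as a self-contained sketch (the checkpoint path is an illustrative stand-in for hparams['model']; the flags are copied from the call above). It resolves the size mismatch, though as noted it can surface a separate LEDConfig error:

    from transformers import LEDForConditionalGeneration

    # Illustrative stand-in for hparams['model'].
    checkpoint = "./longformer-encdec-base-16384"

    # Loading through the LED-specific class avoids the embed_positions
    # size mismatch seen with AutoModelForSeq2SeqLM / BartModel.
    led = LEDForConditionalGeneration.from_pretrained(
        checkpoint,
        gradient_checkpointing=True,
        use_cache=False,
    )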

edgartanaka commented 3 years ago

Any luck solving this, @EmilyAlsentzer? Running into the same issue.

MorenoLaQuatra commented 2 years ago

Same issue for me. I think the repo should be updated to work with the newer versions of HF transformers.