allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

MBART into LongMBART #172

Closed Dmitriuso closed 3 years ago

Dmitriuso commented 3 years ago

Hey guys, thanks a lot for your great work! I am particularly interested in adapting the Longformer attention mechanism to mBART and then fine-tuning with the Transformers library. I thought it would be straightforward: I just replaced BartTokenizer and BartForConditionalGeneration with MBartTokenizer and MBartForConditionalGeneration.
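For reference, this is roughly the substitution I made in the conversion script (a minimal sketch; the rest of the script is left untouched, and the checkpoint name is just the standard mBART one):

```python
# Sketch of the swap inside convert_bart_to_longformerencoderdecoder.py;
# everything else (position-embedding extension, attention replacement)
# stays as it was in the original script.
from transformers import MBartTokenizer, MBartForConditionalGeneration

tokenizer = MBartTokenizer.from_pretrained('facebook/mbart-large-cc25')
model = MBartForConditionalGeneration.from_pretrained('facebook/mbart-large-cc25')
```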

The conversion with convert_bart_to_longformerencoderdecoder.py seems to go fine, but when I try to fine-tune with finetune_trainer.py from Transformers, I get: RuntimeError: Error(s) in loading state_dict for BartForConditionalGeneration: size mismatch for model.encoder.embed_positions.weight: copying a param with shape torch.Size([4098, 1024]) from checkpoint, the shape in current model is torch.Size([1026, 1024]).
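If it helps, here is a minimal check of what I think is going on (the path is a placeholder): the converted checkpoint stores 4096 positions plus (M)BART's offset of 2, i.e. 4098 rows, while finetune_trainer.py seems to rebuild the model from the stock config, which only allocates 1024 + 2 = 1026 rows:

```python
# Placeholder path; just verifying my reading of the size mismatch.
import torch
from transformers import MBartConfig, MBartForConditionalGeneration

# The converted checkpoint: 4096 positions + offset 2 -> 4098 rows.
state_dict = torch.load('path/to/long-mbart/pytorch_model.bin', map_location='cpu')
print(state_dict['model.encoder.embed_positions.weight'].shape)  # torch.Size([4098, 1024])

# A model built from the stock config: 1024 positions + offset 2 -> 1026 rows.
model = MBartForConditionalGeneration(MBartConfig())
print(model.model.encoder.embed_positions.weight.shape)  # torch.Size([1026, 1024])
```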

At the same time, everything works with models like these: https://drive.google.com/drive/folders/10gPiqlAdIART4cMNWhI1fIVZ3J1L5hAW, which seem to have been produced with the same script. Could you give me a hint as to how to make this work for Longformer + mBART + fine-tuning with Transformers?
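For comparison, the released long-BART checkpoints load with the classes from this repo's longformer/longformer_encoder_decoder.py, something like the sketch below (the path is a placeholder); I assume a LongMBART would need an analogous subclass built on the MBart classes, since these wrap BartForConditionalGeneration / BartConfig:

```python
# Sketch: loading a converted long-BART checkpoint with this repo's
# encoder-decoder classes instead of the plain Transformers ones.
from longformer.longformer_encoder_decoder import (
    LongformerEncoderDecoderConfig,
    LongformerEncoderDecoderForConditionalGeneration,
)

config = LongformerEncoderDecoderConfig.from_pretrained('path/to/converted-longbart')
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
    'path/to/converted-longbart', config=config)
```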