Hey guys,
Thanks a lot for your great work! I am particularly interested in adapting the Longformer attention mechanism to mBART and then fine-tuning with the Transformers library. I thought it would be straightforward: in the conversion script I just replaced `BartTokenizer` and `BartForConditionalGeneration` with `MBartTokenizer` and `MBartForConditionalGeneration`.
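For reference, this is roughly the change I made (a minimal sketch of my edit, not the full script; I used the `facebook/mbart-large-cc25` checkpoint):

```python
# In convert_bart_to_longformerencoderdecoder.py, swap the BART classes
# for their mBART counterparts:
from transformers import MBartTokenizer, MBartForConditionalGeneration

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
```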
The conversion with `convert_bart_to_longformerencoderdecoder.py` seems fine, but when I then try to fine-tune with `finetune_trainer.py` from Transformers, I get `RuntimeError: Error(s) in loading state_dict for BartForConditionalGeneration: size mismatch for model.encoder.embed_positions.weight: copying a param with shape torch.Size([4098, 1024]) from checkpoint, the shape in current model is torch.Size([1026, 1024]).`
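Looking at the shapes, it seems the fine-tuning script instantiates the model from a config with `max_position_embeddings=1024` (BART-style models allocate `max_position_embeddings + 2` rows for the learned positional embeddings, hence 1026), while the converted checkpoint was saved with 4096 (hence 4098). A minimal sanity check, assuming the conversion script wrote the model to a local directory (the path here is hypothetical):

```python
from transformers import AutoConfig

# Hypothetical output directory of convert_bart_to_longformerencoderdecoder.py.
cfg = AutoConfig.from_pretrained("./mbart-long-4096")

# If this prints 1024, the saved config does not match the resized
# positional embeddings, which would explain the size mismatch above.
print(cfg.max_position_embeddings)  # expected: 4096
```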
At the same time, everything works with models like these: https://drive.google.com/drive/folders/10gPiqlAdIART4cMNWhI1fIVZ3J1L5hAW, which seem to have been converted with the same script. Could you give me a hint on how to make Longformer + mBART + fine-tuning with Transformers work?