facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

mBART Continued Pretraining #4421

Open · rohandas14 opened 2 years ago

rohandas14 commented 2 years ago

Hello!

I am trying to perform continued pretraining of the mbart.cc25 pretrained checkpoint using the multilingual denoising objective. However, I am not sure how to prepare and pre-process the data for the continued-pretraining step. It would be great if someone could point me to what the pre-processing script should look like.

Thanks!
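For context, here is a minimal sketch of how the monolingual data could be prepared, modeled on the patterns in the mBART fine-tuning README and the RoBERTa pretraining-data instructions. It assumes the `sentence.bpe.model` and `dict.txt` shipped with the mbart.cc25 download, and that the `multilingual_denoising` task reads binarized monolingual data from one subdirectory per language under the data root; the `en_XX` file names and all paths are placeholders.

```bash
SPM=spm_encode                       # from the sentencepiece package
MODEL=mbart.cc25/sentence.bpe.model  # shipped with the checkpoint
DICT=mbart.cc25/dict.txt             # shipped with the checkpoint

# Apply the pretrained SentencePiece model to the raw monolingual text.
for SPLIT in train valid; do
  ${SPM} --model=${MODEL} < raw/${SPLIT}.en_XX > spm/${SPLIT}.en_XX
done

# Binarize each language into its own subdirectory, reusing the
# pretrained dictionary; repeat for every language you train on.
fairseq-preprocess \
  --only-source \
  --srcdict ${DICT} \
  --trainpref spm/train.en_XX \
  --validpref spm/valid.en_XX \
  --destdir data-bin/en_XX \
  --workers 20

# The multilingual_denoising task loads dict.txt from the data root.
cp ${DICT} data-bin/dict.txt
```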

rohandas14 commented 2 years ago

@ngoyal2707 and/or @myleott , any pointers would be really helpful!

BramVanroy commented 1 year ago

Same question here. Pretraining code seems to be sorely missing. Any help in that respect would be much appreciated.
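For what it's worth, fairseq does register a `multilingual_denoising` task (fairseq/tasks/multilingual_denoising.py), even though no official mBART pretraining recipe is published. Below is a rough, untested sketch of what a continued-pretraining launch might look like on top of the data layout sketched above. The noising and optimization hyperparameters are assumptions loosely modeled on the BART/mBART papers, not the values used for the released checkpoint, and the `--langs`/`--add-lang-token` combination is my guess at how to keep the dictionary aligned with the pretrained embeddings (the task appends the language tokens and `<mask>` to `dict.txt` at load time).

```bash
# The 25 languages of mbart.cc25, as listed in the mBART README.
langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN

fairseq-train data-bin \
  --task multilingual_denoising \
  --arch mbart_large \
  --langs ${langs} \
  --add-lang-token \
  --sample-break-mode complete_doc \
  --tokens-per-sample 512 \
  --mask 0.3 --mask-length span-poisson --poisson-lambda 3.5 \
  --permute-sentences 1.0 \
  --criterion cross_entropy \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.1 \
  --lr 3e-05 --lr-scheduler polynomial_decay \
  --warmup-updates 2500 --total-num-update 100000 \
  --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
  --max-tokens 2048 --update-freq 8 \
  --restore-file mbart.cc25/model.pt \
  --reset-optimizer --reset-dataloader --reset-meters \
  --save-dir checkpoints/mbart-continued \
  --log-interval 100
```

Treat this as a starting point only; in particular, verify that the restored checkpoint's embedding table lines up with the rebuilt dictionary before committing to a long run.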

alimrsn79 commented 5 months ago

Has this been addressed since then? I'm dealing with the exact same problem here :(

tarudesu commented 3 months ago

Have you managed to resolve this issue yet? @alimrsn79 @BramVanroy