arxyzan / data2vec-pytorch

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI
MIT License
172 stars, 26 forks

Question about reproducibility #7

Closed by daisukelab 2 years ago

daisukelab commented 2 years ago

Hello, thanks for your effort to make data2vec easier to understand. Let me ask a quick question: can we reproduce the paper's results with your implementation? I guess that is out of scope for this repo, but I thought it would be quite nice if possible. Thank you anyway!

arxyzan commented 2 years ago

Hello Daisuke, I'm so glad this repo has been useful! Actually, I attempted to copy the weights from the HuggingFace version (which is itself copied from fairseq!) into the models in this repo. Although I highly recommend using the models from HuggingFace, there is still the option to load the exact same weights as the original paper. You can find out more in the README.md.

daisukelab commented 2 years ago

Thanks for your comment. I understand that pretraining can be done, but it has not been done to reproduce the paper; instead, we are to use the weights from Hugging Face. Thanks again! :)