arxyzan / data2vec-pytorch

PyTorch implementation of "data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language" from Meta AI
MIT License
168 stars · 26 forks

EMA teacher model should not be deepcopied #12

Closed: sudhakaranjain closed this issue 1 year ago

sudhakaranjain commented 1 year ago

According to the paper, the EMA teacher model is initialized randomly with the same architecture as the student model. So deepcopying the student model to create the teacher should be avoided, since it copies the weight parameters as well.

arxyzan commented 1 year ago

Hi @sudhakaranjain, I'm not sure I understand the issue, but according to the official implementation the EMA teacher must be a deepcopy of the student:
https://github.com/facebookresearch/fairseq/blob/16538a0bff1b9f32e89aa915f2e8b57193f33109/examples/data2vec/models/data2vec_text.py#L346
https://github.com/facebookresearch/fairseq/blob/16538a0bff1b9f32e89aa915f2e8b57193f33109/fairseq/modules/ema_module.py#L41
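
For reference, here is a minimal sketch of what deepcopy-based teacher initialization and the EMA update step look like in PyTorch. The function names and the `tau` value are illustrative, not taken from this repo or from fairseq's `EMAModule`:

```python
import copy
import torch


@torch.no_grad()
def init_ema_teacher(student: torch.nn.Module) -> torch.nn.Module:
    # Teacher starts as an exact copy of the student (architecture AND weights),
    # mirroring how the official implementation deepcopies the tracked model.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad = False  # teacher is never updated by the optimizer
    return teacher


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, tau: float = 0.999):
    # Exponential moving average of the student's weights:
    # theta_teacher <- tau * theta_teacher + (1 - tau) * theta_student
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(tau).add_(s_param, alpha=1.0 - tau)
```

Because `tau` is close to 1, the teacher tracks a slowly moving average of the student, but the two start from identical weights at step 0.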

sudhakaranjain commented 1 year ago

Sorry for the confusion. You are right!