Closed arxyzan closed 2 years ago
Alright, after playing around with the code and reading the paper more carefully, I have figured out some of the answers, which I put below for anyone wondering:

- For text, data2vec uses the `masked_lm` task (as RoBERTa does), in which `src_tokens` are the masked version of the inputs and `target_tokens` are the original unmasked inputs. For audio, on the other hand, the original input is fed to the model and the masking is done within the `forward` method.
- The criterion is `fairseq.criterions.model_criterion.ModelCriterion`, which relies on the model itself to provide the losses. The reason not to use `fairseq.criterions.masked_lm.MaskedLmLoss` like in RoBERTa is that data2vec uses either MSE or L1 loss instead of the cross entropy loss provided in `fairseq.modules.cross_entropy`, so the losses are better calculated inside the `forward` method of Data2Vec.
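To make the first point concrete, here is a minimal sketch of `masked_lm`-style input preparation — this is illustrative only, not fairseq's actual implementation; the function name, `mask_idx`, and `mask_prob` are my own choices:

```python
import torch

def mask_tokens(tokens, mask_idx, mask_prob=0.15):
    """Illustrative masked_lm-style masking (not fairseq's actual code):
    src_tokens are a masked copy of the inputs,
    target_tokens are the original, unmasked inputs."""
    # Keep the originals around as the prediction targets.
    target_tokens = tokens.clone()
    # Sample a Bernoulli mask over all positions.
    mask = torch.rand(tokens.shape) < mask_prob
    # Replace the chosen positions with the mask token id.
    src_tokens = tokens.clone()
    src_tokens[mask] = mask_idx
    return src_tokens, target_tokens, mask
```

This is in contrast to the audio model, where the unmasked input goes into `forward` and the masking happens internally.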
First off, thank you @alexeib for this great work on data2vec. I'm currently trying to implement the data2vec model in pure PyTorch. I've read the paper and the code, and I have a couple of questions:
1. What is the role of the `target_tokens` argument in the forward method of `Data2VecTextModel` and `Data2VecTextEncoder`?
2. Why mask the representations before computing the loss, as in:
```python
x = x[masked_indices]
y = y[masked_indices]
x = self.regression_head(x)
loss = self.criterion(x, y)
...
```
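For anyone reading along, the fragment above can be fleshed out into a self-contained, runnable sketch. Everything here is illustrative, not the actual data2vec code: the class name, dimensions, and the choice of plain MSE (data2vec can also use a smooth-L1-style loss) are my own assumptions:

```python
import torch
import torch.nn as nn

class MaskedRegressionLoss(nn.Module):
    """Illustrative sketch: regress the student's masked time steps
    onto the teacher's targets, as the snippet above does."""
    def __init__(self, dim):
        super().__init__()
        self.regression_head = nn.Linear(dim, dim)
        self.criterion = nn.MSELoss()

    def forward(self, student_states, teacher_targets, masked_indices):
        # Keep only the time steps that were masked for the student;
        # unmasked positions contribute nothing to the loss.
        x = student_states[masked_indices]
        y = teacher_targets[masked_indices]
        # Project the student states before comparing to the targets.
        x = self.regression_head(x)
        # MSE over continuous targets instead of cross entropy over tokens.
        return self.criterion(x, y)
```

Usage: with `student_states` and `teacher_targets` of shape `(batch, time, dim)` and a boolean `masked_indices` of shape `(batch, time)`, the module returns a scalar loss over the masked positions only.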
Thanks in advance for any help or recommendations. Best, Aryan