[Closed] WADE1121 closed this issue 1 year ago.
Hi, if I remember correctly, the loss was plateauing after the second epoch.
```python
import torch.nn as nn

class DebertaV2OnlyMLMHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        # DebertaV2LMPredictionHead is the repo's modified prediction head.
        # self.predictions = DebertaV2LMPredictionHead(config)
        self.lm_head = DebertaV2LMPredictionHead(config)

    def forward(self, sequence_output, embedding_weight, bias=None):
        prediction_scores = self.lm_head(sequence_output, embedding_weight, bias=bias)
        return prediction_scores
```
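Since the forward signature takes `embedding_weight` and an optional `bias`, the output projection appears to be tied to the word-embedding matrix rather than owned by the head itself. Below is a minimal, self-contained sketch of that pattern; the class name, the dense/GELU/LayerNorm transform, and the tensor shapes are illustrative assumptions, not the repo's actual `DebertaV2LMPredictionHead`:

```python
import torch
import torch.nn as nn

class TiedLMPredictionHeadSketch(nn.Module):  # hypothetical name, not from the repo
    def __init__(self, hidden_size):
        super().__init__()
        # Transform block roughly mirroring a standard MLM head; details are assumed.
        self.transform = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )

    def forward(self, sequence_output, embedding_weight, bias=None):
        hidden = self.transform(sequence_output)   # (B, T, H)
        logits = hidden @ embedding_weight.t()     # (B, T, vocab): projection tied to the embedding matrix
        if bias is not None:
            logits = logits + bias
        return logits

# Usage with dummy tensors:
head = TiedLMPredictionHeadSketch(hidden_size=768)
seq_out = torch.randn(2, 16, 768)
emb_w = torch.randn(128100, 768)   # DeBERTa-v2 vocabulary size, used here only as an example
scores = head(seq_out, emb_w)      # -> torch.Size([2, 16, 128100])
```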
Hi, please correct me if I'm wrong. I noticed you renamed the head "self.predictions" to "self.lm_head". It looks like you didn't use the pretrained parameters of the DeBERTa-v2 head "DebertaV2OnlyMLMHead" from Hugging Face; instead, you defined and initialized a new head called "lm_head" and froze it during pretraining.
This is not correct. This modification actually ensures that the DeBERTa weights are loaded properly.
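For illustration, here is a minimal sketch of why the attribute name matters when loading pretrained weights. It assumes the checkpoint stores the MLM head parameters under an "lm_head.*" prefix (an assumption about this repo's checkpoint, not something stated above); with that prefix, `load_state_dict` only picks the weights up if the module attribute is named `lm_head`:

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    def __init__(self):
        super().__init__()
        # If this attribute were named self.predictions, the checkpoint keys below would not match.
        self.lm_head = nn.Linear(8, 8)

model = Head()
# Dummy checkpoint whose keys use the assumed "lm_head.*" prefix.
checkpoint = {"lm_head.weight": torch.zeros(8, 8), "lm_head.bias": torch.zeros(8)}
missing, unexpected = model.load_state_dict(checkpoint, strict=False)
print(missing, unexpected)  # both empty -> the pretrained weights were matched and loaded
```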
Hi, I noticed that you only trained on WebVid-10M for two epochs. Had the model already converged by then?