[Closed] WADE1121 closed this issue 1 year ago.
Hi, if I remember correctly, the loss was plateauing after the second epoch.
```python
import torch.nn as nn

class DebertaV2OnlyMLMHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        # DebertaV2LMPredictionHead is the repo's modified prediction head.
        # self.predictions = DebertaV2LMPredictionHead(config)
        self.lm_head = DebertaV2LMPredictionHead(config)

    def forward(self, sequence_output, embedding_weight, bias=None):
        prediction_scores = self.lm_head(sequence_output, embedding_weight, bias=bias)
        return prediction_scores
```
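Since the forward signature takes `embedding_weight` and an optional `bias`, the output projection appears to be tied to the word-embedding matrix rather than owned by the head itself. Below is a minimal, self-contained sketch of that pattern; the class name, the dense/GELU/LayerNorm transform, and the tensor shapes are illustrative assumptions, not the repo's actual `DebertaV2LMPredictionHead`:

```python
import torch
import torch.nn as nn

class TiedLMPredictionHeadSketch(nn.Module):  # hypothetical name, not from the repo
    def __init__(self, hidden_size):
        super().__init__()
        # Transform block roughly mirroring a standard MLM head; details are assumed.
        self.transform = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )

    def forward(self, sequence_output, embedding_weight, bias=None):
        hidden = self.transform(sequence_output)   # (B, T, H)
        logits = hidden @ embedding_weight.t()     # (B, T, vocab): projection tied to the embedding matrix
        if bias is not None:
            logits = logits + bias
        return logits

# Usage with dummy tensors:
head = TiedLMPredictionHeadSketch(hidden_size=768)
seq_out = torch.randn(2, 16, 768)
emb_w = torch.randn(128100, 768)   # DeBERTa-v2 vocabulary size, used here only as an example
scores = head(seq_out, emb_w)      # -> torch.Size([2, 16, 128100])
```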
Hi, please correct me if I'm wrong. I noticed you renamed the head "self.predictions" to "self.lm_head". It looks like you didn't use the pretrained parameters of the DeBERTa-v2 head "DebertaV2OnlyMLMHead" from Hugging Face; instead, you defined and initialized a new head called "lm_head" and froze it during pretraining.
This is not correct. This modification actually ensures that the DeBERTa weights are loaded properly.
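For illustration, here is a minimal sketch of why the attribute name matters when loading pretrained weights. It assumes the checkpoint stores the MLM head parameters under an "lm_head.*" prefix (an assumption about this repo's checkpoint, not something stated above); with that prefix, `load_state_dict` only picks the weights up if the module attribute is named `lm_head`:

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    def __init__(self):
        super().__init__()
        # If this attribute were named self.predictions, the checkpoint keys below would not match.
        self.lm_head = nn.Linear(8, 8)

model = Head()
# Dummy checkpoint whose keys use the assumed "lm_head.*" prefix.
checkpoint = {"lm_head.weight": torch.zeros(8, 8), "lm_head.bias": torch.zeros(8)}
missing, unexpected = model.load_state_dict(checkpoint, strict=False)
print(missing, unexpected)  # both empty -> the pretrained weights were matched and loaded
```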
Hi, I noticed that you only trained on WebVid-10M for two epochs. Had the model already converged by then?