antoyang / FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
https://arxiv.org/abs/2206.08155
Apache License 2.0

About the stage of pre-training #12

Closed WADE1121 closed 1 year ago

WADE1121 commented 1 year ago

Hi, I noticed that you only trained on WebVid10M for two epochs. Had the model already converged at that point?

antoyang commented 1 year ago

Hi, if I remember correctly, the loss was plateauing after the second epoch.

WADE1121 commented 1 year ago
import torch.nn as nn

# DebertaV2LMPredictionHead is defined elsewhere in the repo's model file.
class DebertaV2OnlyMLMHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        # Original Hugging Face attribute name:
        # self.predictions = DebertaV2LMPredictionHead(config)
        self.lm_head = DebertaV2LMPredictionHead(config)

    def forward(self, sequence_output, embedding_weight, bias=None):
        prediction_scores = self.lm_head(sequence_output, embedding_weight, bias=bias)
        return prediction_scores

Hi, please correct me if I'm wrong. I noticed you renamed the head from "self.predictions" to "self.lm_head". It looks like you didn't use the pretrained parameters of the Hugging Face DebertaV2 head "DebertaV2OnlyMLMHead"; instead, you defined and initialized a new head called "lm_head" and froze it during pretraining.

antoyang commented 1 year ago

This is not correct. This modification actually ensures that the DeBERTa weights are properly loaded.
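For context, a minimal sketch (not the repo's actual loading code) of why the attribute name matters: `nn.Module.load_state_dict` matches parameters by their attribute path, so a submodule must be named to match the checkpoint keys for the pretrained weights to be loaded. The `HeadA`/`HeadB` classes below are hypothetical stand-ins for the two naming choices.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the same head under two attribute names.
class HeadA(nn.Module):
    def __init__(self):
        super().__init__()
        self.lm_head = nn.Linear(4, 4)  # attribute named "lm_head"

class HeadB(nn.Module):
    def __init__(self):
        super().__init__()
        self.predictions = nn.Linear(4, 4)  # attribute named "predictions"

# A checkpoint whose keys were saved under "lm_head.*"
checkpoint = HeadA().state_dict()  # keys: "lm_head.weight", "lm_head.bias"

# Same attribute name: every checkpoint key is matched and loaded.
a = HeadA()
missing_a, unexpected_a = a.load_state_dict(checkpoint, strict=False)

# Different attribute name: under strict=False the module silently keeps
# its random init; every checkpoint key is reported as "unexpected".
b = HeadB()
missing_b, unexpected_b = b.load_state_dict(checkpoint, strict=False)

print(missing_a, unexpected_a)  # both empty
print(missing_b, unexpected_b)  # "predictions.*" missing, "lm_head.*" unexpected
```

So whether the rename loads or discards the pretrained head depends entirely on which key names the checkpoint being loaded actually uses.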