Allen0307 / AdapterBias

Code for the Findings of NAACL 2022 (Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Question about Freezing of Final Layer-Normalization #3

Closed. ThomasFG closed this issue 1 year ago.

ThomasFG commented 1 year ago

Hey,

I have a question regarding freezing in AdapterBias. The referenced paper, "AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks", states: "Assume that given a PLM with parameters θ and AdapterBias with parameters θ′. During the training stage, we freeze θ and tune θ′ only." However, the model initialization indicates that the final layer normalization is not frozen either (see code below). This would not be surprising, since Houlsby adapters do the same, yet as far as I can tell it does not match the description in the paper. What would be the correct way for me to add AdapterBias to my model?

for name, param in self.backbond.named_parameters():
    if 'LayerNorm' in name and 'attention' not in name:
        # LayerNorms outside the attention block (i.e. the second LayerNorm
        # of each transformer layer) are collected and left trainable
        self.param_lst.append(param)
        continue
    elif 'adapter' in name:
        if 'bias' in name:
            # AdapterBias bias parameters
            self.param_lst.append(param)
        else:
            # AdapterBias weight parameters
            self.weight_lst.append(param)
        continue
    else:
        # everything else in the PLM is frozen
        param.requires_grad = False
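
A quick way to confirm what this loop leaves trainable is to list the parameters whose requires_grad is still True. This is only a small sketch; `model` here is a placeholder for whatever object holds the `backbond` attribute above.

# List every backbone parameter that the loop above left trainable.
# Only the non-attention LayerNorm parameters and the adapter
# weights/biases should show up; all other PLM parameters are frozen.
trainable = [name for name, param in model.backbond.named_parameters()
             if param.requires_grad]
print('\n'.join(trainable))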
Allen0307 commented 1 year ago

Hi, we also train the layer normalization during training, just like Houlsby. We describe it in the experimental settings: "The training details are shown in Appendix A.3. Note that the second layer normalization in each transformer layer is also tuned during the training stage, corresponding to the orange component in the right part of Figure 2."
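
For reference, a minimal sketch of this setup, assuming a Hugging Face `bert-base-uncased` backbone with a hypothetical AdapterBias module registered under parameter names containing "adapter"; the model name and learning rate are placeholders, not the repository's actual configuration.

import torch
from transformers import BertModel

# Hypothetical backbone; AdapterBias modules are assumed to be attached to it
# under parameter names containing 'adapter' (not shown here).
backbone = BertModel.from_pretrained('bert-base-uncased')

for name, param in backbone.named_parameters():
    # In Hugging Face naming, the first LayerNorm of layer i is
    # 'encoder.layer.i.attention.output.LayerNorm' and the second one is
    # 'encoder.layer.i.output.LayerNorm', so this condition keeps the
    # second LayerNorm (and the embedding LayerNorm) trainable.
    if 'LayerNorm' in name and 'attention' not in name:
        continue                      # tuned along with theta'
    elif 'adapter' in name:
        continue                      # AdapterBias parameters (theta')
    else:
        param.requires_grad = False   # frozen PLM parameters (theta)

# Placeholder optimizer over whatever remains trainable.
optimizer = torch.optim.AdamW(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4)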

ThomasFG commented 1 year ago

I assume you meant to say you also tune the layer normalization during training. 😄 Thank you very much for the quick replies!