Allen0307 / AdapterBias

Code for the Findings of NAACL 2022 (Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Question about Freezing of Final Layer-Normalization #3

Closed. ThomasFG closed this issue 1 year ago.

ThomasFG commented 1 year ago

Hey,

I have a question regarding freezing in AdapterBias. The referenced paper, "AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks", states: "Assume that given a PLM with parameters θ and AdapterBias with parameters θ′. During the training stage, we freeze θ and tune θ′ only." However, the model initialization indicates that the final layer normalization is not frozen either (see code below). This would not be surprising, since Houlsby adapters do the same, yet as far as I can tell it does not match the description in the paper. What would be the correct way for me to add AdapterBias to my model?

for name, param in self.backbond.named_parameters():
    if 'LayerNorm' in name and 'attention' not in name:
        # LayerNorms outside the attention block (i.e. the second LayerNorm
        # of each transformer layer) are collected and left trainable
        self.param_lst.append(param)
        continue
    elif 'adapter' in name:
        if 'bias' in name:
            # AdapterBias bias parameters
            self.param_lst.append(param)
        else:
            # AdapterBias weight parameters
            self.weight_lst.append(param)
        continue
    else:
        # everything else in the PLM is frozen
        param.requires_grad = False
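
A quick way to confirm what this loop leaves trainable is to list the parameters whose requires_grad is still True. This is only a small sketch; `model` here is a placeholder for whatever object holds the `backbond` attribute above.

# List every backbone parameter that the loop above left trainable.
# Only the non-attention LayerNorm parameters and the adapter
# weights/biases should show up; all other PLM parameters are frozen.
trainable = [name for name, param in model.backbond.named_parameters()
             if param.requires_grad]
print('\n'.join(trainable))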
Allen0307 commented 1 year ago

Hi, we also train the layer normalization during training, just like Houlsby. We describe it in the experimental settings: "The training details are shown in Appendix A.3. Note that the second layer normalization in each transformer layer is also tuned during the training stage, corresponding to the orange component in the right part of Figure 2."
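
For reference, a minimal sketch of this setup, assuming a Hugging Face `bert-base-uncased` backbone with a hypothetical AdapterBias module registered under parameter names containing "adapter"; the model name and learning rate are placeholders, not the repository's actual configuration.

import torch
from transformers import BertModel

# Hypothetical backbone; AdapterBias modules are assumed to be attached to it
# under parameter names containing 'adapter' (not shown here).
backbone = BertModel.from_pretrained('bert-base-uncased')

for name, param in backbone.named_parameters():
    # In Hugging Face naming, the first LayerNorm of layer i is
    # 'encoder.layer.i.attention.output.LayerNorm' and the second one is
    # 'encoder.layer.i.output.LayerNorm', so this condition keeps the
    # second LayerNorm (and the embedding LayerNorm) trainable.
    if 'LayerNorm' in name and 'attention' not in name:
        continue                      # tuned along with theta'
    elif 'adapter' in name:
        continue                      # AdapterBias parameters (theta')
    else:
        param.requires_grad = False   # frozen PLM parameters (theta)

# Placeholder optimizer over whatever remains trainable.
optimizer = torch.optim.AdamW(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-4)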

ThomasFG commented 1 year ago

I assume you meant to say you also tune the layer normalization during training. 😄 Thank you very much for the quick replies!