google-research / adapter-bert


freezing "layer_norm" and "head" #8

Open rabeehkarimimahabadi opened 3 years ago

rabeehkarimimahabadi commented 3 years ago

Hi, could you confirm, for the adapter implementation, whether the layer norms of the original model should be unfrozen, or only the layer norms inside the adapters? How about the classifier head, does it need to stay frozen? Thanks.

neilhoulsby commented 3 years ago

Unfreeze all the layer norms (shouldn't matter too much) and the classifier head (will likely matter a lot). These correspond to a small number of parameters compared to the adapters.
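For concreteness, here is a minimal sketch of this freezing scheme. It is PyTorch/Hugging Face-style pseud
code, not the repo's TensorFlow implementation, and it assumes adapter modules have already been injected into the model with "adapter" in their parameter names (hypothetical naming); in HF BERT, layer norms contain "LayerNorm" and the classification head is named "classifier".

```python
import torch
from transformers import BertForSequenceClassification

# Hypothetical setup: a BERT classifier with adapter modules already injected,
# whose adapter parameters contain "adapter" in their names (assumption).
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Train only the adapters, all layer norms, and the classification head;
# keep the rest of the pretrained weights frozen.
trainable_keywords = ("adapter", "LayerNorm", "classifier")
for name, param in model.named_parameters():
    param.requires_grad = any(k in name for k in trainable_keywords)

# The optimizer only sees the unfrozen parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4
)

# The unfrozen layer norms and head are a small fraction of the total.
num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
num_total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {num_trainable} / {num_total}")
```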

rabeehkarimimahabadi commented 3 years ago

Hi Neil, thanks for the response. When I unfreeze the layer norms of the model (not the adapters), I get a 20 percent decrease in accuracy. I am not sure what the correct implementation is for pre-norm language models; I opened a separate issue for this. Any suggestion is appreciated. Thanks.