rabeehkarimimahabadi opened 3 years ago
For different architectures you'll need to experiment a little with where to put the adapters. The first option to try is placing them just before the layer norm layers. The second is at the end of the sublayer, after dropout, possibly followed by an additional layer norm (both options are sketched below).
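A minimal sketch of one plausible reading of these two placements, assuming PyTorch and a standard post-norm residual sublayer; the `Adapter`, `PostNormSublayer`, and `placement` names are hypothetical, not from any particular library:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, skip."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x):
        # The internal skip keeps the adapter close to identity at init.
        return x + self.up(self.act(self.down(x)))

class PostNormSublayer(nn.Module):
    """Post-norm residual sublayer with an adapter at one of two spots."""
    def __init__(self, d_model: int, sublayer: nn.Module,
                 dropout: float = 0.1, placement: str = "pre_norm"):
        super().__init__()
        self.sublayer = sublayer              # e.g. attention or feed-forward
        self.dropout = nn.Dropout(dropout)
        self.adapter = Adapter(d_model)
        self.norm = nn.LayerNorm(d_model)
        self.placement = placement
        # Optional extra layer norm for the post-dropout placement.
        self.extra_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.dropout(self.sublayer(x))
        if self.placement == "post_dropout":
            # Option 2: adapter at the end, after dropout,
            # followed by an additional layer norm.
            h = self.extra_norm(self.adapter(h))
        else:
            # Option 1: adapter just before the layer norm
            # (on the sublayer output, inside the residual branch).
            h = self.adapter(h)
        return self.norm(x + h)

# Example: wrap a feed-forward sublayer and train only adapter parameters.
ff = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
block = PostNormSublayer(512, ff, placement="pre_norm")
for name, p in block.named_parameters():
    p.requires_grad = "adapter" in name   # optionally also unfreeze the norms
out = block(torch.randn(2, 10, 512))
```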
Hi, thank you for the response. I sometimes get a huge drop in performance when unfreezing the model's layer norms. Could you give me some intuition about how adapter layers interact with layer norms? Thanks.
Hi, I have a model in which normalization happens first and the add operation comes afterwards (a pre-norm block). In the paper you discussed the post-norm case; could you tell me how I can implement adapters for this case? Thank you.
I mark the lines where the normalization and add operations happen with ** to describe the model better:
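(For illustration only, a minimal sketch of such a pre-norm block, assuming standard PyTorch modules; the names are hypothetical stand-ins rather than the actual model code, and the `**` comments mark the normalization and add lines:)

```python
import torch.nn as nn

class PreNormSublayer(nn.Module):
    """Pre-norm block: normalize first, apply the sublayer, then add."""
    def __init__(self, d_model: int, sublayer: nn.Module, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        h = self.norm(x)                      # ** normalization happens first
        h = self.dropout(self.sublayer(h))
        # One option to try: insert an adapter (e.g. the Adapter class
        # from the sketch above) here, on the sublayer output inside the
        # residual branch, before the add.
        return x + h                          # ** add operation
```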