Closed ThomasFG closed 1 year ago
Hey,

I have noticed that figure 2 and figure 3 in the referenced paper "AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks" show conflicting information about the implementation of AdapterBias. In figure 2 the residual is taken after the second feed-forward layer of each block, whereas in figure 3 the residual is taken directly after the attention layer normalization.

Which of these figures most accurately describes the architecture of AdapterBias?

Hi, our AdapterBias is added after the second feed-forward layer of each block, as in figure 2. I am sorry that figure 3 was wrong about the residual; its original purpose was only to describe how the bias is computed in AdapterBias. Thanks for finding the error.
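For concreteness, here is a minimal NumPy sketch of the figure-2 placement: the AdapterBias shift is added after the second feed-forward layer, and the residual wraps the whole feed-forward sub-layer. The names (`adapter_bias`, `W_alpha`, etc.), the choice of input fed to the token-dependent linear layer, and the omission of layer normalization are my assumptions for illustration, not the authors' reference implementation.

```python
import numpy as np

def adapter_bias(x, v, W_alpha):
    """Token-dependent representation shift.

    Each token gets the bias alpha_t * v, where alpha_t is a scalar
    produced by a linear layer on that token's representation and v is
    a shared learned vector. (Assumed here: alpha is computed from the
    sub-layer input x.)
    """
    alpha = x @ W_alpha        # (seq_len, 1) token-dependent scalars
    return alpha * v           # broadcasts to (seq_len, d_model)

def ffn_sublayer_with_adapterbias(x, W1, W2, v, W_alpha):
    """Feed-forward sub-layer of one transformer block, figure-2 style:
    bias added after the second feed-forward layer, residual around the
    whole sub-layer."""
    h = np.maximum(x @ W1, 0.0)            # first feed-forward layer + ReLU
    h = h @ W2                             # second feed-forward layer
    h = h + adapter_bias(x, v, W_alpha)    # AdapterBias shift
    return x + h                           # residual connection
```

Because the bias for every token is a scalar multiple of the same vector `v`, the stack of per-token biases is rank one, which is what makes the adapter so parameter-efficient.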