Allen0307 / AdapterBias

Code for the Findings of NAACL 2022 (Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Question about AdapterBias Visualizations in the Paper #2

Closed. ThomasFG closed this issue 1 year ago.

ThomasFG commented 1 year ago

Hey,

I have noticed that Figures 2 and 3 in the referenced paper "AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks" show conflicting information about the implementation of AdapterBias. In Figure 2 the residual is added after the second feed-forward layer of each block, whereas in Figure 3 it is added directly after the attention layer normalization.

Which of these figures most accurately describes the architecture of AdapterBias?

Allen0307 commented 1 year ago

Hi, our AdapterBias is added after the second feed-forward layer of each block, as shown in Figure 2. I am sorry that Figure 3 is wrong about the residual; its original purpose was to illustrate how we compute the bias in AdapterBias. Thanks for finding the error.
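
For reference, here is a minimal PyTorch sketch of that placement, assuming the paper's description of a shared shift vector scaled by a token-dependent weight. The module name `AdapterBiasSketch` and the parameter names `v` and `alpha` are illustrative only, not the repository's actual API:

```python
import torch
import torch.nn as nn

class AdapterBiasSketch(nn.Module):
    """Sketch of AdapterBias: a shared bias vector v, scaled per token
    by a learned scalar alpha, added to the block's representation."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Shared shift vector, one per transformer block (illustrative init).
        self.v = nn.Parameter(torch.zeros(hidden_dim))
        # Linear layer producing one token-dependent scaling weight per token.
        self.alpha = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim), taken from the output
        # of the second feed-forward layer of the block (as in Figure 2).
        shift = self.alpha(hidden_states) * self.v  # broadcasts to (batch, seq_len, hidden_dim)
        return hidden_states + shift
```

Only `v` and the small `alpha` projection are trained per task, which is where the parameter efficiency comes from.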