code implementation: multiplication of 2 in the final layer output

Crane-YU commented 4 years ago

Hi @YimianDai, thanks for sharing your work and code. I just want to quickly check why you multiply by 2 at the end of the module block. Does it help you train the model, or is it a normalization parameter?

YimianDai commented 4 years ago

I believe it has no impact on training. The reason I multiply by 2 is that I want to keep the total weight the same as in plain addition.

In the direct addition case, X + Y is actually 1 * X + 1 * Y, so the weights sum to 2. In the soft selection case, M(X+Y) * X + (1 - M(X+Y)) * Y, the weights sum to 1, so I multiply by 2 to keep the two the same. The only difference between 1 * X + 1 * Y and 2 * M(X+Y) * X + 2 * (1 - M(X+Y)) * Y is then the dynamic weight allocation; the sum of the weights stays the same.
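The weight bookkeeping above can be sketched in plain Python. This is only an illustration, not the repository's code: `soft_fuse` is a hypothetical helper, and the element-wise sigmoid is an assumed stand-in for M.

```python
import math


def sigmoid(z):
    """Logistic function, used here as an assumed stand-in for M."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_fuse(x, y, gate=sigmoid):
    """Element-wise 2*M*x + 2*(1 - M)*y with M = gate(x + y).

    Returns the fused values and the (weight_x, weight_y) pair per element.
    """
    fused, weights = [], []
    for xi, yi in zip(x, y):
        m = gate(xi + yi)                # M(X+Y), a value in (0, 1)
        wx, wy = 2.0 * m, 2.0 * (1.0 - m)  # the two weights after scaling by 2
        fused.append(wx * xi + wy * yi)
        weights.append((wx, wy))
    return fused, weights


x = [0.5, -1.0, 2.0]
y = [1.5, 0.25, -2.0]

fused, weights = soft_fuse(x, y)

# Each weight pair sums to 2, matching 1*X + 1*Y in plain addition.
weight_sums = [wx + wy for wx, wy in weights]

# With a constant gate of 0.5, the fusion reduces to plain addition.
uniform, _ = soft_fuse(x, y, gate=lambda z: 0.5)
plain_sum = [xi + yi for xi, yi in zip(x, y)]
```

Whatever value the gate takes, the two weights always add up to 2; only how that total is split between X and Y changes.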

Crane-YU commented 4 years ago

@YimianDai Thank you

YimianDai / open-aff, issue #3