Crane-YU closed this issue 4 years ago
I believe it has no impact on training. I multiply by 2 because I want the total weight to stay the same as in plain addition.
In the direct addition case, X + Y is actually 1 X + 1 Y, so the weights sum to 2. In the soft selection case, M(X+Y) X + (1 - M(X+Y)) Y, the weights sum to 1, so I multiply by 2 to make the two cases match. The only difference between 1 X + 1 Y and 2 M(X+Y) X + 2 (1 - M(X+Y)) Y is then the dynamic weight allocation; the sum of the weights stays the same.
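To make the weight argument concrete, here is a minimal sketch in plain Python. It is not the repository's code: `soft_fusion` is a hypothetical helper, and a plain sigmoid of X + Y stands in for the learned attention module M, which is an assumption made only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, used here as a stand-in for the attention module M."""
    return 1.0 / (1.0 + math.exp(-z))


def soft_fusion(x, y, scale=2.0):
    """Element-wise scale * (m * x + (1 - m) * y), with m = sigmoid(x + y).

    Returns the fused values and, per element, the sum of the two weights.
    """
    fused, weight_sums = [], []
    for xi, yi in zip(x, y):
        m = sigmoid(xi + yi)       # assumed stand-in for M(X + Y), in (0, 1)
        wx = scale * m             # weight on X
        wy = scale * (1.0 - m)     # weight on Y
        fused.append(wx * xi + wy * yi)
        weight_sums.append(wx + wy)
    return fused, weight_sums


x = [0.5, -1.0, 2.0]
y = [1.5, 0.3, -0.7]
fused, sums = soft_fusion(x, y)

# Whatever m is, the two weights add up to `scale` (2 here), as in X + Y.
print(sums)
```

When m happens to be 0.5 for an element, both weights equal 1 and the fused value reduces to x + y for that element; for any other m the split between X and Y changes, but the total weight does not.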
@YimianDai Thank you
Hi @YimianDai, thanks for sharing your work and code. I just want to quickly check why you multiply by 2 at the end of the module block. Does it help train the model, or is it a normalization parameter?