Closed sayakpaul closed 2 years ago
Why is there an addition of 1 to v in the gating blocks?
An example:
https://github.com/google-research/maxim/blob/ae458a6f2be191b3b6a619feac5739ccc3ae6975/maxim/models/maxim.py#L149
@vztu
It's like adding a skip connection: x = u * (v + 1) = u * v + u. It doesn't actually matter that much in terms of performance.
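To make the equivalence concrete, here is a minimal plain-Python sketch (not the MAXIM code itself). The function names and sample values are hypothetical, chosen only for illustration. It checks that gating with the extra 1 gives the same result as plain gating plus a skip connection on u.

```python
def gate_with_plus_one(u, v):
    """Gating with the extra 1: each element is u_i * (v_i + 1)."""
    return [ui * (vi + 1.0) for ui, vi in zip(u, v)]


def gate_with_explicit_skip(u, v):
    """Plain gating u_i * v_i, with u_i added back as a skip connection."""
    return [ui * vi + ui for ui, vi in zip(u, v)]


# Hypothetical sample values, not taken from the model.
u = [0.5, -2.0, 4.0]
v = [0.0, 1.0, -0.5]

with_plus_one = gate_with_plus_one(u, v)
explicit_skip = gate_with_explicit_skip(u, v)

# Both forms agree element by element.
print(with_plus_one == explicit_skip)
```

Note that where v is 0 the output is just u, so the gate starts out as an identity mapping on u instead of zeroing it.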
Ah. Didn't even notice that. Pardon my stupidity.