Closed jyang68sh closed 2 years ago
@jyang68sh First, consider that the identity branch (in yellow) does not have any parameters. If we want to represent this operation (the identity) as a convolution, we can construct a 3x3 kernel with a weight of 1 in the central cell, as shown in the figure. The 2x1 vector that you see is the bias vector, not the kernel itself.
Why do only two filters out of four have this weight? Because we are considering an example with C_in = 2 and C_out = 2, so there are C_out x C_in = 4 filters of size 3x3, and only the two that map a channel to itself (input channel i to output channel i) carry the 1.
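To make this concrete, here is a minimal, dependency-free sketch in plain Python (not the repository's code). The helper names `conv3x3` and `identity_kernel` are hypothetical, and I am assuming stride 1, zero padding of 1, and a zero bias for a pure identity. It builds the C_in = 2, C_out = 2 kernel described above and checks that convolving with it returns the input unchanged.

```python
def conv3x3(x, kernel, bias):
    """Naive 3x3 convolution with stride 1 and zero padding 1.

    x:      nested list shaped [C_in][H][W]
    kernel: nested list shaped [C_out][C_in][3][3]
    bias:   list of length C_out
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for o in range(len(kernel)):
        # Start every output pixel at the bias of this output channel.
        plane = [[bias[o]] * w for _ in range(h)]
        for c in range(c_in):
            for i in range(h):
                for j in range(w):
                    for di in range(3):
                        for dj in range(3):
                            ii, jj = i + di - 1, j + dj - 1
                            # Positions outside the image count as zero padding.
                            if 0 <= ii < h and 0 <= jj < w:
                                plane[i][j] += kernel[o][c][di][dj] * x[c][ii][jj]
        out.append(plane)
    return out


def identity_kernel(channels):
    """Kernel shaped [C][C][3][3]: a 1 at the centre of filter (o, c) only when o == c."""
    k = [[[[0.0] * 3 for _ in range(3)] for _ in range(channels)]
         for _ in range(channels)]
    for c in range(channels):
        k[c][c][1][1] = 1.0
    return k


# Example input with 2 channels and a 2x3 spatial size (arbitrary values).
x = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[-1.0, 0.5, 2.5], [7.0, -3.0, 0.0]],
]
kernel = identity_kernel(2)   # 4 filters of size 3x3, only 2 of them non-zero
bias = [0.0, 0.0]             # the 2x1 bias vector, assumed zero for a pure identity
y = conv3x3(x, kernel, bias)
print(y == x)
```

Nothing is lost because the kernel stays 3x3 throughout; the 2x1 vector is a separate term that holds one bias value per output channel.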
@GiacomoPinardi Hi, thanks for the reply.
Sorry for the late response. This answers my question.
Hi! Really nice work!
I was trying to understand the re-parameterization part.
But what I don't get is how the identity, written as a 3x3 kernel, becomes 2x1 in the end. I mean, why does it work without information loss?
Any answer is appreciated.