Closed lhcalibur closed 4 years ago
I also don't know the exact reason; hopefully someone can explain. My guess is that it approximates initializing the memory bank with random unit vectors. The implementation of the original InstDiscrimination paper initializes its memory bank in exactly the same way.
Hi, @lhcalibur,
Good question!
Suppose you sample each dimension x uniformly from [-a, a]. You want inputSize * E[x^2] = 1, i.e., the expected squared norm of the random vector is one, so each vector has a norm of roughly one. For a uniform distribution on [-a, a], E[x^2] = a^2 / 3, so inputSize * a^2 / 3 = 1, which gives a = sqrt(3 / inputSize).
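The derivation above can be checked numerically. The following is a minimal sketch using only the Python standard library, not the repository's actual code; the names init_memory_bank and mean_squared_norm, and the sizes used, are hypothetical and chosen only for illustration.

```python
import math
import random


def init_memory_bank(num_items, input_size, seed=0):
    """Sample every entry uniformly from [-a, a] with a = sqrt(3 / input_size).

    Hypothetical helper for illustration; it is not the repository's API.
    """
    rng = random.Random(seed)
    a = math.sqrt(3.0 / input_size)
    return [
        [rng.uniform(-a, a) for _ in range(input_size)]
        for _ in range(num_items)
    ]


def mean_squared_norm(bank):
    """Average of ||v||^2 over all vectors v in the bank."""
    return sum(sum(x * x for x in row) for row in bank) / len(bank)


# Assumed sizes: 2000 memory slots, 128-dimensional features.
bank = init_memory_bank(num_items=2000, input_size=128)
print(mean_squared_norm(bank))  # should be close to 1
```

With this choice of a, individual vectors are not exactly unit length, but their squared norms concentrate around one, which fits the guess that the scheme approximates random unit vectors.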
Why is the memory bank initialized this way? Thanks!