You compute the mean and standard deviation of the parameters once and cache them. However, the paper says that not only the important parameters are updated, but also those corresponding to zero entries of the masks, which means the distribution of the parameters is constantly changing. I also found that your code only updates the 'important' parameters.
Where does the code reflect the paper's special parameter update method?
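
To make the concern concrete, here is a minimal toy sketch. It is not taken from the repository or the paper: the names `init`, `mask`, `train`, and `update_all`, and the constant-shift "update", are all assumptions made purely for illustration. It compares a mean cached before training against the mean after updating either only the masked-in entries or all entries.

```python
import random
import statistics

random.seed(0)
n = 1000

# Hypothetical toy parameters and a binary mask (1 = "important", 0 = masked out).
init = [random.gauss(0.0, 1.0) for _ in range(n)]
mask = [i % 2 for i in range(n)]

# Statistic computed once and cached before any update.
cached_mean = statistics.fmean(init)

def train(params, update_all, steps=50, lr=0.1):
    """Toy update (an assumption, not the paper's rule): shift each updated entry by lr per step."""
    params = list(params)
    for _ in range(steps):
        params = [
            p + lr if (update_all or m == 1) else p
            for p, m in zip(params, mask)
        ]
    return params

# Drift of the true mean away from the cached value in each regime.
drift_all = statistics.fmean(train(init, update_all=True)) - cached_mean
drift_masked = statistics.fmean(train(init, update_all=False)) - cached_mean

print(f"drift when all entries are updated:       {drift_all:.3f}")
print(f"drift when only masked-in entries update: {drift_masked:.3f}")
```

In this toy setting the cached mean goes stale in both cases, and more so when the masked-out entries also move, which is why I would expect the statistics to be recomputed if the code followed the paper's update.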