bkj opened this issue 3 years ago
Hi,

I noticed that the last `BatchNorm` in WRN is always set to `is_training` (see https://github.com/google-research/uda/blob/master/image/randaugment/wrn.py#L117), whereas `is_training` changes for all of the other BNs. Is this intentional? Does it give some kind of performance advantage?

Thanks!

Digging into this deeper, it seems like using batch statistics vs. running statistics makes a fair bit of difference in the convergence of the model. Do you have a good explanation for that? It seems interesting and surprising to me.
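To make the batch-statistics vs. running-statistics distinction concrete, here is a minimal NumPy sketch. It is a generic batch norm (no learned scale/shift), not UDA's actual `ops.batch_norm`. The `batch_norm` function, its `state` dict and the `momentum` value are hypothetical choices for illustration. With `is_training=True` each batch is normalized by its own mean and variance, so an example's output depends on the rest of the batch. With `is_training=False` the accumulated moving averages are used instead.

```python
import numpy as np


def batch_norm(x, state, is_training, momentum=0.9, eps=1e-5):
    """Hypothetical minimal batch norm over axis 0 (no learned scale/shift).

    `state` holds running 'mean' and 'var' estimates; they are replaced with
    updated moving averages only when is_training is True.
    """
    if is_training:
        # Training mode: normalize with the statistics of the current batch.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # Update the running (moving-average) statistics used at eval time.
        state["mean"] = momentum * state["mean"] + (1 - momentum) * mean
        state["var"] = momentum * state["var"] + (1 - momentum) * var
    else:
        # Eval mode: normalize with the accumulated running statistics.
        mean, var = state["mean"], state["var"]
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
state = {"mean": np.zeros(4), "var": np.ones(4)}

# Accumulate running statistics over many training batches.
for _ in range(100):
    batch = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
    batch_norm(batch, state, is_training=True)

# Normalize one small batch both ways; dict(state) keeps the real state untouched.
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
train_out = batch_norm(x, dict(state), is_training=True)
eval_out = batch_norm(x, state, is_training=False)
```

A layer that is always in training mode (like the final BN described above) behaves like `train_out` even at evaluation time. Its outputs are re-centered per batch rather than using the statistics learned during training.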