SY-Xuan closed 3 years ago
This is related to the central idea in MixStyle: perturbing the features in a layer so that the next layer sees data in a "new" style. Blocking the gradients through mu and sigma might prevent the network from erasing this augmentation effect by adjusting its weights.
However, I didn't evaluate this design extensively. My guess is that it does not affect performance much.
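To make the effect of the detach concrete, here is a minimal sketch, not the repository's actual module. The function name `mixstyle_sketch`, the `detach` flag, and the fixed `lmda` and `perm` values are assumptions made for illustration; they are passed in explicitly rather than sampled randomly so that the comparison is deterministic. It computes the same mixed-style output with and without detaching the statistics and compares the gradients that reach the input.

```python
import torch


def mixstyle_sketch(x, lmda, perm, detach=True, eps=1e-6):
    """Illustrative style mixing on a (B, C, H, W) tensor (hypothetical helper)."""
    # Per-instance, per-channel feature statistics.
    mu = x.mean(dim=[2, 3], keepdim=True)
    var = x.var(dim=[2, 3], keepdim=True)
    sig = (var + eps).sqrt()
    if detach:
        # Treat the statistics as constants: no gradient flows through them.
        mu, sig = mu.detach(), sig.detach()
    x_normed = (x - mu) / sig
    # Mix each sample's statistics with those of another sample in the batch.
    mu_mix = lmda * mu + (1 - lmda) * mu[perm]
    sig_mix = lmda * sig + (1 - lmda) * sig[perm]
    return x_normed * sig_mix + mu_mix


torch.manual_seed(0)
x = torch.randn(4, 3, 8, 8, requires_grad=True)
lmda = torch.full((4, 1, 1, 1), 0.5)   # fixed mixing weight (assumption)
perm = torch.tensor([1, 2, 3, 0])      # fixed, non-identity shuffle (assumption)
upstream = torch.randn(4, 3, 8, 8)     # arbitrary gradient from later layers

out_detached = mixstyle_sketch(x, lmda, perm, detach=True)
(grad_detached,) = torch.autograd.grad((out_detached * upstream).sum(), x)

out_attached = mixstyle_sketch(x, lmda, perm, detach=False)
(grad_attached,) = torch.autograd.grad((out_attached * upstream).sum(), x)

same_forward = torch.allclose(out_detached, out_attached)
same_gradient = torch.allclose(grad_detached, grad_attached)
print("same forward output:", same_forward)
print("same gradient w.r.t. x:", same_gradient)
```

In this sketch the forward outputs match, since detaching does not change any values. With detach, the gradient reaching `x` is just the upstream gradient scaled by `sig_mix / sig`. Without it, extra terms flow back through the mean and standard deviation, which is the path the reply suggests blocking.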
Got it. Thank you for your reply.
Thanks for your nice work.
mu, sig = mu.detach(), sig.detach()
Why do you use detach on these two parameters? Could you explain?