KaiyangZhou / mixstyle-release

Domain Generalization with MixStyle (ICLR'21)
MIT License

The use of detach #3

Closed: SY-Xuan closed this issue 3 years ago

SY-Xuan commented 3 years ago

Thanks for your nice work.
mu, sig = mu.detach(), sig.detach()
Why do you use detach on these two parameters? Could you explain?

KaiyangZhou commented 3 years ago

This is related to the central idea in MixStyle: perturb the features in a layer so that the next layer sees data in a "new" style. Blocking the gradients through mu and sigma might prevent the network from erasing this augmentation effect by adjusting its weights.

However, I didn't extensively evaluate this design. I guess it might not affect the performance too much.
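
For readers who want to see what the detach changes in practice, here is a minimal sketch, not the repository's code. It assumes PyTorch; the names mix_style, detach_stats and input_grad, the fixed mixing weight of 0.3, and the toy squared-sum loss are all made up for illustration (the thread does not say how the mixing weight or permutation is chosen, so both are passed in explicitly here). It computes the same mixed-statistics output with and without detaching mu and sig, then compares the gradients that reach the input.

```python
import torch


def mix_style(x, lmda, perm, detach_stats=True, eps=1e-6):
    """Mix per-instance channel statistics between samples of a batch.

    x:    (B, C, H, W) feature map
    lmda: (B, 1, 1, 1) mixing weights in [0, 1] (hypothetical fixed values here)
    perm: (B,) permutation picking the partner sample for each instance
    """
    mu = x.mean(dim=[2, 3], keepdim=True)
    sig = (x.var(dim=[2, 3], keepdim=True) + eps).sqrt()
    if detach_stats:
        # Treat the statistics as constants: no gradient flows through them.
        mu, sig = mu.detach(), sig.detach()
    x_normed = (x - mu) / sig
    mu_mix = lmda * mu + (1 - lmda) * mu[perm]
    sig_mix = lmda * sig + (1 - lmda) * sig[perm]
    return x_normed * sig_mix + mu_mix


def input_grad(detach_stats):
    """Gradient of a toy loss w.r.t. the input, with or without detach."""
    torch.manual_seed(0)
    x = torch.randn(4, 3, 8, 8, requires_grad=True)
    lmda = torch.full((4, 1, 1, 1), 0.3)
    perm = torch.tensor([1, 2, 3, 0])
    out = mix_style(x, lmda, perm, detach_stats)
    out.pow(2).sum().backward()  # toy loss, for illustration only
    return x.grad.clone()


g_detached = input_grad(True)
g_full = input_grad(False)
# Same forward values either way; only the backward pass differs.
print("gradients identical:", torch.allclose(g_detached, g_full))
```

The forward output is the same in both cases, since detach only affects the backward pass. Without detach, the gradient also flows through mu and sig, including the partner sample's statistics, which is the path that could in principle let earlier layers adapt to the mixing. Whether that matters for accuracy is, as noted above, not something this thread establishes.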

SY-Xuan commented 3 years ago

Got it. Thank you for your reply.