NVlabs / DG-Net

:couple: Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral) :couple:
https://www.zdzheng.xyz/publication/Joint-di2019

About gradients for the appearance encoder (id_a) #14

Closed wencaizhong closed 5 years ago

wencaizhong commented 5 years ago

Hi, thanks for your great work. I noticed that x_ba is fed into self.id_a twice: once for the teacher loss (which only influences id_a) and once for the code reconstruction loss and the ID reconstruction loss (which do not influence id_a).

Have you tried forwarding it only once, and would that hurt the model? I am confused about why it is designed this way.

layumi commented 5 years ago

Hi @wencaizhong

Sorry for the late response. As noted in BicycleGAN [1], encoding generated images is sometimes tricky. The encoder may push the generated image x_ba to preserve extra details of the input x_a, so that the input becomes easy to reconstruct (similarly, x_ab preserves the details of x_b, so the cycle loss is small). This is a disaster for conditional image generation.

To avoid this problem, one straightforward solution is either to detach the gradient of the generated images (when updating the encoder) or to fix the encoder weights (when calculating the gradient with respect to the generated images). As you noticed, we adopt a two-step policy.

When training self.id_a, we do not want the gradient to propagate to the generated images, so we detach the generated images.
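Here is a minimal PyTorch sketch of this first step. The stand-in encoder, tensor shapes, and placeholder loss are hypothetical for illustration only, not the actual DG-Net code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; DG-Net's real modules differ.
id_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))  # appearance encoder
x_ba = torch.randn(4, 3, 64, 64, requires_grad=True)             # stands in for the generator output

# Step 1: the teacher loss updates id_a only; detach() blocks the
# gradient from flowing back into the generator that produced x_ba.
f_teacher = id_a(x_ba.detach())
loss_teacher = f_teacher.pow(2).mean()   # placeholder for the real teacher loss
loss_teacher.backward()                  # id_a gets gradients; x_ba.grad stays None
```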

When training the code reconstruction loss and the ID reconstruction loss, we fix the encoder weights and calculate the gradients with respect to the generated images. (Note that this step is not strictly necessary; sometimes we can also train them together. It is tricky to find a balanced point.)
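Continuing the sketch above, the second step freezes the encoder weights so that only the generated image (and hence the generator) receives gradients; the reconstruction target and loss here are again placeholders:

```python
# Step 2: code / ID reconstruction -- let the gradient flow into the
# generated image, but do not update the encoder weights.
for p in id_a.parameters():
    p.requires_grad_(False)

f_recon = id_a(x_ba)                     # new graph through the frozen encoder
target_code = torch.randn(4, 128)        # placeholder reconstruction target
loss_recon = (f_recon - target_code).pow(2).mean()
loss_recon.backward()                    # populates x_ba.grad; id_a gets no gradient

for p in id_a.parameters():              # unfreeze for the next training step
    p.requires_grad_(True)
```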

[1] Zhu, Jun-Yan, et al. "Toward multimodal image-to-image translation." Advances in Neural Information Processing Systems. 2017. https://arxiv.org/abs/1711.11586

wencaizhong commented 5 years ago

Thanks for your reply.