This work is interesting, and the results reported in the paper surprised me! I have the following questions about this work and would appreciate it if you could clear up my confusion.
The qualitative results suggest that this work achieves image-to-image translation, yet the proposed GAN architecture is presented as mapping random noise to images. I am not sure how image-to-image translation is achieved. My guess is that the proposed anchor space constrains the model to preserve identity between the source and target domains. Is that right? Could you explain this in detail?
I am not sure about the motivation for using two discriminators. Why do you use D_{img} for anchor space sampling but D_{patch} for entire space sampling?
The paper reports only qualitative results for the ablation study. I think quantitative results would be more convincing, since it is difficult to judge the contribution of each component from Fig. 5 alone.
Best regards!