In your paper, you mention that the discriminator solves a series of "binary classification tasks determining whether an input image is a real image of the source class or a translation output coming from G. As there are |S| source classes, D produces |S| outputs." But you "do not penalize D for not predicting false for images of other classes (S \ {c_x})," which means D can mark any number of classes as positive, as long as the output for class c_x is correct (positive for a real sample, negative for a fake one).
Then there is the feature matching loss, which pushes the features that D extracts from the translation output G(x, y) to be as similar as possible to D_f(y). All cool until here.
Now, if D can output any number of classes as positive, how do you make sure that the last layer of D is not just broadcasting the same value to all classes, meaning it only determines whether the sample is real or fake, independent of the class? That would make the identity features of class c_y almost the same as those of class c_x, so the feature matching loss would be minimal no matter whether the output looks like class c_x or class c_y.
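To make the concern concrete, here is a toy sketch in plain Python. It is my own illustration, not code from the paper or its repository. I am assuming a hinge-style loss that reads only the output at the sample's own class index; the function name `class_indexed_hinge_loss` and the random "realness" scores are hypothetical. It compares a D that broadcasts one class-agnostic score to every output with a D that is class-specific but has the same value at the labelled index.

```python
import random

def class_indexed_hinge_loss(logits, labels, is_real):
    """Hinge loss that looks only at each sample's own class output.

    Assumption: outputs for the other classes are never penalized.
    """
    sign = 1.0 if is_real else -1.0
    picked = [row[c] for row, c in zip(logits, labels)]  # ignore other classes
    return sum(max(0.0, 1.0 - sign * v) for v in picked) / len(picked)

random.seed(0)
num_classes, batch = 5, 4
labels = [random.randrange(num_classes) for _ in range(batch)]

# Hypothetical class-agnostic real/fake score, one per sample.
realness = [random.gauss(0.0, 1.0) for _ in range(batch)]

# Discriminator A: broadcasts the same score to all class outputs.
logits_broadcast = [[r] * num_classes for r in realness]

# Discriminator B: arbitrary values everywhere except the labelled class,
# where it agrees with discriminator A.
logits_specific = [[random.gauss(0.0, 1.0) for _ in range(num_classes)]
                   for _ in range(batch)]
for row, c, r in zip(logits_specific, labels, realness):
    row[c] = r

loss_broadcast = class_indexed_hinge_loss(logits_broadcast, labels, is_real=True)
loss_specific = class_indexed_hinge_loss(logits_specific, labels, is_real=True)
print(loss_broadcast == loss_specific)
```

Under these assumptions the two discriminators get the same loss value, so this term alone does not seem to distinguish them. Of course, this compares only a single loss evaluation, not what training actually converges to, which is what I am asking about.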
Any clarification would be appreciated :)