genforce / interfacegan

[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing
https://genforce.github.io/interfacegan/
MIT License

Omitting intercept #20

Closed · jankrepl closed this 5 years ago

jankrepl commented 5 years ago

Hey! First of all great job on the paper and the code!

I was wondering what the motivation was behind assuming that all boundaries (hyperplanes) pass through the origin. I would imagine this assumption might be restrictive (especially in general, when the mapping network can produce distributions that are not centered at the origin).

Also related to this: when you find the hyperplane in train_boundary by fitting a linear model, it seems you do not enforce the intercept to be 0.

ShenYujun commented 5 years ago

Good point! You are right that the boundary may not pass through the origin. Ideally, though, the generative model should learn to split the attributes evenly. For example, the model should synthesize a man or a woman with a 50-50 chance, in which case the corresponding boundary should pass through the origin. However, the artifact (or, say, image quality) boundary is a little farther from the origin, because the model tends to generate good images more often than poor ones.

Actually, in our experiments, even without forcing the intercept to be 0 as in the code in utils/manipulator.py, it is almost 0 after SVM training (i.e., around 1.0 or less). If you like, you can easily force the intercept to be 0, which will not affect the results much.
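For illustration, here is a minimal sketch of pinning the separating hyperplane to the origin with scikit-learn's LinearSVC, which exposes a `fit_intercept` flag (the data here is hypothetical, and the repo's train_boundary may use a different classifier class):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy latent codes: two well-separated clusters standing in for
# "attribute present" vs. "attribute absent" samples.
rng = np.random.RandomState(0)
pos = rng.randn(200, 8) + 2.0
neg = rng.randn(200, 8) - 2.0
X = np.vstack([pos, neg])
y = np.hstack([np.ones(200), np.zeros(200)])

# fit_intercept=False forces the hyperplane through the origin.
clf = LinearSVC(fit_intercept=False, max_iter=10000)
clf.fit(X, y)

# Unit normal vector of the boundary, as used for editing.
normal = clf.coef_ / np.linalg.norm(clf.coef_)

assert np.allclose(clf.intercept_, 0.0)  # intercept is exactly zero
```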

Hope this can address your concern.

jankrepl commented 5 years ago

Thanks for the response!

I just thought the theory could be easily extended to the case with intercept and thus one does not have to rely on the assumption that mapping_network(Z) has expected value around the origin.

Also related to this: for editing, it seems you basically move ±3 unit normal vectors from the origin. Sure, that seems to be the right range for StyleGAN, but in general it depends on the distribution of mapping_network(Z). Wouldn't it make more sense to take the binary classifier trained on the given attribute and let the user specify the target probability that the edited image has that attribute? It is a meaningful number and always lies in (0, 1). In math terms, just solve the equation below for c (the multiplier applied to the normal vector):

P(original_latent_code + c * normal_vector) = target

For example, for logistic regression, P(x) = 1 / (1 + exp(-<w, x> - w_0)).

ShenYujun commented 5 years ago

Treating manipulation from a probability perspective is a good suggestion. We will explore this direction in future work. Thanks a lot!