genforce / interfacegan

[CVPR 2020] Interpreting the Latent Space of GANs for Semantic Face Editing
https://genforce.github.io/interfacegan/
MIT License

Inconsistency in linear interpolation for editing #84

Open juancprzs opened 3 years ago


Hi there,

Thanks for the great work!

I'm visualizing some samples generated by your approach through the edit.py script, and I have a question. My understanding is that this script generates samples by:

  1. Starting from a latent code z, a direction in that space given by a unit vector n, and start and end magnitudes s and f, respectively, and
  2. Generating a sequence of latent codes by linearly interpolating from v1 = z - s n to v2 = z + f n.
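To make the two steps above concrete, here is a minimal NumPy sketch of the interpolation as I understand it. This is not the repository's code: the function name `interpolate_along_direction` and the `steps` parameter are hypothetical, and the 512-dimensional random vectors only stand in for a real latent code and direction.

```python
import numpy as np

def interpolate_along_direction(z, n, s, f, steps=10):
    """Return `steps` codes evenly spaced from z - s*n to z + f*n.

    Hypothetical helper for illustration; z and n have shape (1, dim).
    """
    n = n / np.linalg.norm(n)              # make sure the direction has unit length
    offsets = np.linspace(-s, f, steps)    # signed offsets along n
    return z + offsets.reshape(-1, 1) * n  # broadcasts to shape (steps, dim)

rng = np.random.default_rng(0)
z = rng.standard_normal((1, 512))          # stand-in latent code
n = rng.standard_normal((1, 512))          # stand-in direction
codes = interpolate_along_direction(z, n, s=3.0, f=3.0, steps=7)

# L2 distance of every interpolated code from the starting code z
distances = np.linalg.norm(codes - z, axis=1)
```

Under this reading, `distances` is just the absolute value of each offset, so it never exceeds max(s, f).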

I understand this is done in the function linear_interpolate, which is called from edit.py and whose result is stored in the variable interpolations. From this understanding of the code, I would expect every code saved in interpolations to be at an L2 distance of at most max(s, f) from z. In particular, there should be one interpolation at distance exactly s and another at distance exactly f. However, when I check this by running np.linalg.norm(interpolations - latent_codes[sample_id:sample_id + 1], axis=1), I get different results.
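The expected distances follow from a short calculation, assuming n has unit length and the interpolation parameter t (a symbol introduced here for illustration) runs over [-s, f]:

$$
\lVert (z + t\,n) - z \rVert_2 = \lvert t \rvert \, \lVert n \rVert_2 = \lvert t \rvert, \qquad t \in [-s, f],
$$

so the distance equals s at t = -s, equals f at t = f, and is bounded by max(s, f) everywhere in between.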

I think this is either because I'm misunderstanding something, or because there is a (small) bug in the code. Such a bug would probably be inconsequential, but I thought I should report it. I think the bug is in one particular line of the linear_interpolate function. Specifically, I don't understand why the computation latent_code.dot(boundary.T) is performed. The boundary variable is a direction rather than an actual boundary, right? There is no bias term, so it cannot determine on which side of the hyperplane latent_code falls. Further, I see that no analogous computation is performed for latent codes in the W+ space.
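Here is a sketch of what I suspect is going on. It assumes that the line in question subtracts the projection latent_code.dot(boundary.T) from the interpolation offsets before they are applied; that is my reading, not something I have confirmed with the authors. All variable names below are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal((1, 512))      # stand-in latent code
n = rng.standard_normal((1, 512))
n /= np.linalg.norm(n)                 # unit direction (no bias term)

offsets = np.linspace(-3.0, 3.0, 7)    # requested range, i.e. s = f = 3

# Assumed behaviour: shift the offsets by the projection of z onto n.
projection = z.dot(n.T).item()         # scalar z . n
shifted = offsets - projection
codes = z + shifted.reshape(-1, 1) * n

# Distance of each code from z: no longer |offsets|.
dist_from_z = np.linalg.norm(codes - z, axis=1)

# Signed distance of each code from the hyperplane {x : n . x = 0}.
signed_dist_to_plane = codes.dot(n.T).ravel()
```

With this shift, the offsets measure signed distance from the hyperplane through the origin with normal n, not distance from z. The distance from z becomes |t - z·n| for each offset t, which would explain the values I observe.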

I think that particular line is the root of the problem I observe: if I simply comment out that line and run the code, the results are as expected.

Could you please take a look into my claims?

Thank you!