chi0tzp / WarpedGANSpace

[ICCV 2021] Authors' official PyTorch implementation of the "WarpedGANSpace: Finding non-linear RBF paths in GAN latent space" paper.
Apache License 2.0

How to use learned latent direction from .npy files #1

Closed: tlack closed this issue 2 years ago

tlack commented 2 years ago

Hey there,

I've been eagerly setting up WarpedGAN on a Google Colab today and ran into a problem.

I was able to successfully run traverse_attribute_space and I see gender.npy, etc.

But these are (128,33) and ProGAN's z is (1,512).

I think I have to apply the loaded latent to the Support Set, but the exact mechanism is unclear to me.

Is there somewhere in the source that I can see how this works? How did you generate those nifty GIFs on the dev-eval branch?

chi0tzp commented 2 years ago

Hi @tlack ,

Thanks for your interest in our work. Have you trained a model yourself, or have you used a pre-trained one? The .npy files produced by traverse_attribute_space.py contain the attribute paths for a given latent code. In the example you mention, 128 denotes the number of paths (i.e., the number of warping functions you've learned), while 33 denotes the number of images generated along each path.
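For instance, here's a minimal sketch of how such a file could be inspected (the filename is the one you mention; the path id is just a placeholder for illustration):

```python
import numpy as np

# Load the attribute matrix produced by traverse_attribute_space.py:
# one row per path (warping function), one column per image along the path.
A = np.load("gender.npy")
print(A.shape)  # (128, 33): 128 paths, 33 images per path

# Attribute values along a single path, e.g., path 96:
path_id = 96
print(A[path_id])  # 33 scores; a monotonic trend suggests a "gender" path
```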

You may want to use the pre-trained model for ProgGAN (which you can download using download_models.py -- not in the master branch yet).

The GIFs in the dev-eval branch are produced by create_gif.py for a given path id; for instance:

```bash
python create_gif.py --gif-size=196 --num-imgs=7 --dir=experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/435c92ab04f994fd192526b9107396747caf283a/ --path-id=96
```

The magic number (--path-id=96) is given by rank_interpretable_paths.py, which I'm currently refactoring and will push to master very soon. As explained in the paper (Sect. 4), this script ranks the discovered paths based on the correlation of each path with the attribute vector. In this example, path 96 gives the greatest correlation for the attribute Lip Corner Puller (aka AU12, aka Smiling).

I'll merge dev-eval into master soon, and I'll also add rank_interpretable_paths.py asap, but you can already start looking at the GIFs of the discovered paths (e.g., in experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/paths_gifs/).

tlack commented 2 years ago

You can see my awful, fledgling attempts at getting your stuff going here:

https://colab.research.google.com/drive/188bKhg_tNwjUVo4BXsiwywKnCSaT3e0x?usp=sharing

My goal for this experiment is to enter a bunch of English descriptors (skinny / fat, young / old, scared / excited), create attribute directions by adding a CLIP step into traverse_attribute.. (in the same fashion you have used those other classifiers), and then allow the end user to navigate through those latents using the learned descriptor directions, along with other manipulations.
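(To make that concrete, here is a rough, hypothetical sketch of such a CLIP scoring step, using the HuggingFace CLIP model; the prompts and image filenames are placeholders, and the resulting scores would play the role of the classifier outputs in the attribute vector:)

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical sketch: score the images generated along one path against a
# pair of opposing text prompts, yielding one attribute score per image.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = ["a photo of a young person", "a photo of an old person"]
# Placeholder filenames for the images of one traversal path:
images = [Image.open(f"path_096/{i:06d}.jpg") for i in range(33)]

inputs = processor(text=prompts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (num_images, num_prompts)

# Probability of "old" vs. "young" per image -- one value per path step.
attribute_vector = logits.softmax(dim=-1)[:, 1].numpy()
```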

I'm starting from ProGAN because I've had good luck with that family in other experiments.

I think I understand your guidance here: determine the best paths for each attribute using rank... and then use that path ID to retrieve workable Z's for the GAN space.

I guess create-gif kinda works from the output of traverse.. (in that it reads finished images) so I will try to understand the linkage there.

Thanks for your detailed and very rapid response. And for providing code that actually works out of the box! This may be a first in machine learning paper history. :)

chi0tzp commented 2 years ago

> You can see my awful, fledgling attempts at getting your stuff going here:
>
> https://colab.research.google.com/drive/188bKhg_tNwjUVo4BXsiwywKnCSaT3e0x?usp=sharing
>
> My goal for this experiment is to enter a bunch of English descriptors (skinny / fat, young / old, scared / excited), create attribute directions by adding a CLIP step into traverse_attribute.. (in the same fashion you have used those other classifiers), and then allow the end user to navigate through those latents using the learned descriptor directions, along with other manipulations.

Hey @tlack, first of all, thanks for taking the time to extend our method! We've also been thinking in this direction and may try something in the future. Before anything else, please have a look at another very relevant ICCV'21 paper. It's very close to what we do (they try to optimize a vector field), but they do so in a supervised way. They also have an NLP module for editing based on verbal instructions.

> I'm starting from ProGAN because I've had good luck with that family in other experiments.

> I think I understand your guidance here: determine the best paths for each attribute using rank... and then use that path ID to retrieve workable Z's for the GAN space.

I'm not trying to be cryptic or anything, I just need some time to refactor the script and provide an easy-to-follow piece of code. Regardless, here is what we actually do, as we briefly describe in the paper:

> In order to obtain a measure of how well the paths generated by a warping function are correlated with a certain attribute, we estimate the average Pearson's correlation between the index of the step along the path and the corresponding values in the attribute vector.

Thus, we compute Pearson's correlation between the step index along the path (i.e., 1, 2, ..., num_of_generated_images_in_path) and the values of the respective attribute. So, suppose that A is an MxN numpy array whose t-th row A_t = A[t, :] contains the values of the t-th attribute for all images across a path. Then the correlation (which technically is not exactly Pearson's, but an "un-normalized" version of it) is given as:

```python
A_t_idx = np.arange(A_t.shape[0])  # step indices along the path
# "Un-normalized" Pearson: covariance of (attribute values, step indices),
# divided only by the standard deviation of the step indices.
corr = np.cov(A_t, A_t_idx)[0, 1] / np.sqrt(np.cov(A_t_idx))
```
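To turn this into a ranking (essentially what rank_interpretable_paths.py will provide), one could compute this correlation for every path and keep the path id with the largest absolute value. A minimal sketch, assuming an attribute matrix laid out like the gender.npy above (rows = paths, columns = images along the path):

```python
import numpy as np

# Hypothetical ranking sketch: score each path of one attribute by its
# un-normalized correlation with the step index, then pick the best path.
A = np.load("gender.npy")    # (num_paths, num_images_per_path), e.g., (128, 33)
idx = np.arange(A.shape[1])  # step indices along a path

corrs = np.array([np.cov(row, idx)[0, 1] / np.sqrt(np.cov(idx)) for row in A])
best_path_id = int(np.argmax(np.abs(corrs)))
print(best_path_id)          # e.g., pass this to create_gif.py as --path-id
```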

> I guess create-gif kinda works from the output of traverse.. (in that it reads finished images) so I will try to understand the linkage there.

create_gif.py is really trivial. It just takes a directory with the images generated by traverse_latent_space.py across a given path, for a given latent code, etc. For instance, with --dir=experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/435c92ab04f994fd192526b9107396747caf283a/ --path-id=96, it reads the directory experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/435c92ab04f994fd192526b9107396747caf283a/paths_images/path_096/ and just creates the GIF -- that's only been created for the README.md :)
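In essence, it boils down to something like this minimal Pillow sketch (not the actual script; the image extension, frame size, and output name are assumptions):

```python
from pathlib import Path
from PIL import Image

# Hypothetical sketch: collect the images of one traversal path (assuming
# .jpg files named in step order) and write them out as an animated GIF.
path_dir = Path("experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/"
                "results/ProgGAN_4_F/56_0.15_8.4/"
                "435c92ab04f994fd192526b9107396747caf283a/paths_images/path_096")
frames = [Image.open(p).resize((196, 196)) for p in sorted(path_dir.glob("*.jpg"))]
frames[0].save("path_096.gif", save_all=True, append_images=frames[1:],
               duration=100, loop=0)
```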

> Thanks for your detailed and very rapid response. And for providing code that actually works out of the box! This may be a first in machine learning paper history. :)

Thank you! Please consider closing the issue if the above answers your questions. I'll push the remaining script asap, stay tuned :)