genforce / ghfeat

[CVPR 2021] Generative Hierarchical Features from Synthesizing Images
https://genforce.github.io/ghfeat/

Does the whole project include the code of local editing? #5

Closed Radium98 closed 2 years ago

Radium98 commented 3 years ago

Hi friend, your work is very exciting and wonderful. I want to know whether the code for the different tasks, especially local editing, is included in the code you released, or whether your code is only used to generate the hierarchical features needed for the multiple tasks you mentioned. I am a PyTorch user, and your code is in TensorFlow. If it is the latter case, I think I can use the features generated by your code more easily, because I just need to take the features without modifying your code. By the way, what does the "image list" mean? I suppose it is a folder containing some images, but the error the program raised indicates that this is not the right understanding. I would appreciate a reply as soon as possible. Thank you!

ShenYujun commented 3 years ago

This repo currently only supports extracting hierarchical features. The "image list" is a txt file consisting of a collection of image paths.
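For reference, a minimal image list is just one path per line in a plain text file (the file name and paths below are hypothetical examples, not from the repo):

```
data/face_0001.jpg
data/face_0002.jpg
data/face_0003.jpg
```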

Radium98 commented 3 years ago

OK, I got it, thank you very much.

Radium98 commented 3 years ago

I want to know how to generate the images for local editing. Does it mean I need to manually swap a certain region of GH-Feat at a certain level and feed the modified GH-Feat into the generator, so that the corresponding area of the generated image changes accordingly? Or can the region swapping be done by some part of the program?

ShenYujun commented 3 years ago

First, GH-Feat can be obtained by (1) extracting it from real images with our encoder or (2) sampling from the Z space and forwarding to the Y space. As a result, you can use GH-Feat from real images for swapping, but in our paper we just use some random GH-Feat. Second, to perform local editing, we swap GH-Feat at a particular layer with respect to a region of the feature map (instead of the entire feature map). Swapping the entire feature map would cause a global edit.

Radium98 commented 3 years ago

GH-Feat is a vector; how do you embed it in a feature map? Do you use it to replace a certain row of the feature map at a particular level, or just change some part of the style code (which is actually the GH-Feat)? I ask because the paper mentions that "the synthesized image can be completely determined by these style codes without any other variations".

ShenYujun commented 3 years ago

You are right that GH-Feat is a vector at each layer. According to the formulation of AdaIN, assuming a feature map with shape H x W x C and a GH-Feat f with shape 1 x C, AdaIN first broadcasts the GH-Feat to shape H x W x C and then conducts element-wise multiplication. As a result, you can swap f_1 and f_2 within a certain spatial region, meaning the h x w x C region uses f_2 while the rest keeps using f_1.
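The broadcast-and-blend described above can be sketched in framework-neutral NumPy (all shapes and codes here are hypothetical random values, not the repo's actual implementation):

```python
import numpy as np

# Hypothetical shapes: an 8x8 feature map with C = 512 channels at some layer.
H, W, C = 8, 8, 512
feature = np.random.randn(H, W, C)

# Two per-layer GH-Feat codes, each of shape (1, C), as described above.
f_1 = np.random.randn(1, C)  # code of the image being edited
f_2 = np.random.randn(1, C)  # code from another image or a sampled z

# Binary spatial mask marking the h x w region to edit
# (here the top-left quadrant as an arbitrary example).
mask = np.zeros((H, W, 1))
mask[:4, :4] = 1.0

# Broadcast each (1, C) code to (H, W, C) and blend by the mask:
# the masked region is modulated by f_2, the rest by f_1.
code = mask * f_2 + (1.0 - mask) * f_1   # broadcasts to (H, W, C)
modulated = feature * code               # AdaIN-style element-wise scaling
```

Note this only shows the scale part of AdaIN; the full formulation also normalizes the feature map and adds a bias, but the region-swap logic is the same.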

Radium98 commented 3 years ago

Do f_1 and f_2 mean different GH-Feat produced at different levels? Or is f_1 the GH-Feat obtained in image recovery (after broadcasting), and then, to do local editing, I change f_1 to f_2? Where does f_2 come from?

ShenYujun commented 3 years ago

Assume there are L layers in total. f_1 and f_2 both have shape 1 x C at layer l (l = 1, ..., L). Note that C may differ across layers. f_1 comes from the image you would like to edit; f_2 can come from another image, or from a sampled latent code z. Then use f_2 to replace f_1 at some particular layer (level) l within a certain region h x w. The other layers, and the other regions at layer l, still use f_1.
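Concretely, the per-layer structure might look like the following sketch (the layer count and channel schedule are hypothetical, chosen only to illustrate that C varies by layer):

```python
import numpy as np

# Hypothetical StyleGAN-like channel schedule for L = 14 layers.
channels = [512, 512, 512, 512, 512, 256, 256, 128, 128, 64, 64, 32, 32, 16]

# One (1, C_l) GH-Feat per layer for each image (random stand-ins here).
codes_1 = [np.random.randn(1, c) for c in channels]  # image being edited
codes_2 = [np.random.randn(1, c) for c in channels]  # source of the edit

edit_layer = 5  # swap only at this layer; every other layer keeps codes_1

edited = [c.copy() for c in codes_1]
edited[edit_layer] = codes_2[edit_layer]
# Spatially, only an h x w region at this layer would actually use
# codes_2 at synthesis time; the rest of the feature map keeps codes_1.
```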

Radium98 commented 3 years ago

Thank you for your amazing patience! Good luck in your study!

stylebased commented 3 years ago

So how do you find the spatial region for GH-Feat swapping in local editing? Also, in StyleGAN different layers have different numbers of channels, but in your results all layers seem to have the same dimension of y (14 x 1024), for example 1024 for all. Can you explain this?

ShenYujun commented 3 years ago

  1. To get the spatial region for swapping, you can first select the region of interest on the image at the largest resolution, and then downsample the mask to the resolution of the particular layer.
  2. 1024 is the maximum number of channels among all layers. We therefore pad all layers to 1024 simply for ease of implementation, so that we can pass one tensor with shape (14 x 1024) across functions instead of 14 tensors with different shapes.
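Both steps can be sketched in a few lines of NumPy (function names, the nearest-neighbor downsampling choice, and the channel counts are my assumptions for illustration, not the repo's actual code):

```python
import numpy as np

def downsample_mask(mask, target_hw):
    """Nearest-neighbor downsample a binary H x W mask to a layer's resolution."""
    H, W = mask.shape
    th, tw = target_hw
    rows = np.arange(th) * H // th
    cols = np.arange(tw) * W // tw
    return mask[np.ix_(rows, cols)]

def pack_codes(codes, max_c=1024):
    """Zero-pad each (1, C_l) code to max_c channels so all L layers
    fit in one (L, max_c) tensor, as described in point 2 above."""
    packed = np.zeros((len(codes), max_c))
    for i, c in enumerate(codes):
        packed[i, :c.shape[1]] = c
    return packed

# Usage: a full-resolution mask shrunk to an 8x8 layer, and two
# differently-sized codes packed into one tensor.
mask = np.zeros((256, 256))
mask[:128, :128] = 1.0
small_mask = downsample_mask(mask, (8, 8))

codes = [np.random.randn(1, 512), np.random.randn(1, 256)]
packed = pack_codes(codes)
```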