betterze / StyleSpace

StyleSpace Analysis: Disentangled Controls for StyleGAN Image Generation
311 stars 34 forks

About using N samples to find attribute specific channels #25

Open sarmientoj24 opened 2 years ago

sarmientoj24 commented 2 years ago

I have a few questions:

  1. What is the difference between Localized Channels and Attribute-Specific Channels? Are Localized Channels required to find the attribute-specific channels?
  2. My understanding from the paper is that you can approximate the channels (e.g. 4_56) using N samples that have the desired attribute, say, a car with grass. Can we approximate that channel without pretrained classifiers, using just some sample images?

sarmientoj24 commented 2 years ago

To make it clearer: say I have trained a new model and I want to approximate the channel for a certain attribute (e.g. 4_45). Are the classifiers necessary?

betterze commented 2 years ago

Please take a look at Section 5. We just need 20-30 example images to find the target channel. We don't need a classifier.

'Localized channels' are channels that perform localized control. 'Attribute-specific channels' are channels related to a given attribute. For example, the 'smiling' attribute is localized, so the channels controlling 'smiling' are localized channels. Given 20-30 smiling images, we can find the smiling channels using the 'Attribute-Specific Channels' method.
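
To illustrate how a handful of example images can point to a channel, here is a minimal sketch. It assumes the relevance of a channel is scored as the absolute mean of the exemplars' normalized deviation from the population, divided by the standard deviation of that deviation. The function name `rank_channels`, its arguments, and the toy numbers are hypothetical and are not taken from the repository.

```python
from statistics import fmean, pstdev


def rank_channels(exemplars, pop_mean, pop_std, top_k=5, eps=1e-8):
    """Rank style channels by how consistently the exemplars deviate in them.

    exemplars: list of style vectors, one list of floats per positive example
    pop_mean, pop_std: per-channel mean and std over a large random population
    (all names here are illustrative assumptions, not the repository's API)
    """
    n_channels = len(pop_mean)
    scores = []
    for u in range(n_channels):
        # Deviation of each exemplar from the population, in units of pop std.
        deltas = [(s[u] - pop_mean[u]) / (pop_std[u] + eps) for s in exemplars]
        mu_e = fmean(deltas)
        sigma_e = pstdev(deltas)
        # A large and consistent shift across exemplars gives a high score.
        scores.append(abs(mu_e) / (sigma_e + eps))
    order = sorted(range(n_channels), key=lambda u: scores[u], reverse=True)
    return [(u, scores[u]) for u in order[:top_k]]


# Toy data: three exemplars, three channels; only channel 1 shifts consistently.
toy_exemplars = [[0.1, 2.0, -0.5], [-0.2, 2.1, 0.4], [0.0, 1.9, 0.1]]
ranking = rank_channels(toy_exemplars, [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], top_k=2)
print([u for u, _ in ranking])
```

With real style codes, the same computation would typically run over all layers at once with an array library. The score needs only the example images and population statistics, not a classifier.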

sarmientoj24 commented 2 years ago

Thank you for this. Unfortunately, I still do not understand the difference between localized control and attribute control. My understanding is that the attribute-specific channel is the channel that controls the smiling attribute. But you mentioned that it is localized instead.

Moreover, does this:

"Please take a look at Section 5. We just need 20-30 example images to find the target channel. We don't need a classifier."

point to Attribute-Specific Channels?

The instructions here say:

"To get the attribute-specific channels (for example, channels for smiling), we need to use classifiers to annotate a set of generated images."

which implies that classifiers are needed.

sarmientoj24 commented 2 years ago

Is a layer + index pair like 4_256 a localized control or an attribute control?

sarmientoj24 commented 2 years ago

I also found that the channels do not transfer well. The channels shown here are for orig, sky, headlights, color, and grass.
Do you think transferability improves with more example images?

[three attached images]

woctezuma commented 2 years ago

I don't think there is an opposition between attribute-specific channels and localized channels.

Basically, if you have classifiers for the attribute of interest, then you can generate "a large number (1K) of positive examples" in order to try to identify attribute-specific channels. However, if the attribute of interest is localized, then it is likely sufficient to use a few positive examples in order to identify roughly the same attribute-specific channels.

Feel free to correct me, though, as I have only a superficial understanding of the subject. :)

sarmientoj24 commented 2 years ago

@betterze I have some questions on how to adapt step #3 and onwards, using 1K images, for a custom model, say car.

"Generate semantic segmentation for natural human faces. You could replace this part with a custom model. Since the gradient map is only 32x32 resolution, it cannot distinguish small regions, so please combine small semantic regions into a big one (upper lip + lower lip + mouth = mouth)."
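
As a rough illustration of the merging step described in that quote, the sketch below relabels small regions into one larger region and pools the resulting mask down to a coarse grid. The helper names `merge_regions` and `region_mask` and the label strings are hypothetical, and the code assumes the label map dimensions are multiples of the target size.

```python
def merge_regions(label_map, groups):
    """Relabel a 2-D label map so every label in a group gets the group name."""
    lookup = {lab: name for name, labs in groups.items() for lab in labs}
    return [[lookup.get(lab, lab) for lab in row] for row in label_map]


def region_mask(label_map, region, size=32):
    """Fraction of pixels belonging to `region` in each cell of a size x size grid."""
    h, w = len(label_map), len(label_map[0])
    bh, bw = h // size, w // size  # assumes h and w are multiples of size
    mask = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [label_map[y][x] == region
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            row.append(sum(block) / len(block))
        mask.append(row)
    return mask


# Toy 4x4 segmentation with hypothetical label names.
toy_seg = [
    ["skin", "skin", "skin", "skin"],
    ["skin", "upper_lip", "upper_lip", "skin"],
    ["skin", "mouth", "lower_lip", "skin"],
    ["skin", "skin", "skin", "skin"],
]
merged = merge_regions(toy_seg, {"mouth": ["upper_lip", "lower_lip", "mouth"]})
coarse = region_mask(merged, "mouth", size=2)
print(coarse)
```

For a car or bedroom model the same idea would apply to whatever labels the custom segmentation model produces, with `size=32` to match the gradient map resolution mentioned in the quote.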

sarmientoj24 commented 2 years ago

Any suggestions on where to get pretrained segmentation models for bedroom, etc.?

sarmientoj24 commented 2 years ago

Why does this need a classifier?

"To get the attribute-specific channels (for example, channels for smiling), we need to use classifiers to annotate a set of generated images. Please refer to this part to download the classifiers (1), and annotate the images (3). The only difference is that when generating images, we use the truncation trick (remove the --notruncation flag) as follows:"

Which instructions do I need to follow to get the attribute-specific channels using 20-30 images?

woctezuma commented 2 years ago

You don't need a classifier.

Classifiers are used to "annotate a set of generated images", which makes it possible to generate "a large number (1K) of positive examples".

If you already have enough images, then just skip that part and go to the next step.
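
For context, the annotation step that can be skipped amounts to something like the sketch below. It assumes a classifier has already assigned one score per generated image and that the positive examples are simply the top-scoring ones. The function name `select_positives` and the score dictionary are hypothetical.

```python
def select_positives(scores, n=1000):
    """Return the ids of the n images with the highest classifier score.

    scores: dict mapping an image id to a classifier score for the attribute
    (hypothetical input; any classifier or manual labeling could produce it)
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [image_id for image_id, _ in ranked[:n]]


toy_scores = {"img_a": 0.1, "img_b": 0.9, "img_c": 0.5}
positives = select_positives(toy_scores, n=2)
print(positives)
```

If the positive images are picked by hand instead, this selection is exactly the part that drops out.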