Why are the coarse details determined from the larger blocks?

Richienb commented 2 years ago

In a ResNet, the coarser blocks come first:

https://github.com/eladrich/pixel2style2pixel/blob/361117156fc4eb90f463a1ca71eaf7f80d573e67/models/encoders/helpers.py#L32-L35

So why do the coarse style blocks use the fine ResNet blocks?

https://github.com/eladrich/pixel2style2pixel/blob/361117156fc4eb90f463a1ca71eaf7f80d573e67/models/encoders/psp_encoders.py#L95-L105

In the video below, randomness is introduced into each sample by replacing the fine StyleGAN input latents with random noise, so the only thing that differs between the images is the fine layers. What I observe is that skin tone comes from the fine style layers, while the facial features come from the coarse style layers. Is that meant to happen?

https://user-images.githubusercontent.com/29491356/203987089-62e51315-85b4-44f3-8ea6-77e293e9ea2c.mp4
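For anyone who wants to reproduce the idea, here is a minimal sketch of the mixing step described above. It is not the repository's inference code: `mix_fine_styles`, `NUM_STYLES`, `LATENT_DIM`, and `FINE_START` are hypothetical names, the boundary index of 7 is an assumption, and the fine latents are replaced with raw Gaussian samples rather than samples passed through StyleGAN's mapping network.

```python
import random

NUM_STYLES = 18    # assumed: number of style vectors in the W+ code
LATENT_DIM = 512   # assumed: size of each style vector
FINE_START = 7     # assumed: index where the "fine" styles begin


def mix_fine_styles(w_plus, rng, fine_start=FINE_START):
    """Keep the coarse/medium styles and resample the fine ones.

    w_plus is a list of style vectors (lists of floats). The input is not
    modified; a new list of vectors is returned.
    """
    dim = len(w_plus[0])
    # Copy the styles before the boundary unchanged.
    mixed = [list(vec) for vec in w_plus[:fine_start]]
    # Replace every style from the boundary onward with Gaussian noise.
    for _ in range(fine_start, len(w_plus)):
        mixed.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return mixed


# Usage: start from an all-zero code so the resampled rows are easy to spot.
rng = random.Random(0)
inverted = [[0.0] * LATENT_DIM for _ in range(NUM_STYLES)]
sample = mix_fine_styles(inverted, rng)
```

Every `sample` generated this way shares the first `FINE_START` style vectors, so whatever varies between outputs has to come from the fine styles.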

Richienb commented 2 years ago

I believe this is because larger blocks have more space to store the same information that is stored in the smaller blocks. That must mean larger blocks end up storing coarser details and smaller blocks end up storing finer details.

Richienb commented 2 years ago

Perhaps this is also because the smaller blocks have had fewer convolutions applied and thus retain much more detail.
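One way to make the "fewer convolutions" intuition concrete is to look at how the receptive field grows with depth. The sketch below uses the standard receptive-field recurrence on a toy stack of layers; the layer list is made up for illustration and is not the actual pSp backbone, and `receptive_fields` is a hypothetical helper.

```python
def receptive_fields(layers):
    """Return the receptive field (in input pixels) after each layer.

    layers is a list of (kernel_size, stride) pairs applied in order.
    """
    field = 1   # one output unit initially sees one input pixel
    jump = 1    # distance in input pixels between adjacent units
    result = []
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
        result.append(field)
    return result


# Toy stack: 3x3 convolutions, every second one with stride 2.
toy_stack = [(3, 1), (3, 2), (3, 1), (3, 2)]
print(receptive_fields(toy_stack))  # [3, 5, 9, 13]
```

Early features only see a few input pixels, which fits local properties like color and texture, while later features see a much larger patch of the image.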

yuval-alaluf commented 2 years ago

What @Richienb said is correct. Basically, the coarse ResNet layers (i.e., the early layers) have gone through less processing and store finer details such as colors and texture. Hence these layers are related to the fine StyleGAN layers. In contrast, the fine ResNet layers (i.e., those at the end of the network) have gone through a lot of processing and store more semantic information. As such, they are related to the coarse and medium StyleGAN layers. Hope this helps.
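To summarize the mapping in code form, here is a rough sketch of which encoder feature map feeds which style index. The boundaries (3 and 7), the 18-style count, and the 16/32/64 spatial sizes are assumptions based on a reading of the encoder linked above and may not match every configuration; `feature_map_for_style` is a hypothetical helper, not a function from the repository.

```python
COARSE_IND = 3   # assumed: styles 0..2 are "coarse"
MIDDLE_IND = 7   # assumed: styles 3..6 are "medium", the rest "fine"


def feature_map_for_style(index, n_styles=18):
    """Return (style group, assumed feature-map size) for a style index."""
    if not 0 <= index < n_styles:
        raise ValueError(f"style index {index} out of range")
    if index < COARSE_IND:
        # Deepest, smallest feature map: most processed, most semantic.
        return ("coarse", 16)
    if index < MIDDLE_IND:
        return ("medium", 32)
    # Shallowest, largest feature map: least processed, keeps color/texture.
    return ("fine", 64)


for i in (0, 3, 7, 17):
    print(i, feature_map_for_style(i))
```

Read this way, the largest feature map drives the fine styles because it is the least processed, not because of its size alone.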

eladrich / pixel2style2pixel

Why are the coarse details determined from the larger blocks? #299