Richienb closed this issue 1 year ago
I believe this is because larger blocks have more space to store the same information that is stored in the smaller blocks. That must mean larger blocks end up storing coarser details and smaller blocks end up storing finer details.
Perhaps this is also because the smaller blocks have had fewer convolutions applied and thus contain much more detail.
What @Richienb said is correct. Basically, the coarse ResNet layers (i.e., the early layers) have gone through less processing and store finer details such as colors and texture. Hence these layers are related to the fine StyleGAN layers. In contrast, the fine ResNet layers (i.e., those at the end of the network) have gone through a lot of processing and store more semantic information. As such, they are related to the coarse and medium StyleGAN layers. Hope this helps.
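To make the pairing above concrete, here is a rough sketch of how a style index could be routed to an encoder feature level. The names are hypothetical, and the 18-style layout with group boundaries at indices 3 and 7 is an assumption for illustration, so please check it against the encoder code in the repo rather than taking it as the actual implementation.

```python
# Hypothetical illustration only: the constants below are assumptions,
# not values confirmed anywhere in this thread.
NUM_STYLES = 18   # assumed number of StyleGAN style inputs
COARSE_END = 3    # assumed: styles 0..2 are the coarse group
MEDIUM_END = 7    # assumed: styles 3..6 are the medium group


def feature_source(style_index):
    """Return which encoder feature level would feed a given style input."""
    if not 0 <= style_index < NUM_STYLES:
        raise ValueError(f"style_index must be in [0, {NUM_STYLES})")
    if style_index < COARSE_END:
        # Coarse styles read the most processed (late) ResNet features,
        # which carry semantic information such as pose and face shape.
        return "late"
    if style_index < MEDIUM_END:
        return "middle"
    # Fine styles read the least processed (early) ResNet features,
    # which still hold colors and texture.
    return "early"


mapping = {i: feature_source(i) for i in range(NUM_STYLES)}
```

Under these assumptions, three styles read the late features, four read the middle ones, and the remaining eleven read the early ones.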
In a ResNet, the coarser blocks come first:
https://github.com/eladrich/pixel2style2pixel/blob/361117156fc4eb90f463a1ca71eaf7f80d573e67/models/encoders/helpers.py#L32-L35
So why do the coarse style blocks use the fine ResNet blocks?
https://github.com/eladrich/pixel2style2pixel/blob/361117156fc4eb90f463a1ca71eaf7f80d573e67/models/encoders/psp_encoders.py#L95-L105
In the video that was provided, each sample has randomness introduced by replacing the fine StyleGAN input latents with random noise. This means the only difference between the images is in the fine layers. It is observed that skin tone comes from the fine style layers and the facial features come from the coarse style layers. Is that meant to happen?
https://user-images.githubusercontent.com/29491356/203987089-62e51315-85b4-44f3-8ea6-77e293e9ea2c.mp4
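For reference, the fine-layer replacement I am describing can be sketched roughly as below. This is only an illustration under assumptions: the function name is hypothetical, the 18 x 512 latent layout and the fine group starting at index 7 are assumed, and raw Gaussian noise stands in for whatever latent sampling the video actually used.

```python
import random

# Assumed layout, for illustration only.
NUM_STYLES = 18   # assumed number of style inputs
LATENT_DIM = 512  # assumed size of each style vector
FINE_START = 7    # assumed first index of the fine group


def randomize_fine_layers(latents, fine_start=FINE_START, rng=None):
    """Copy the coarse and medium rows, replace the fine rows with noise."""
    rng = rng or random.Random()
    # Rows before fine_start are kept, so pose and facial features stay fixed.
    mixed = [list(row) for row in latents[:fine_start]]
    # Rows from fine_start onward are resampled, so only fine details vary.
    for row in latents[fine_start:]:
        mixed.append([rng.gauss(0.0, 1.0) for _ in row])
    return mixed


# Placeholder latents standing in for an encoder output.
base = [[0.0] * LATENT_DIM for _ in range(NUM_STYLES)]
sample = randomize_fine_layers(base, rng=random.Random(0))
```

Since every sample shares rows 0 to 6, anything that changes between samples (skin tone, in the video) has to be coming from the fine rows.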