NVlabs / stylegan2-ada

StyleGAN2 with adaptive discriminator augmentation (ADA) - Official TensorFlow implementation
https://arxiv.org/abs/2006.06676

ffhq1024 fakes_init look bad #3

Closed aiXander closed 4 years ago

aiXander commented 4 years ago

Upon running an out-of-the-box train.py with --resume=ffhq1024, the generated "fakes_init.png" looks very weird: https://storage.googleapis.com/public-assets-xander/fakes_init.jpg

aiXander commented 4 years ago

The problem disappears after fine-tuning for a very short amount of time, though... It's as if the to_rgb layer is not properly initialized or something.

aiXander commented 4 years ago

Ok, I think the reason is that the new network architecture uses, by default, only 2 mapping layers instead of 8, so not all of the parameters are restored; hence the W+ space is essentially untuned when simply loading a pretrained FFHQ checkpoint.

It might make more sense to still use 8 mapping layers when restoring an SG2 checkpoint.
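
Here is a minimal toy sketch of what I think is going on (not the repo's actual loading code; the variable names are made up): resuming copies variables by name, similar to tflib's copy_vars_from(), so a 2-layer mapping net only picks up a fraction of the pretrained 8-layer mapping net and the rest is dropped:

```python
# Toy illustration (not the repo's code) of why resuming with a shallower mapping
# network leaves W-space effectively untrained: name-matched copying only restores
# variables that exist in BOTH networks.
import numpy as np

def make_mapping_vars(num_layers, dim=512, seed=None):
    """Toy stand-in for the mapping network's trainable variables."""
    rng = np.random.default_rng(seed)
    return {f"G_mapping/Dense{i}/weight": rng.standard_normal((dim, dim))
            for i in range(num_layers)}

pretrained = make_mapping_vars(num_layers=8, seed=0)  # ffhq1024 checkpoint: 8 mapping layers
fresh      = make_mapping_vars(num_layers=2, seed=1)  # new 'auto' network: 2 mapping layers

# Name-matched copy, analogous to Network.copy_vars_from(): anything without a
# matching name in the destination network is silently skipped.
copied = {name: pretrained[name] for name in fresh if name in pretrained}
fresh.update(copied)

print(f"restored {len(copied)} of {len(pretrained)} pretrained mapping variables")
# -> restored 2 of 8. The resumed G therefore maps z to a W distribution the
#    synthesis layers were never trained on, which would explain the odd fakes_init.png.
```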

woctezuma commented 3 years ago

Thanks to this issue, I have noticed this as well. That is why I added a command-line argument to my fork (https://github.com/NVlabs/stylegan2-ada/pull/6) to override the mapping net depth.

the reason is that the new network architecture, by default, only uses 2 mapping layers instead of 8

Yes and no.

Yes, because the default config (auto) has map=2, as you noticed. If you use transfer learning from the source nets, it works, but you might prefer map=8.

https://github.com/NVlabs/stylegan2-ada/blob/a831de288449690d5d81c768857fbd5b5052e8d3/train.py#L162-L170
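
From memory, the linked block boils down to something like this; only the map field is shown here, the other spec fields are omitted, and the exact names and values should be checked against the linked lines:

```python
# Sketch of the per-config training specs linked above (train.py L162-L170).
# Only the mapping-net depth is reproduced; the real specs also set batch size,
# learning rate, gamma, EMA, etc.
cfg_specs = {
    'auto':      dict(map=2),  # everything else is derived from resolution and GPU count
    'stylegan2': dict(map=8),  # original StyleGAN2 settings
    'paper256':  dict(map=8),
    'paper512':  dict(map=8),
    'paper1024': dict(map=8),
    'cifar':     dict(map=2),
}
```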

No, because the baseline in the article is actually trained with map=8, as is the paper256 config (and every config except cifar). So it is not so much a "new network architecture"; it is the auto config that can be misleading.

[Image: Fig. 24 from the article, showing the table of training configurations]
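
For reference, the override in my fork is conceptually just the following; the --map flag name and the plumbing below are only an illustration, not the exact code from the pull request:

```python
# Illustrative only: a hypothetical --map override; the actual flag added in the
# pull request may be named and wired differently inside train.py.
import argparse

cfg_specs = {'auto': dict(map=2), 'paper1024': dict(map=8)}  # reduced excerpt, see above

parser = argparse.ArgumentParser()
parser.add_argument('--cfg', default='auto', choices=sorted(cfg_specs))
parser.add_argument('--map', type=int, default=None,
                    help='override the mapping net depth, e.g. 8 when resuming from ffhq1024')
args = parser.parse_args()

spec = dict(cfg_specs[args.cfg])   # start from the chosen config's spec
if args.map is not None:
    spec['map'] = args.map         # force the depth to match the checkpoint being resumed
print(f"mapping layers = {spec['map']}")
```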

betterze commented 3 years ago

@woctezuma @tr1pzz Thank you for sharing this; I also ran into this problem.

Do you have any insight into why they use 2 mapping layers rather than 8? Do 2 layers give better performance?

Thank you in advance.

woctezuma commented 3 years ago

Do you have any insight into why they use 2 mapping layers rather than 8? Do 2 layers give better performance?

I don't know why they made the call to use a depth of 2 for the auto config. I guess it is because the authors noticed that it works better on CIFAR (as you can see in the table above, CIFAR+tuning uses a depth of 2 instead of 8), and thought that other people's datasets might look more like CIFAR (more diverse and lower resolution).

Maybe better performance in the sense that fewer computing resources (namely memory) should be needed; however, not better performance in terms of the quality of the results on high-resolution datasets such as FFHQ, AFHQ, etc.

betterze commented 3 years ago

@woctezuma Thank you for your detailed reply, I really appreciate it.