NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Bug in conditioning of discriminator? #209

Open DEBIHOOD opened 1 year ago

Describe the bug While the conditioning of the generator seems to construct the network and behave as expected, the conditioning of the discriminator seems to ignore the configured depth of the mapping network.

To Reproduce Steps to reproduce the behavior: Create any conditional model with a --map-depth value other than 8 (the summaries below were produced with --map-depth=2):

Generator            Parameters  Buffers  Output shape      Datatype
---                  ---         ---      ---               ---
mapping.embed        3072        -        [16, 512]         float32
mapping.fc0          524800      -        [16, 512]         float32
mapping.fc1          262656      -        [16, 512]         float32
mapping              -           512      [16, 10, 512]     float32
synthesis.b4.conv1   69761       32       [16, 64, 4, 4]    float32
synthesis.b4.torgb   33027       -        [16, 3, 4, 4]     float32
synthesis.b4:0       1024        16       [16, 64, 4, 4]    float32
synthesis.b4:1       -           -        [16, 3, 4, 4]     float32
synthesis.b8.conv0   69761       80       [16, 64, 8, 8]    float32
synthesis.b8.conv1   69761       80       [16, 64, 8, 8]    float32
synthesis.b8.torgb   33027       -        [16, 3, 8, 8]     float32
synthesis.b8:0       -           16       [16, 64, 8, 8]    float32
synthesis.b8:1       -           -        [16, 3, 8, 8]     float32
synthesis.b16.conv0  69761       272      [16, 64, 16, 16]  float32
synthesis.b16.conv1  69761       272      [16, 64, 16, 16]  float32
synthesis.b16.torgb  33027       -        [16, 3, 16, 16]   float32
synthesis.b16:0      -           16       [16, 64, 16, 16]  float32
synthesis.b16:1      -           -        [16, 3, 16, 16]   float32
synthesis.b32.conv0  69761       1040     [16, 64, 32, 32]  float32
synthesis.b32.conv1  69761       1040     [16, 64, 32, 32]  float32
synthesis.b32.torgb  33027       -        [16, 3, 32, 32]   float32
synthesis.b32:0      -           16       [16, 64, 32, 32]  float32
synthesis.b32:1      -           -        [16, 3, 32, 32]   float32
synthesis.b64.conv0  69761       4112     [16, 64, 64, 64]  float32
synthesis.b64.conv1  69761       4112     [16, 64, 64, 64]  float32
synthesis.b64.torgb  33027       -        [16, 3, 64, 64]   float32
synthesis.b64:0      -           16       [16, 64, 64, 64]  float32
synthesis.b64:1      -           -        [16, 3, 64, 64]   float32
---                  ---         ---      ---               ---
Total                1584536     11632    -                 -

Discriminator  Parameters  Buffers  Output shape      Datatype
---            ---         ---      ---               ---
b64.fromrgb    256         16       [16, 64, 64, 64]  float32
b64.skip       4096        16       [16, 64, 32, 32]  float32
b64.conv0      36928       16       [16, 64, 64, 64]  float32
b64.conv1      36928       16       [16, 64, 32, 32]  float32
b64            -           16       [16, 64, 32, 32]  float32
b32.skip       4096        16       [16, 64, 16, 16]  float32
b32.conv0      36928       16       [16, 64, 32, 32]  float32
b32.conv1      36928       16       [16, 64, 16, 16]  float32
b32            -           16       [16, 64, 16, 16]  float32
b16.skip       4096        16       [16, 64, 8, 8]    float32
b16.conv0      36928       16       [16, 64, 16, 16]  float32
b16.conv1      36928       16       [16, 64, 8, 8]    float32
b16            -           16       [16, 64, 8, 8]    float32
b8.skip        4096        16       [16, 64, 4, 4]    float32
b8.conv0       36928       16       [16, 64, 8, 8]    float32
b8.conv1       36928       16       [16, 64, 4, 4]    float32
b8             -           16       [16, 64, 4, 4]    float32
mapping.embed  384         -        [16, 64]          float32
mapping.fc0    4160        -        [16, 64]          float32
mapping.fc1    4160        -        [16, 64]          float32
mapping.fc2    4160        -        [16, 64]          float32
mapping.fc3    4160        -        [16, 64]          float32
mapping.fc4    4160        -        [16, 64]          float32
mapping.fc5    4160        -        [16, 64]          float32
mapping.fc6    4160        -        [16, 64]          float32
mapping.fc7    4160        -        [16, 64]          float32
b4.mbstd       -           -        [16, 65, 4, 4]    float32
b4.conv        37504       16       [16, 64, 4, 4]    float32
b4.fc          65600       -        [16, 64]          float32
b4.out         4160        -        [16, 64]          float32
b4             -           -        [16, 1]           float32
---            ---         ---      ---               ---
Total          452992      288      -                 -

Expected behavior I expected the discriminator to have the same number of mapping layers as the generator. In the example above, --map-depth=2 was used, but the discriminator still has 8 mapping layers (mapping.fc0 through mapping.fc7) for some reason. It doesn't seem to break the model (not sure): it trains and it progresses. However, I didn't do a long run, just a couple of kimg, so I can't say anything in depth about it.
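The layer counts can be cross-checked against the parameter columns of the two summaries above. The following is a minimal sketch, assuming every mapping layer is a plain fully connected layer with a bias (so it holds in*out + out parameters) and that the generator's fc0 receives the latent concatenated with the embedded label. The helper names fc_params and infer_c_dim are made up for illustration and are not part of the repository.

```python
def fc_params(in_features: int, out_features: int) -> int:
    """Parameter count of a fully connected layer with bias."""
    return in_features * out_features + out_features


def infer_c_dim(embed_params: int, width: int) -> int:
    """Solve c_dim from embed_params = c_dim * width + width."""
    return (embed_params - width) // width


# Values copied from the printed summaries.
g_mapping = {"embed": 3072, "fc0": 524800, "fc1": 262656}
d_mapping = {"embed": 384, **{f"fc{i}": 4160 for i in range(8)}}

# Both embed layers are consistent with the same number of label dimensions.
c_dim_g = infer_c_dim(g_mapping["embed"], 512)
c_dim_d = infer_c_dim(d_mapping["embed"], 64)

# Generator: fc0 sees 512 (z) + 512 (embedded label) inputs, fc1 sees 512.
g_fc0_ok = fc_params(1024, 512) == g_mapping["fc0"]
g_fc1_ok = fc_params(512, 512) == g_mapping["fc1"]

# Discriminator: every fc layer is 64 -> 64.
d_fc_ok = all(v == fc_params(64, 64) for k, v in d_mapping.items() if k != "embed")

g_depth = sum(1 for k in g_mapping if k.startswith("fc"))
d_depth = sum(1 for k in d_mapping if k.startswith("fc"))

# Parameters spent on the layers beyond the requested depth.
extra_params = (d_depth - g_depth) * fc_params(64, 64)

print(c_dim_g, c_dim_d, g_depth, d_depth, extra_params)
```

Under these assumptions the numbers line up: the generator mapping has 2 fully connected layers as requested, while the discriminator mapping has 8, which accounts for 24960 of its 452992 parameters.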

Alias-Free GAN (StyleGAN3) implies that 8 mapping layers are unnecessary and that 2 layers are enough, so I decided to use the same strategy with the SG2 network. By the way, the same thing happens with --cfg=stylegan3-t/r, even though 2 layers are configured for it out of the box in the code.

At first I thought there was a hardcoded constant of 8 layers left in the code by mistake, but I didn't find anything that could prove this. Moreover, I find this part of the code very convoluted and hard to understand (at least for me), which raised more questions than answers. I am talking about the networks_stylegan2/3.py files, because those are the ones I was investigating, and I'm pretty sure this is the right place to look.
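One way such a mismatch can arise without any visible hardcoded 8 is through a default argument: if the command-line value is written only into the generator's keyword arguments, the discriminator's mapping network silently falls back to its constructor default. The sketch below illustrates that mechanism with plain dictionaries. It is an assumption about the cause, not verified against the repository, and build_mapping_layers, make_kwargs, and forward_to_discriminator are hypothetical names.

```python
DEFAULT_NUM_LAYERS = 8  # assumed constructor default for the mapping depth


def build_mapping_layers(num_layers=DEFAULT_NUM_LAYERS):
    """Hypothetical stand-in for a mapping network: returns its layer names."""
    return ["embed"] + [f"fc{i}" for i in range(num_layers)]


def make_kwargs(map_depth, forward_to_discriminator=False):
    """Build per-network keyword arguments from a single map-depth value."""
    g_kwargs = {"mapping_kwargs": {"num_layers": map_depth}}
    d_kwargs = {"mapping_kwargs": {}}  # nothing set, so the default applies
    if forward_to_discriminator:
        d_kwargs["mapping_kwargs"]["num_layers"] = map_depth
    return g_kwargs, d_kwargs


# Option reaches only the generator: depths diverge.
g_kwargs, d_kwargs = make_kwargs(map_depth=2)
g_layers = build_mapping_layers(**g_kwargs["mapping_kwargs"])
d_layers = build_mapping_layers(**d_kwargs["mapping_kwargs"])
print(len(g_layers) - 1, len(d_layers) - 1)  # 2 8

# Option forwarded to both networks: depths match.
g_kwargs, d_kwargs_fixed = make_kwargs(map_depth=2, forward_to_discriminator=True)
d_layers_fixed = build_mapping_layers(**d_kwargs_fixed["mapping_kwargs"])
print(len(d_layers_fixed) - 1)  # 2
```

If this is what happens, the place to check would be where the training script fills in the discriminator's mapping arguments, rather than the network definitions themselves.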

Additional context This is just a toy model I was using for experiments, so don't try to understand why it has just 64 filters in all channels, and other stuff ;) Command that was used to initialize the model:

python train.py --cfg=stylegan2 --gpus=1 --batch=16 --outdir= --data= --cmax=64 --metrics=none --gamma=2 --mirror=1 --fp32=1 --cond=1 --map-depth=2

A GTX10X0 series GPU was used, so FP32 mode was turned on, because mixed precision didn't give any speedup on GPUs of this series, only a slowdown.