autonomousvision / stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

Questions about `--head_layers` #84

Open MasterScrat opened 2 years ago

MasterScrat commented 2 years ago

The Pokemon example in the README first trains a stem at 16x16, then a 32x32 super-resolution stage:

python train.py \
  --cfg=stylegan3-t \
  --outdir=./training-runs/pokemon \
  --data=./data/pokemon16.zip \
  --gpus=8 --batch=64 --batch-gpu 8 \
  --mirror=1 \
  --snap 10 \
  --kimg 10000 \
  --syn_layers 10

python train.py \
  --cfg=stylegan3-t \
  --outdir=./training-runs/pokemon \
  --data=./data/pokemon32.zip  --mirror=1 \
  --gpus=8 --batch=64 --batch-gpu 8 \
  --snap 10 --kimg 10000 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl

`--up_factor 2` makes sense since we double the resolution. `--head_layers 7` comes from the paper (Section 3.2, Reintroducing Progressive Growing):

We start progressive growing at a resolution of 16^2 using 11 layers. Every time the resolution increases, we cut off 2 layers and add 7 new ones.

What's not clear to me:

1. Unexpected number of layers in generator

When training the stem, the generator has 11 layers as expected: `synthesis.L0_36_1024`, `synthesis.L1_36_1024`, ..., `synthesis.L10_16_3`. When training the super-resolution stage with `--up_factor 2 --head_layers 7`, I would expect 11 - 2 + 7 = 16 layers, but I see layers from `synthesis.L0_36_1024` to `synthesis.L16_32_3`, i.e. 17 layers, which is one too many (see the sketch below for how I list them). What am I missing?
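For reference, this is how I list the layers, as a minimal sketch using the repo's loading helpers (I assume the pickle stores `G_ema` and that the synthesis network exposes `layer_names`, as in the official StyleGAN3 code this repo builds on):

import dnnlib
import legacy

# Checkpoint path taken from the stem run above.
pkl = 'training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl'

# Load the EMA generator and print its named synthesis layers.
with dnnlib.util.open_url(pkl) as f:
    G = legacy.load_network_pkl(f)['G_ema']

names = G.synthesis.layer_names
print(len(names))  # 11 for the 16x16 stem, 17 for the 32x32 stage
print(names)       # e.g. ['L0_36_1024', 'L1_36_1024', ..., 'L10_16_3']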

2. How to tune `--head_layers` when training several stages at once

To reach higher resolutions faster, the README suggests the following:

Suppose you want to train as few stages as possible. We recommend training a 32x32 or 64x64 stem, then directly scaling to the final resolution (as described above, you must adjust --up_factor accordingly).

Am I correct that, in that case, I also need to adjust `--head_layers`?

For example, to train from a 16x16 stem to 256x256, I need `--up_factor 16`. I would normally do 4 training stages between resolutions 16 and 256 (32, 64, 128, 256), each one netting +5 layers (add 7, cut off 2), so I need to add 4 x 5 = 20 layers plus the 2 that the single growing stage will cut off, i.e. `--head_layers 22` (sketched below).

Does this make sense? I initially assumed the number of layers to add would be inferred from `--up_factor`, but that doesn't seem to be the case.
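For concreteness, here is the arithmetic I have in mind, as a back-of-the-envelope sketch of my own reasoning (the formula is mine, not something taken from the code):

import math

# Assumption (mine): each 2x stage nets +5 layers (add 7, cut off 2), per the paper,
# and a single big growing stage only cuts off 2 once, so I add those 2 back.
stem_res, target_res = 16, 256
n_doublings = int(math.log2(target_res // stem_res))  # 4 doublings: 32, 64, 128, 256
up_factor = target_res // stem_res                     # 16
head_layers = 5 * n_doublings + 2                      # 22
print(up_factor, head_layers)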

woctezuma commented 2 years ago

Regarding point 1, maybe the issue comes from:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L630-L635

notably the `+1` in: https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L635

which is then followed by:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L642-L643

--

There is also this for-loop which is of interest:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L654-L657

There will be 3 layers with `is_critically_sampled` set to `True`, so the `+1` mentioned above may adjust for that. Then the total would actually be 11 - 3 + 1 + 7 = 16, which is equivalent and does not solve your issue.

--

That being said, the `+1` also appears in: https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L645-L647

This occurrence may be more suspicious.
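As a toy check of both readings (my own counting sketch, not code from the repo):

stem_layers = 11     # L0_36_1024 ... L10_16_3 in the 16x16 stem
num_critical = 3     # layers with is_critically_sampled == True
head_layers = 7

# Reading the +1 as compensation for the critically sampled layers:
count_a = stem_layers - num_critical + 1 + head_layers  # 16, matches the paper
# Reading the +1 as an extra layer built on top of head_layers:
count_b = stem_layers - 2 + head_layers + 1              # 17, matches what you observe
print(count_a, count_b)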

Other elements of interest:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L462-L463

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L469-L470

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L481