autonomousvision / stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

Questions about `--head_layers` #84

Open MasterScrat opened 2 years ago

MasterScrat commented 2 years ago

The Pokemon example in the README first trains a stem at 16x16, then a 32x32 super-resolution stage:

python train.py \
  --cfg=stylegan3-t \
  --outdir=./training-runs/pokemon \
  --data=./data/pokemon16.zip \
  --gpus=8 --batch=64 --batch-gpu 8 \
  --mirror=1 \
  --snap 10 \
  --kimg 10000 \
  --syn_layers 10

python train.py \
  --cfg=stylegan3-t \
  --outdir=./training-runs/pokemon \
  --data=./data/pokemon32.zip  --mirror=1 \
  --gpus=8 --batch=64 --batch-gpu 8 \
  --snap 10 --kimg 10000 \
  --superres --up_factor 2 --head_layers 7 \
  --path_stem training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl

`--up_factor 2` makes sense since we double the resolution. `--head_layers 7` comes from the paper (Section 3.2, Reintroducing Progressive Growing):

We start progressive growing at a resolution of 16^2 using 11 layers. Every time the resolution increases, we cut off 2 layers and add 7 new ones.

What's not clear to me:

1. Unexpected number of layers in generator

When training the stem, the generator has 11 layers as expected: `synthesis.L0_36_1024`, `synthesis.L1_36_1024`, ..., `synthesis.L10_16_3`. When training the super-resolution stage with `--up_factor 2 --head_layers 7`, I would expect 11 - 2 + 7 = 16 layers, but I see layers from `synthesis.L0_36_1024` to `synthesis.L16_32_3`, i.e. 17 layers, which is one too many (see the sketch below for how I list them). What am I missing?
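For reference, this is how I list the layers, as a minimal sketch using the repo's loading helpers (I assume the pickle stores `G_ema` and that the synthesis network exposes `layer_names`, as in the official StyleGAN3 code this repo builds on):

import dnnlib
import legacy

# Checkpoint path taken from the stem run above.
pkl = 'training-runs/pokemon/00000-stylegan3-t-pokemon16-gpus8-batch64/best_model.pkl'

# Load the EMA generator and print its named synthesis layers.
with dnnlib.util.open_url(pkl) as f:
    G = legacy.load_network_pkl(f)['G_ema']

names = G.synthesis.layer_names
print(len(names))  # 11 for the 16x16 stem, 17 for the 32x32 stage
print(names)       # e.g. ['L0_36_1024', 'L1_36_1024', ..., 'L10_16_3']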

2. How to tune `--head_layers` when training several stages at once

To reach higher resolutions faster, the README suggests the following:

Suppose you want to train as few stages as possible. We recommend training a 32x32 or 64x64 stem, then directly scaling to the final resolution (as described above, you must adjust --up_factor accordingly).

Am I correct that, in that case, I also need to adjust `--head_layers`?

For example, to train from a 16x16 stem to 256x256, I need `--up_factor 16`. I would normally do 4 training stages between resolutions 16 and 256 (32, 64, 128, 256), each one netting +5 layers (add 7, cut off 2), so I need to add 4 x 5 = 20 layers plus the 2 that the single growing stage will cut off, i.e. `--head_layers 22` (sketched below).

Does this make sense? I initially assumed the number of layers to add would be inferred from `--up_factor`, but that doesn't seem to be the case.
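For concreteness, here is the arithmetic I have in mind, as a back-of-the-envelope sketch of my own reasoning (the formula is mine, not something taken from the code):

import math

# Assumption (mine): each 2x stage nets +5 layers (add 7, cut off 2), per the paper,
# and a single big growing stage only cuts off 2 once, so I add those 2 back.
stem_res, target_res = 16, 256
n_doublings = int(math.log2(target_res // stem_res))  # 4 doublings: 32, 64, 128, 256
up_factor = target_res // stem_res                     # 16
head_layers = 5 * n_doublings + 2                      # 22
print(up_factor, head_layers)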

woctezuma commented 2 years ago

Regarding point 1, maybe the issue comes from:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L630-L635

notably the `+1` in: https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L635

which is then followed by:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L642-L643

--

There is also this for-loop which is of interest:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L654-L657

There will be 3 layers with `is_critically_sampled` set to `True`, so the `+1` mentioned above may adjust for that. Then the total would actually be 11 - 3 + 1 + 7 = 16, which is equivalent and does not solve your issue.

--

That being said, the `+1` also appears in: https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L645-L647

This occurrence may be more suspicious.
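As a toy check of both readings (my own counting sketch, not code from the repo):

stem_layers = 11     # L0_36_1024 ... L10_16_3 in the 16x16 stem
num_critical = 3     # layers with is_critically_sampled == True
head_layers = 7

# Reading the +1 as compensation for the critically sampled layers:
count_a = stem_layers - num_critical + 1 + head_layers  # 16, matches the paper
# Reading the +1 as an extra layer built on top of head_layers:
count_b = stem_layers - 2 + head_layers + 1              # 17, matches what you observe
print(count_a, count_b)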

Other elements of interest:

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L462-L463

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L469-L470

https://github.com/autonomousvision/stylegan_xl/blob/b5b96835702761ef849903f5c410cec077867718/training/networks_stylegan3_resetting.py#L481