Tensor sizes not matching

matthewchung74 commented 2 years ago

**Describe the bug**
I'm receiving an error saying

RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 1

even though I am including the --cbase 32768 flag. The first line of the output says "w_dim": 512, which I assume is the cause of the size mismatch, but I am unsure how to fix it.
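A rough way to see how --cbase relates to the 128 vs. 64 in the error message: the sketch below assumes the discriminator picks its per-resolution channel count as min(channel_base // res, channel_max), which is my reading of training/networks_stylegan2.py and should be checked against the code. The helper name discriminator_channels is made up for illustration, and 16384 is only a comparison value, not a confirmed setting of the checkpoint.

```python
def discriminator_channels(channel_base, channel_max=512, img_resolution=256):
    """Channels per block resolution under the ASSUMED rule
    min(channel_base // res, channel_max); helper name is hypothetical."""
    resolutions = []
    res = img_resolution
    while res >= 4:
        resolutions.append(res)
        res //= 2
    return {res: min(channel_base // res, channel_max) for res in resolutions}


full = discriminator_channels(32768)  # value passed via --cbase in this issue
half = discriminator_channels(16384)  # comparison value, an assumption

# Print the channel count at each resolution for both settings.
for res in full:
    print(res, full[res], half[res])
```

Under that assumption the 256x256 block has 128 channels with a channel base of 32768 and 64 with 16384, so a checkpoint trained with the smaller base would produce exactly this kind of mismatch. "w_dim": 512 does not enter this calculation.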

**To Reproduce**
Steps to reproduce the behavior:

Run this notebook: https://colab.research.google.com/drive/1FPapfKeOmp5ZE0AAY4Oc1ULZASWOTm3m#scrollTo=msjpvR1Z0UM8

Pay attention to this snippet:


# Fine-tune StyleGAN3-T on a custom 256x256 dataset using 1 GPU, starting from the pre-trained FFHQ-U pickle.
!python /content/drive/MyDrive/WIP/stylegan3/train.py --outdir=/content/drive/MyDrive/WIP/stylegan3/results \
--cfg=stylegan3-t \
--data=/content/drive/MyDrive/WIP/stylegan3/datasets/artimages-256x256.zip \
--gpus=1 \
--batch=16 \
--batch-gpu=8 \
--gamma=50 \
--kimg=1 \
--snap=8 \
--resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-256x256.pkl \
--cbase 32768 \
--aug=ada

which produces this output:

Training options:
{
  "G_kwargs": { "class_name": "training.networks_stylegan3.Generator", "z_dim": 512, "w_dim": 512, "mapping_kwargs": { "num_layers": 2 }, "channel_base": 32768, "channel_max": 512, "magnitude_ema_beta": 0.9994456359721023 },
  "D_kwargs": { "class_name": "training.networks_stylegan2.Discriminator", "block_kwargs": { "freeze_layers": 0 }, "mapping_kwargs": {}, "epilogue_kwargs": { "mbstd_group_size": 4 }, "channel_base": 32768, "channel_max": 512 },
  "G_opt_kwargs": { "class_name": "torch.optim.Adam", "betas": [ 0, 0.99 ], "eps": 1e-08, "lr": 0.0025 },
  "D_opt_kwargs": { "class_name": "torch.optim.Adam", "betas": [ 0, 0.99 ], "eps": 1e-08, "lr": 0.002 },
  "loss_kwargs": { "class_name": "training.loss.StyleGAN2Loss", "r1_gamma": 50.0, "blur_init_sigma": 0 },
  "data_loader_kwargs": { "pin_memory": true, "prefetch_factor": 2, "num_workers": 3 },
  "training_set_kwargs": { "class_name": "training.dataset.ImageFolderDataset", "path": "/content/drive/MyDrive/WIP/stylegan3/datasets/artimages-256x256.zip", "use_labels": false, "max_size": 1341, "xflip": false, "resolution": 256, "random_seed": 0 },
  "num_gpus": 1,
  "batch_size": 16,
  "batch_gpu": 8,
  "metrics": [ "fid50k_full" ],
  "total_kimg": 1,
  "kimg_per_tick": 4,
  "image_snapshot_ticks": 8,
  "network_snapshot_ticks": 8,
  "random_seed": 0,
  "ema_kimg": 5.0,
  "augment_kwargs": { "class_name": "training.augment.AugmentPipe", "xflip": 1, "rotate90": 1, "xint": 1, "scale": 1, "rotate": 1, "aniso": 1, "xfrac": 1, "brightness": 1, "contrast": 1, "lumaflip": 1, "hue": 1, "saturation": 1 },
  "ada_target": 0.6,
  "resume_pkl": "https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-256x256.pkl",
  "ada_kimg": 100,
  "ema_rampup": null,
  "run_dir": "/content/drive/MyDrive/WIP/stylegan3/results/00025-stylegan3-t-artimages-256x256-gpus1-batch16-gamma50"
}

Output directory:    /content/drive/MyDrive/WIP/stylegan3/results/00025-stylegan3-t-artimages-256x256-gpus1-batch16-gamma50
Number of GPUs:      1
Batch size:          16 images
Training duration:   1 kimg
Dataset path:        /content/drive/MyDrive/WIP/stylegan3/datasets/artimages-256x256.zip
Dataset size:        1341 images
Dataset resolution:  256
Dataset labels:      False
Dataset x-flips:     False

Creating output directory...
Launching processes...
Loading training set...
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:490: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  cpuset_checked))

Num images:  1341
Image shape: [3, 256, 256]
Label shape: [0]

Constructing networks...
Resuming from "https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-t-ffhqu-256x256.pkl"
Traceback (most recent call last):
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/content/drive/MyDrive/WIP/stylegan3/training/training_loop.py", line 162, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/content/drive/MyDrive/WIP/stylegan3/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (128) must match the size of tensor b (64) at non-singleton dimension 1
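The traceback shows the failure happening while checkpoint tensors are copied into the freshly built networks, so one way to narrow it down is to compare shapes by name before copying. The following is a minimal sketch of that idea, not code from the repository: find_shape_mismatches is a hypothetical helper, and the parameter names and shapes in the example are invented for illustration. With real modules, the shape dictionaries could be built with something like {name: tuple(p.shape) for name, p in module.named_parameters()}.

```python
def find_shape_mismatches(src_shapes, dst_shapes):
    """Return (name, src_shape, dst_shape) for every name present in both
    dictionaries whose shapes differ. Names missing from src are skipped,
    mirroring a copy that does not require all parameters to be present."""
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        if src_shape is not None and tuple(src_shape) != tuple(dst_shape):
            mismatches.append((name, tuple(src_shape), tuple(dst_shape)))
    return mismatches


# Hypothetical parameter names and shapes, for illustration only.
checkpoint = {"b256.conv0.weight": (64, 64, 3, 3), "b4.out.bias": (1,)}
model = {"b256.conv0.weight": (128, 128, 3, 3), "b4.out.bias": (1,)}

mismatches = find_shape_mismatches(checkpoint, model)
for name, src_shape, dst_shape in mismatches:
    print(f"{name}: checkpoint {src_shape} vs. model {dst_shape}")
```

Listing every mismatched name at once shows whether the disagreement is limited to one layer or runs through the whole network, which helps distinguish a configuration difference from a corrupted checkpoint.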



**Expected behavior**
I would expect the run to start without throwing an error.


**Desktop (please complete the following information):**
 - OS: Google colab
 - PyTorch version (e.g., pytorch 1.9.0) : 1.11.0+cu113
 - CUDA toolkit version (e.g., CUDA 11.4): see above (cu113 build)
 - NVIDIA driver version: see below

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+



 - GPU: see above

**Additional context**
Any help is very much appreciated.

alexjbusch commented 1 year ago

What was the solution to this?

TiGaI commented 1 year ago

use --cbase 32768

NVlabs / stylegan3

Tensor sizes not matching #179