RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1

creatorcao commented 7 months ago

python train.py --outdir=./test --data=./images256x256.zip --cfg=stylegan3-r --gpus=1 --batch=32 --gamma=0.5 \
--freezed=13 --workers=2 --mirror=1 --kimg=2000 --tick=1 --snap=10 --metrics=none --cbase=16384 --cond=1 \
--resume=./weights/stylegan3-r-ffhqu-256x256.pkl

I received the error below when training on labeled images starting from pretrained weights. Could somebody help me fix this?

  File "stylegan3/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (1024) must match the size of tensor b (512) at non-singleton dimension 1
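One plausible explanation, assuming the stock StyleGAN3 `MappingNetwork` and the default `z_dim=512`, `w_dim=512` from `train.py`: with `--cond=1` the mapping network concatenates a 512-wide label embedding to z before its first layer `fc0`, so that layer's weight becomes [512, 1024]. The FFHQ-U pickle was trained without labels and stores [512, 512], which would match the 1024 vs 512 at dimension 1. The sketch below only reproduces that shape arithmetic; the function name is hypothetical and it does not touch the real networks.

```python
from typing import Tuple


def mapping_fc0_weight_shape(z_dim: int, w_dim: int, c_dim: int) -> Tuple[int, int]:
    """Return the [out_features, in_features] shape of the first mapping layer.

    Assumption: follows the StyleGAN3 MappingNetwork rule that, when labels are
    used (c_dim > 0), a w_dim-wide label embedding is concatenated to z before fc0.
    """
    in_features = z_dim + (w_dim if c_dim > 0 else 0)
    return (w_dim, in_features)


# Unconditional pretrained pickle vs. a model rebuilt with --cond=1.
# c_dim=10 is a hypothetical class count; any value > 0 gives the same shape.
ckpt_shape = mapping_fc0_weight_shape(z_dim=512, w_dim=512, c_dim=0)
model_shape = mapping_fc0_weight_shape(z_dim=512, w_dim=512, c_dim=10)
print("model:", model_shape, "checkpoint:", ckpt_shape)
# model: (512, 1024) checkpoint: (512, 512)
```

Only dimension 1 (the input width) differs, which is the dimension named in the error.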

sans-dev commented 5 months ago

I get the same error, but with the two sizes swapped:

Number of GPUs:      1
Batch size:          32 images
Training duration:   5000 kimg
Dataset path:        dataset/inat-insects.zip
Dataset size:        84524 images
Dataset resolution:  512
Dataset labels:      False
Dataset x-flips:     True

Creating output directory...
Launching processes...
Loading training set...

Num images:  169048
Image shape: [3, 512, 512]
Label shape: [0]

Constructing networks...
Resuming from "models/stylegan3-r-afhqv2-512x512.pkl"
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/scratch/training/training_loop.py", line 162, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/scratch/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 1
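`copy_params_and_buffers` stops at the first tensor whose shape differs. As far as I can tell from how PyTorch's `copy_` reports broadcasting errors, tensor a is the destination (the network built from the current command line) and tensor b is the source (the resumed pickle). A small sketch, assuming you have already collected name-to-shape dicts for both networks, lists every mismatch at once, which makes it easier to see whether the difference is in the mapping network or the synthesis layers. The function name and the example tensor names and shapes are hypothetical.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    model_shapes: Dict[str, Shape], ckpt_shapes: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """List (name, model_shape, checkpoint_shape) for every tensor name present
    in both dicts whose shapes differ. Names missing from the checkpoint are
    skipped, similar in spirit to resuming with require_all=False."""
    mismatches = []
    for name, shape in model_shapes.items():
        if name in ckpt_shapes and tuple(ckpt_shapes[name]) != tuple(shape):
            mismatches.append((name, tuple(shape), tuple(ckpt_shapes[name])))
    return mismatches


# Hypothetical example data: names and shapes are illustrative only.
model = {"synthesis.input.weight": (512, 512), "mapping.fc0.weight": (512, 512)}
ckpt = {"synthesis.input.weight": (1024, 1024), "mapping.fc0.weight": (512, 512)}
for name, a, b in find_shape_mismatches(model, ckpt):
    print(f"{name}: model (a) {a} vs checkpoint (b) {b}")
```

With PyTorch available, each dict could be filled from a real module with something like `{n: tuple(t.shape) for n, t in list(G.named_parameters()) + list(G.named_buffers())}`, where `G` is a hypothetical handle to either the freshly built network or the one loaded from the pickle.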

This is my cmd: docker run --gpus all -it --shm-size 50G --rm --user $(id -u):$(id -g) -v `pwd`:/scratch --workdir /scratch -e HOME=/scratch stylegan3 python train.py --outdir=~/training-runs --cfg=stylegan3-t --data=dataset/inat-insects.zip --gpus=1 --batch=32 --gamma=8.2 --mirror=1 --kimg=5000 --snap=5 --resume=models/stylegan3-r-afhqv2-512x512.pkl
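Note that this command builds a `stylegan3-t` network but resumes from a `stylegan3-r` pickle. Assuming the stock `train.py`, the R config doubles `channel_base` and `channel_max` relative to T, so with the default `--cbase=32768 --cmax=512` (and assuming the pickle was trained with those defaults) the R network has up to 1024 channels per layer while the freshly built T network has at most 512. That would fit a (512) vs b (1024). The helper below only mirrors that config arithmetic; its name is hypothetical.

```python
from typing import Tuple


def effective_channel_limits(cfg: str, cbase: int = 32768, cmax: int = 512) -> Tuple[int, int]:
    """Return (channel_base, channel_max) after the config-specific adjustment.

    Assumption: mirrors train.py, where stylegan3-r doubles both values and
    stylegan3-t leaves them unchanged.
    """
    if cfg == "stylegan3-r":
        return cbase * 2, cmax * 2
    return cbase, cmax


built = effective_channel_limits("stylegan3-t")    # network from --cfg=stylegan3-t
resumed = effective_channel_limits("stylegan3-r")  # network stored in the -r pickle
print("built cmax:", built[1], "pickle cmax:", resumed[1])
```

If this is the cause, passing the same `--cfg` the pickle was trained with (here `--cfg=stylegan3-r`) should make these limits line up.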

qq272574497 commented 2 months ago

creato

I have the same problem too. Could we compare notes?

NVlabs / stylegan3, issue #629