PDillis / stylegan3-fun

Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

I loaded pre-trained weights for training and their resolution matches my training set, but train.py reports an error. Training works fine without the pre-trained weights. Which file do I need to change? #39

Open 999789 opened 5 months ago

999789 commented 5 months ago

Traceback (most recent call last):
  File "train.py", line 369, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 362, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/stylegan3-fun-main/train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/root/autodl-tmp/stylegan3-fun-main/training/training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "/root/autodl-tmp/stylegan3-fun-main/torch_utils/misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1

999789 commented 5 months ago

python train.py --outdir=training-runs --cfg=stylegan3-t --data=/root/autodl-tmp/stylegan3-fun-main/hechengtupianrgba.zip --gpus=4 --batch=16 --gamma=6 --mirror=1 --kimg=5000 --snap=25 --batch-gpu=4 --metrics=none --resume=/root/autodl-tmp/stylegan3-fun-main/network-snapshot-011000.pkl

PDillis commented 5 months ago

Basically, the mismatch occurs when the weights from the pre-trained .pkl are copied into the newly constructed stylegan3-t networks. I'll try to fix it, as it also failed for me with a pre-trained StyleGAN3-T model, so perhaps the new networks are being constructed incorrectly. I'll update this once I have a fix.
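
To narrow down which tensor triggers this kind of mismatch, it can help to compare tensor shapes in the checkpoint against those in the freshly built network before any copying happens. The following is only a minimal sketch, not code from this repository: `find_shape_mismatches` is a hypothetical helper, and the tensor names and shapes are made up for illustration. With PyTorch, the two dictionaries could be built with something like `{n: tuple(t.shape) for n, t in module.named_parameters()}` for each network. One possible reading of the traceback, not confirmed in this thread, is a channel-count difference (4 vs. 3, with `rgba` appearing in the dataset filename), which is what the example shapes imitate.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    src_shapes: Dict[str, Shape],
    dst_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, new_shape) for tensors whose shapes differ.

    Hypothetical helper: tensors that exist only on one side are ignored,
    roughly mirroring a copy performed with require_all=False.
    """
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        if src_shape is not None and src_shape != dst_shape:
            mismatches.append((name, src_shape, dst_shape))
    return mismatches


# Illustrative names and shapes only (assumptions, not taken from the real networks).
checkpoint_shapes = {
    "fromrgb.weight": (64, 3, 1, 1),   # e.g. a layer built for 3-channel images
    "conv0.weight": (64, 64, 3, 3),
}
new_network_shapes = {
    "fromrgb.weight": (64, 4, 1, 1),   # same layer built for 4-channel images
    "conv0.weight": (64, 64, 3, 3),
    "extra.bias": (64,),               # present only in the new network
}

for name, src, dst in find_shape_mismatches(checkpoint_shapes, new_network_shapes):
    print(f"{name}: checkpoint {src} vs new network {dst}")
```

If a check like this reports a difference in the channel dimension of an input or output layer, the pre-trained model and the dataset probably disagree on the number of image channels, in which case the two would need to be made consistent before resuming.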

999789 commented 5 months ago

Thanks for the reply.