PDillis / stylegan3-fun

Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

Error when training Stylegan2-ext #38

Open nuclearsugar opened 4 months ago

nuclearsugar commented 4 months ago

When I try to start training with --cfg=stylegan2-ext, it fails with the following message:

"TypeError: __init__() got an unexpected keyword argument 'extended_sgan2'"

PDillis commented 4 months ago

I removed certain things to make it easier, but completely forgot to thoroughly remove these extra parameters. I removed lines 275 and 277 in train.py (both say c.G_kwargs.extended_sgan2 = True), and it should run. I tested just now with a small dataset and training started, so let me know if this works for you before I push the fix.
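For readers hitting the same TypeError, here is a minimal sketch of the mechanism, not the repository's actual code. It assumes the generator is built by unpacking a kwargs dict into a constructor; the `Generator` class and its parameters are hypothetical stand-ins, and only the key name `extended_sgan2` comes from the error message above.

```python
class Generator:
    """Hypothetical stand-in for a generator whose constructor
    does not accept an `extended_sgan2` argument."""

    def __init__(self, z_dim=512, w_dim=512):
        self.z_dim = z_dim
        self.w_dim = w_dim


# The training config still carries the stale key (assumed layout).
G_kwargs = dict(z_dim=512, w_dim=512, extended_sgan2=True)

# Unpacking the dict passes the stale key to __init__, which rejects it.
try:
    Generator(**G_kwargs)
    message = None
except TypeError as err:
    message = str(err)
print(message)

# Dropping the key before construction avoids the error, which is the
# effect of deleting the two assignment lines in train.py.
G_kwargs.pop("extended_sgan2", None)
G = Generator(**G_kwargs)
```

The exact wording of the message varies slightly between Python versions (newer ones prefix the class name), but the "unexpected keyword argument" part is the same.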

nuclearsugar commented 4 months ago

I removed lines 275 and 277 in train.py and tried to start training, but I'm seeing a different error now:

  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 367, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 360, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 94, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
    while not context.join():
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
    fn(i, *args)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 50, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\training\training_loop.py", line 163, in training_loop
    misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
  File "C:\Users\Zenith\Desktop\stylegan3-fun\torch_utils\misc.py", line 162, in copy_params_and_buffers
    tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 0

PDillis commented 4 months ago

Hmm, this is related to #39. The thing is, I cannot reproduce it consistently: I'm able to load pre-trained models as starting points (using --resume), but sometimes it fails at the same line in torch_utils/misc.py. Can you help me by specifying two things: which pre-trained model you are starting from (resolution, RGB/RGBA, etc.), and the same details for the data you are now using for training?
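The traceback suggests that tensors are copied from the resumed checkpoint into the new network by name, so one way to narrow this down is to list every name whose shape differs between the two. The sketch below is a hedged illustration in plain Python: `find_shape_mismatches` is a hypothetical helper, and the tensor names and shapes are made up. With PyTorch, the two mappings could be built from a module with something like `{k: tuple(v.shape) for k, v in module.state_dict().items()}`.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    src_shapes: Dict[str, Shape], dst_shapes: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, src_shape, dst_shape) for every name present in
    both mappings whose shapes differ. Names missing from the source
    are skipped, mirroring a copy done with require_all=False."""
    mismatches = []
    for name, dst_shape in dst_shapes.items():
        src_shape = src_shapes.get(name)
        if src_shape is not None and src_shape != dst_shape:
            mismatches.append((name, src_shape, dst_shape))
    return mismatches


# Illustrative, made-up names and shapes (not taken from the issue).
checkpoint_shapes = {
    "synthesis.b4.const": (1024, 4, 4),
    "mapping.fc0.weight": (512, 512),
}
model_shapes = {
    "synthesis.b4.const": (512, 4, 4),
    "mapping.fc0.weight": (512, 512),
    "mapping.w_avg": (512,),  # absent from the checkpoint, so skipped
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, src, dst in mismatches:
    print(f"{name}: checkpoint {src} vs model {dst}")
```

A report like this shows which layers disagree, which may help tell a configuration difference between the pre-trained model and the new run apart from a bug in the loading code.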

nuclearsugar commented 4 months ago

I'm trying to do some transfer learning. Here are the details:

Pre-Trained Model

Dataset

nuclearsugar commented 4 months ago

Interesting to note, if I instead use a snapshot of this repo that I have saved from when Stylegan2-ext was newly implemented (2023-02-22) then the training starts up without any issues.

PDillis commented 4 months ago

OK, thanks, that helps. I was thinking the same, since it worked back then and there were no issues like this. I don't remember updating much, but I'll check the diff in case I moved something.