Open nuclearsugar opened 4 months ago
I removed certain things to make it easier, but completely forgot to thoroughly remove these extra parameters. I removed lines 275 and 277 in train.py (both say c.G_kwargs.extended_sgan2 = True), and it should run. I just tested with a small dataset and training started, so let me know if this works for you before I push the fix.
I removed lines 275 and 277 in train.py and tried to start training, but I'm seeing a different error now:
File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 367, in <module>
main() # pylint: disable=no-value-for-parameter
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 360, in main
launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 94, in launch_training
torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
while not context.join():
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\Users\Zenith\miniconda3\envs\stylegan3\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "C:\Users\Zenith\Desktop\stylegan3-fun\train.py", line 50, in subprocess_fn
training_loop.training_loop(rank=rank, **c)
File "C:\Users\Zenith\Desktop\stylegan3-fun\training\training_loop.py", line 163, in training_loop
misc.copy_params_and_buffers(resume_data[name], module, require_all=False)
File "C:\Users\Zenith\Desktop\stylegan3-fun\torch_utils\misc.py", line 162, in copy_params_and_buffers
tensor.copy_(src_tensors[name].detach()).requires_grad_(tensor.requires_grad)
RuntimeError: The size of tensor a (512) must match the size of tensor b (1024) at non-singleton dimension 0
Hmm, this is related to #39. The thing is, I cannot reproduce it consistently: I'm usually able to load pre-trained models as starting points (using --resume), but sometimes it fails at the same line in torch_utils/misc.py. Can you help me by specifying both which pre-trained model you are starting from (resolution, RGB/RGBA, etc.) and the same details for the data you are now using for training?
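To narrow down a size mismatch like the one in the traceback, it can help to list every tensor whose shape differs between the checkpoint and the freshly built network before any copying happens. The following is only a minimal sketch, not code from this repository: the helper name find_shape_mismatches and the tensor names and shapes are made up for illustration. With PyTorch, the two dictionaries could be built with something like {name: tuple(t.shape) for name, t in module.state_dict().items()}.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    src: Dict[str, Shape], dst: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, src_shape, dst_shape) for names present in both
    mappings whose shapes differ. Names missing from src are skipped."""
    return [
        (name, src[name], dst[name])
        for name in sorted(dst)
        if name in src and src[name] != dst[name]
    ]


# Hypothetical shapes, chosen only to mimic a 512-vs-1024 mismatch.
checkpoint_shapes = {
    "mapping.fc0.weight": (1024, 512),
    "mapping.fc0.bias": (1024,),
    "synthesis.b4.const": (512, 4, 4),
}
model_shapes = {
    "mapping.fc0.weight": (512, 512),
    "mapping.fc0.bias": (512,),
    "synthesis.b4.const": (512, 4, 4),
}

for name, src_shape, dst_shape in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {src_shape} vs model {dst_shape}")
```

Running a check like this on the resumed pickle and the new network would show which layers disagree, which is usually enough to tell whether the two were built with different configuration options.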
I'm trying to do some transfer learning. Here are the details:
Pre-Trained Model
Dataset
Interestingly, if I instead use a snapshot of this repo that I saved when Stylegan2-ext was newly implemented (2023-02-22), training starts up without any issues.
Ok thanks, that helps. I was thinking the same, since it worked back then and there were no issues like this. I don't remember updating much, but I'll check the diff in case I moved something.
When I try to start training using --cfg=stylegan2-ext, it errors out with the following message: "TypeError: __init__() got an unexpected keyword argument 'extended_sgan2'"
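For context, this kind of TypeError is what Python raises when a dictionary of keyword arguments is unpacked into a constructor that does not declare one of the keys. The sketch below reproduces it with a hypothetical stand-in class (it is not the repository's real generator, and the parameter names are assumptions) and shows one way to spot the offending keys up front using inspect.signature.

```python
import inspect


class Generator:
    """Hypothetical stand-in; not the repository's actual generator class."""

    def __init__(self, z_dim, img_resolution):
        self.z_dim = z_dim
        self.img_resolution = img_resolution


def unexpected_kwargs(cls, kwargs):
    """Return the keys in kwargs that cls.__init__ does not accept."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all in __init__ accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [key for key in kwargs if key not in params]


# Illustrative config carrying a stale option, as in the report above.
G_kwargs = dict(z_dim=512, img_resolution=1024, extended_sgan2=True)

try:
    Generator(**G_kwargs)
except TypeError as err:
    error_message = str(err)
    print(error_message)

leftover = unexpected_kwargs(Generator, G_kwargs)
print(leftover)
```

A check like this only reports the mismatch. Whether the right fix is to drop the key from the config or to add the parameter to the class depends on what the option was meant to do.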