Setting up PyTorch plugin "upfirdn2d_plugin"...
Traceback (most recent call last):
  File "/ibex/ai/home/chedelp/stylegan3/stylegan3/train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/ibex/ai/home/chedelp/stylegan3/stylegan3/train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/ibex/ai/home/chedelp/stylegan3/stylegan3/train.py", line 98, in launch_training
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(c, temp_dir), nprocs=c.num_gpus)
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/ibex/ai/home/chedelp/stylegan3/stylegan3/train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/ibex/ai/home/chedelp/stylegan3/stylegan3/training/training_loop.py", line 188, in training_loop
    torch.distributed.broadcast(param, src=0)
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1090, in broadcast
    work = default_pg.broadcast([tensor], opts)
RuntimeError: Timeout waiting for key: 0/0
I am training StyleGAN3 on my own dataset, using the following command:
python train.py --outdir=../output/output_15sep2022/ --cfg=stylegan3-t --data=../windows/large-512x512.zip --gpus=8 --batch=32 --gamma=8.2 --mirror=1
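The run fails with the error shown above: process 5 raises "RuntimeError: Timeout waiting for key: 0/0" at the torch.distributed.broadcast(param, src=0) call in training_loop.py (line 188).
How can I fix this?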
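
To make the log above easier to read, here is a minimal sketch that pulls out the failing worker, the innermost frame, and the final error line from a torch.multiprocessing.spawn traceback. It is only a reading aid: the helper name summarize_spawn_failure is hypothetical (it is not part of StyleGAN3 or PyTorch), and the sample text is an excerpt copied from the traceback in this post.

```python
import re


def summarize_spawn_failure(log):
    """Return (failed_rank, innermost_frame, error_line) for a spawn traceback.

    Hypothetical helper for illustration; it only does text matching on the log.
    """
    # The parent process reports which child failed on a "Process N terminated" line.
    rank_match = re.search(r"Process (\d+) terminated with the following error", log)
    rank = int(rank_match.group(1)) if rank_match else None

    # Everything after that line is the child's own traceback.
    child = log[rank_match.end():] if rank_match else log

    # Each frame looks like: File "<path>", line <n>, in <function>
    frames = re.findall(r'File "([^"]+)", line (\d+), in (\S+)', child)
    innermost = frames[-1] if frames else None

    # The last non-empty line carries the exception type and message.
    lines = [ln.strip() for ln in child.splitlines() if ln.strip()]
    error = lines[-1] if lines else None
    return rank, innermost, error


# Excerpt copied from the traceback in this post.
sample = '''torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/ibex/ai/home/chedelp/stylegan3/stylegan3/training/training_loop.py", line 188, in training_loop
    torch.distributed.broadcast(param, src=0)
  File "/home/chedelp/miniconda3/envs/stylegan3/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1090, in broadcast
    work = default_pg.broadcast([tensor], opts)
RuntimeError: Timeout waiting for key: 0/0
'''

rank, frame, error = summarize_spawn_failure(sample)
print("failed rank:", rank)
print("innermost frame:", frame)
print("error:", error)
```

Read this way, the frames in train.py and click at the top of the log belong to the parent process that launched the workers; the actual failure is in worker 5, which timed out inside the first broadcast from rank 0.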