autonomousvision / stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
MIT License

Fix for grid_sample_gradfix and conv2d_gradfix on pytorch 1.11 #117

Open vyabor opened 4 months ago

I received the error below during training. It appears to be caused by a backwards-incompatible change in PyTorch 1.11.0, as pointed out in PyTorch issue #75018 regarding StyleGAN3.

Traceback (most recent call last):
  File "/home/ubuntu/stylegan-xl/train.py", line 336, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ubuntu/stylegan-xl/train.py", line 321, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/home/ubuntu/stylegan-xl/train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/home/ubuntu/stylegan-xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/ubuntu/stylegan-xl/training/training_loop.py", line 339, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/home/ubuntu/stylegan-xl/training/loss.py", line 121, in accumulate_gradients
    loss_Gmain.backward()
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 289, in apply
    return user_fn(self, *args)
  File "/home/ubuntu/stylegan-xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/home/ubuntu/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ubuntu/stylegan-xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
    return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
TypeError: 'tuple' object is not callable

According to their comment on the PyTorch issue mentioned above, @jannehellsten pushed a change to StyleGAN3 that fixes this error.

After applying these same changes locally to conv2d_gradfix.py and grid_sample_gradfix.py in stylegan-xl, I can confirm that the model is training smoothly on my custom dataset.
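
For anyone reading the traceback: the final frame fails because the result of `torch._C._jit_get_operation(name)` is a tuple, and the code then tries to call it. Below is a minimal diagnostic sketch of how that return-type difference could be tolerated. It is not the StyleGAN3 change itself. The helper name `resolve_op` is hypothetical, the stand-in functions replace the real PyTorch lookup so the snippet runs without PyTorch, and it assumes that the callable operator is the first element of the returned tuple.

```python
def resolve_op(lookup_result):
    """Return a callable from an operator lookup result.

    Handles two shapes: a bare callable, or a tuple that carries the
    callable plus extra metadata.
    """
    if isinstance(lookup_result, tuple):
        # Assumption: the callable operator is the first tuple element.
        return lookup_result[0]
    return lookup_result


# Hypothetical stand-ins for the two return shapes (no PyTorch needed).
def fake_op(x):
    return x * 2


bare_result = fake_op                      # lookup returns the callable directly
tuple_result = (fake_op, ["overload_a"])   # lookup returns a tuple

assert resolve_op(bare_result)(3) == 6
assert resolve_op(tuple_result)(3) == 6
```

In `conv2d_gradfix.py` this would correspond to wrapping the lookup, for example `resolve_op(torch._C._jit_get_operation(name))(...)`. That only addresses the tuple return value. If the underlying operator or its arguments also changed between PyTorch versions, unwrapping alone will not be enough, which is why porting the StyleGAN3 changes to both files is the more reliable route.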