NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

loss.accumulate_gradient() not working when trying to use augmentation. #193

Closed hamediut closed 1 year ago

hamediut commented 1 year ago

Describe the bug: I'm trying to train the StyleGAN2-ADA variant by setting the cfg to stylegan2. It works when I don't apply augmentation (e.g., augment_pipe = 0 and ada_target = None); however, when I set augment_pipe = 0.2 and ada_target = 0.6, I get an error in loss.accumulate_gradients().

To reproduce, I ran: train.py --outdir D:\Hamed\stylegan3-main\training-runs --cfg 'stylegan2' --data D:\Hamed\stylegan3-main\dataset_ffhq\dest_folder\ffhq_00000.zip --gpus 1 --batch 16 --gamma 0.8

I used only 1000 images and downsampled them to 256 by 256 pixels. I also didn't use labels. I would really appreciate any help, as I have been stuck on this for two days!

The full error is:

```
Traceback (most recent call last):
  File "train.py", line 288, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "train.py", line 283, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "C:\Users\David\stylegan3-main\training\training_loop.py", line 278, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "C:\Users\David\stylegan3-main\training\loss.py", line 90, in accumulate_gradients
    loss_Gmain.mean().mul(gain).backward()
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "C:\Users\David\anaconda3\envs\Python3_8\lib\site-packages\torch\autograd\function.py", line 253, in apply
    return user_fn(self, *args)
  File "C:\Users\David\stylegan3-main\torch_utils\ops\grid_sample_gradfix.py", line 52, in backward
    grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
  File "C:\Users\David\stylegan3-main\torch_utils\ops\grid_sample_gradfix.py", line 63, in forward
    grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False, output_mask)
TypeError: 'tuple' object is not callable
```

I am using:

PDillis commented 1 year ago

As a follow-up question, what do you mean by [augment_pipe=0.2]? The augmentation pipeline expects to be told which augmentations to use. By default, --augpipe=bgc, as seen here; that is, it uses all of the blit, geom, and color augmentations at the same time. My question is: where are you setting this 0.2 value, or is there another setting you are using?

On another note, you can turn off all augmentations by setting --aug=noaug instead of changing values inside the code.

Note that the error could also be due to your PyTorch version, so I might suggest downgrading to 1.10; another possible fix can be seen in #188. Hope this helps!
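For context on why the PyTorch version matters here: in PyTorch 1.11+, `torch._C._jit_get_operation` returns a `(callable, overload_names)` tuple rather than a bare callable, so code that caches the lookup result and later calls `op(...)` fails with exactly this `TypeError: 'tuple' object is not callable`. A minimal, PyTorch-free sketch of the failure mode and a defensive unpacking (the helper names below are illustrative, not from the repo):

```python
def get_operation_old(name):
    """Older API style: the lookup returns the callable directly."""
    return lambda *args: f"ran {name} with {len(args)} args"

def get_operation_new(name):
    """Newer API style: the lookup returns (callable, overload_names)."""
    return (lambda *args: f"ran {name} with {len(args)} args", ["default"])

def resolve_op(result):
    """Unpack defensively so both return conventions work."""
    if isinstance(result, tuple):
        return result[0]
    return result

# Calling the raw tuple reproduces the error from the traceback;
# unpacking it first restores the expected behaviour.
op = resolve_op(get_operation_new("aten::grid_sampler_2d_backward"))
print(op(1, 2, 3))  # → ran aten::grid_sampler_2d_backward with 3 args
```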

hamediut commented 1 year ago

Oh sorry, I meant 'augment_p', which is --p on the command line (augmentation probability). https://github.com/NVlabs/stylegan3/blob/407db86e6fe432540a22515310188288687858fa/training/training_loop.py#L176 As far as I understand the code, no augmentation is executed when augment_p = 0 and ada_target = None. I want to run ADA augmentation since I have a limited number of images.
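For reference, ADA can usually be enabled from the command line without editing training_loop.py. The flag names below follow the stylegan2-ada-style options exposed by train.py (--aug, --target, --p); treat the exact names and defaults as an assumption to verify against your checkout:

```shell
# Enable ADA: the augmentation probability p is adjusted automatically
# to hit the given ADA target (0.6 is the usual default).
python train.py --outdir=training-runs --cfg=stylegan2 \
    --data=dataset.zip --gpus=1 --batch=16 --gamma=0.8 \
    --aug=ada --target=0.6

# Alternatively, use a fixed augmentation probability instead of ADA:
python train.py --outdir=training-runs --cfg=stylegan2 \
    --data=dataset.zip --gpus=1 --batch=16 --gamma=0.8 \
    --aug=fixed --p=0.2
```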

Thanks for pointing to that conversation. I hope I can find a fix other than downgrading the PyTorch version.

GiserLD commented 1 year ago

I ran into the same problem when running StyleGAN3, and I think it is the PyTorch version. May I ask whether you solved this problem without downgrading PyTorch? Thank you very much.

jannehellsten commented 1 year ago

Should be fixed by https://github.com/NVlabs/stylegan3/commit/c233a919a6faee6e36a316ddd4eddababad1adf9. Sorry for the inconvenience.