NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Training on Google Colab: RuntimeError: No such operator aten::cudnn_convolution_backward_weight #631

Open · minhduc01168 opened this issue 6 months ago

minhduc01168 commented 6 months ago

```
2023-12-14 09:24:39.480344: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-14 09:24:39.480399: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-14 09:24:39.480449: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-14 09:24:41.346063: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Training for 500 kimg...
```

```
Traceback (most recent call last):
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/content/drive/MyDrive/WIP/stylegan3/train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/content/drive/MyDrive/WIP/stylegan3/training/training_loop.py", line 278, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/content/drive/MyDrive/WIP/stylegan3/training/loss.py", line 111, in accumulate_gradients
    loss_Dgen.mean().mul(gain).backward()
  File "/usr/local/lib/python3.10/dist-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/content/drive/MyDrive/WIP/stylegan3/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/content/drive/MyDrive/WIP/stylegan3/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
    return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
RuntimeError: No such operator aten::cudnn_convolution_backward_weight
```

Environment: Google Colab with a T4 GPU, CUDA 12.0 (per `!nvidia-smi`), PyTorch 2.1.0+cu118.

I ran the following command for training:

```
# Fine-tune StyleGAN3-R for MetFaces-U using 1 GPU, starting from the pre-trained FFHQ-U pickle.
!python /content/drive/MyDrive/WIP/stylegan3/train.py --outdir=~/training-runs --cfg=stylegan3-r \
    --data=/content/drive/MyDrive/WIP/stylegan3/datasets/Face_Celeb-1024x1024.zip \
    --gpus=1 --batch=16 --batch-gpu=8 --gamma=6.6 --mirror=1 --kimg=500 --snap=5 \
    --resume=https://api.ngc.nvidia.com/v2/models/nvidia/research/stylegan3/versions/1/files/stylegan3-r-ffhqu-1024x1024.pkl
```

Hoping to get some help; I've been searching but still haven't found a solution.
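For context, the failing call can be reproduced outside of training. A minimal probe, assuming nothing beyond the stock PyTorch install, shows whether the legacy operator exists in the running build:

```python
# Probe for the legacy cuDNN backward-weight operator. On PyTorch >= 1.11
# the lookup itself raises the same "No such operator" RuntimeError as above.
import torch

print(torch.__version__)  # e.g. 2.1.0+cu118 on current Colab
try:
    torch._C._jit_get_operation('aten::cudnn_convolution_backward_weight')
    print('legacy operator available')
except RuntimeError as err:
    print('legacy operator missing:', err)
```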
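The root cause is that `aten::cudnn_convolution_backward_weight` was removed from PyTorch (around 1.11), while the checked-out `conv2d_gradfix.py` still looks it up by name, so it fails under torch 2.1.0. Below is a sketch of a possible workaround, assuming PyTorch >= 1.11: compute the weight gradient through the unified `torch.ops.aten.convolution_backward` op instead. The helper `conv2d_grad_weight` is illustrative, not the repository's actual patch; note that the existing `Conv2dGradWeight.apply(grad_output, input)` call site would also need the weight tensor threaded through.

```python
# Sketch (not official NVlabs code): weight gradient via the unified
# aten::convolution_backward op, which replaced the per-backend
# cudnn_convolution_backward_* operators in PyTorch >= 1.11.
import torch

def conv2d_grad_weight(grad_output, input, weight,
                       stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1):
    # convolution_backward returns (grad_input, grad_weight, grad_bias);
    # output_mask selects which of the three are actually computed.
    _, grad_weight, _ = torch.ops.aten.convolution_backward(
        grad_output, input, weight,
        bias_sizes=None,               # no bias gradient requested
        stride=list(stride),
        padding=list(padding),
        dilation=list(dilation),
        transposed=False,
        output_padding=[0, 0],
        groups=groups,
        output_mask=[False, True, False])
    return grad_weight

# Quick shape check: gradient of a 3x3 conv weight.
x = torch.randn(2, 8, 16, 16)    # input
w = torch.randn(4, 8, 3, 3)      # weight
gy = torch.randn(2, 4, 14, 14)   # upstream gradient (16 - 3 + 1 = 14)
print(conv2d_grad_weight(gy, x, w).shape)  # torch.Size([4, 8, 3, 3])
```

Alternatively, the simpler routes reported in similar threads are to update to the current NVlabs/stylegan3 master, whose `conv2d_gradfix.py` reportedly adds a PyTorch 1.11+ code path, or to downgrade to a PyTorch build (<= 1.10) that still ships the legacy operator.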