NVIDIA / pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs
https://tcwang0509.github.io/pix2pixHD/

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR when trying to train #176

Open graham-eisele opened 4 years ago

graham-eisele commented 4 years ago

Full Error:

C:/w/1/s/windows/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: block: [139,0,0], thread: [416,0,0] Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed.
(The same assertion is printed for block [139,0,0], threads [416,0,0] through [447,0,0] and [192,0,0] through [223,0,0].)
Traceback (most recent call last):
  File "train.py", line 71, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Graham\Documents\pix2pixHD-master\models\pix2pixHD_model.py", line 163, in forward
    fake_image = self.netG.forward(input_concat)
  File "C:\Users\Graham\Documents\pix2pixHD-master\models\networks.py", line 211, in forward
    return self.model(input)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\conv.py", line 343, in forward
    return self.conv2d_forward(input, self.weight)
  File "C:\Users\Graham\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\conv.py", line 340, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Command used: python train.py --no_instance --no_html --data_type 8 --resize_or_crop none --loadSize 512 --fineSize 512

OS: Windows 10, CUDA 10.2
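
The repeated assertion (indexValue >= 0 && indexValue < tensor.sizes[dim]) comes from a scatter/gather kernel. CUDA reports such failures asynchronously, so the cuDNN error in the convolution may only be where the failure surfaced. One possible cause, assuming the scatter is the one-hot encoding of the label map into --label_nc channels, is a label image whose pixel values fall outside 0..label_nc-1. A minimal pure-Python sketch of that range check follows; the function name and the sample values are illustrative and not taken from the repository.

```python
def out_of_range_labels(label_values, label_nc):
    """Return the sorted distinct label ids that fall outside [0, label_nc)."""
    return sorted({int(v) for v in label_values if not 0 <= int(v) < label_nc})


# Hypothetical pixel values read from one label image.
pixels = [0, 3, 34, 35, 255, 255]
print(out_of_range_labels(pixels, label_nc=35))  # prints [35, 255]
```

If the list is non-empty, the labels would need remapping, or --label_nc would need to cover them. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 before running should make the traceback point closer to the call that actually failed.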

tripzero commented 4 years ago

How much GPU memory do you have? I find this will often happen when too little GPU memory is available.
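
To answer the memory question, a small sketch like the following could report the total memory of each visible GPU. It assumes PyTorch is the installed framework and returns an empty dict when PyTorch or a CUDA device is missing; the helper name is made up for illustration.

```python
import importlib.util


def gpu_total_memory_gib():
    """Return {device index: total memory in GiB}, or {} if no CUDA GPU is usable."""
    if importlib.util.find_spec("torch") is None:
        return {}
    import torch

    if not torch.cuda.is_available():
        return {}
    return {
        i: torch.cuda.get_device_properties(i).total_memory / 2**30
        for i in range(torch.cuda.device_count())
    }


print(gpu_total_memory_gib())
```

This reports total memory only. Memory already held by other processes is not subtracted.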

DavidCarlyn commented 4 years ago

I am getting the same error, and I have a full 8 GB of GPU memory available.

DavidCarlyn commented 4 years ago

I adjusted the channels (input, output, label) and I resolved my issue. Disregard my comment.

tripzero commented 4 years ago

What did you adjust them to?


DavidCarlyn commented 4 years ago

My goal is to transform a 3-channel image into a 1-channel image, and I am not using instances. So my input is 3 channels, my output is 1 channel, and the instance map is 0 channels.

DavidCarlyn commented 4 years ago

That said, the transforms don't handle anything other than 3 channels for the input and output.
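
The channel setup described above might translate into command-line flags roughly as follows. The flags --input_nc, --output_nc, --label_nc and --no_instance exist in the repository's options, but whether they were set exactly this way in the comment above is an assumption, and the helper function is hypothetical.

```python
def build_train_args(input_nc, output_nc, label_nc, use_instance):
    """Assemble a train.py command line for a given channel configuration."""
    args = [
        "python", "train.py",
        "--input_nc", str(input_nc),    # channels of the input image
        "--output_nc", str(output_nc),  # channels of the generated image
        "--label_nc", str(label_nc),    # 0 means: do not one-hot encode the input
    ]
    if not use_instance:
        args.append("--no_instance")    # no instance map is supplied
    return args


print(" ".join(build_train_args(3, 1, 0, use_instance=False)))
# prints: python train.py --input_nc 3 --output_nc 1 --label_nc 0 --no_instance
```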

ClementBarbisan commented 4 years ago

I had this error a few times, and it was caused either by a lack of GPU memory (I have 11 GB, which is the minimum required to run most of the scripts) or by incompatible versions of CUDA, cuDNN and PyTorch. I downgraded CUDA to 10.0 and installed the matching version of cuDNN. Hope it helps.
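
To check for a version mismatch of this kind, it can help to print what the installed PyTorch build reports. This is a minimal sketch, assuming PyTorch is importable; the function name is illustrative, and it returns None when PyTorch is not installed.

```python
import importlib.util


def torch_stack_versions():
    """Collect the versions PyTorch reports for itself, CUDA and cuDNN."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # CUDA version PyTorch was built with
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_stack_versions())
```

The CUDA version shown here is the one the PyTorch binary was built against, which may differ from the toolkit installed system-wide.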

La-fe commented 4 years ago

In pix2pixHD_model.py, set torch.backends.cudnn.benchmark = False.
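
A sketch of that setting is shown below. Benchmark mode lets cuDNN auto-tune its convolution algorithms, and turning it off makes it fall back to its default choice. Where exactly the line belongs in pix2pixHD_model.py is not stated above, so the wrapper function here is only an illustration.

```python
import importlib.util


def disable_cudnn_benchmark():
    """Turn off cuDNN auto-tuning; return the resulting flag, or None without PyTorch."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    torch.backends.cudnn.benchmark = False
    return torch.backends.cudnn.benchmark


print(disable_cudnn_benchmark())
```

The flag has to be set before the first convolution runs in order to affect it.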

erjel commented 3 years ago

I experienced the same problem while training from RGB image to RGB image. The CLI flag

--label_nc 0

solved the problem for me.
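
A plausible reason this helps: as far as I can tell from the model code, a nonzero --label_nc makes the input get one-hot encoded into that many channels, while 0 passes the image through with --input_nc channels. The sketch below mirrors that channel selection in plain Python. It is a simplified reading of the repository, not a copy of it, and the function name is hypothetical.

```python
def generator_input_channels(label_nc, input_nc=3, no_instance=True):
    """Number of channels fed to the generator under the assumed selection rule."""
    # label_nc == 0 means the input is used as a regular image, not a label map.
    channels = label_nc if label_nc != 0 else input_nc
    if not no_instance:
        channels += 1  # one extra channel for the instance edge map
    return channels


print(generator_input_channels(0))   # prints 3
print(generator_input_channels(35))  # prints 35
```

With an RGB input and a nonzero label count, pixel values would be treated as class indices, which is consistent with the index assertion in the original report.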

zoedsy commented 2 years ago

Setting torch.backends.cudnn.benchmark = False in pix2pixHD_model.py did not help in my case, though.

zhangxiaojuan66 commented 2 years ago

Hello, your email has been received. Wishing you all the best and a happy life!