NVIDIA / pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs
https://tcwang0509.github.io/pix2pixHD/

Issue with training new data - CUDA error #152

Open attys opened 5 years ago

attys commented 5 years ago

I'm trying to train my own model, and I have a training dataset, for which I can successfully run the train.py code. But when I try to run the test.py portion for the test images, I get an error message. I've tried running this on several versions of pytorch, and none of the other solutions I've seen suggested on here have fixed it. Does anyone know what's going on?

Here's the command that triggers the failure:

python test.py --name sketch2ink --dataroot ./datasets/sketches/ --no_instance --netG local --ngf 32 --resize_or_crop none --how_many 1 --ntest 1

And here's the traceback:

Traceback (most recent call last):
  File "test.py", line 59, in <module>
    generated = model.inference(data['label'], data['inst'], data['image'])
  File "/home/[redacted]/pix2pixHD/models/pix2pixHD_model.py", line 198, in inference
    input_label, inst_map, real_image, _ = self.encode_input(Variable(label), Variable(inst), image, infer=True)
  File "/home/[redacted]/pix2pixHD/models/pix2pixHD_model.py", line 132, in encode_input
    real_image = Variable(real_image.data.cuda())

RuntimeError: CUDA error: device-side assert triggered

skabbit commented 4 years ago

I got the same error with both my own trained model and the label2city_1024p model. Setting CUDA_LAUNCH_BLOCKING=1 gives more information:

RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at c:\a\w\1\s\tmp_conda_3.6_090826\conda\conda-bld\pytorch_1550394668685\work\aten\src\thc\generic/THCTensorScatterGather.cu:380
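
The error location in THCTensorScatterGather.cu points at a scatter/gather call, and one common cause of a device-side assert there is an index outside the target tensor's range. This has not been confirmed as the cause in this thread. A minimal sketch of a CPU-side check is below. It assumes the label map is one-hot encoded with a fixed number of classes; the function name, the sample values, and the class count of 35 are hypothetical and chosen only for illustration.

```python
def find_out_of_range_labels(pixels, label_nc):
    """Return the sorted distinct values that are not valid one-hot indices.

    pixels   -- any iterable of integer label values (hypothetical input)
    label_nc -- assumed number of label classes; valid indices are 0..label_nc-1
    """
    return sorted({int(v) for v in pixels if not 0 <= int(v) < label_nc})


# Hypothetical example: a grayscale sketch with values up to 255,
# checked against an assumed 35 label classes.
sample_pixels = [0, 12, 34, 35, 255, 255]
print(find_out_of_range_labels(sample_pixels, label_nc=35))  # prints [35, 255]
```

If the returned list is non-empty for a real input image, the values in it would index past the end of a one-hot tensor with that many channels, which would be worth ruling out before looking at the PyTorch or CUDA versions.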