agrimgupta92 / sgan

Code for "Social GAN: Socially Acceptable Trajectories with Generative Adversarial Networks", Gupta et al, CVPR 2018
MIT License

Training error: the GPU program failed #37

Closed saruvora closed 5 years ago

saruvora commented 5 years ago

Hi, I am training the SGAN model on all the provided datasets. After a few iterations I get the following error:

Traceback (most recent call last):
  File "/workspace/code/scripts/train.py", line 512, in <module>
    main(args)
  File "/workspace/code/scripts/train.py", line 191, in main
    optimizer_g)
  File "/workspace/code/scripts/train.py", line 387, in generator_step
    loss.backward()
  File "/opt/conda/lib/python3.6/site-packages/torch/tensor.py", line 96, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/pytorch/pytorch/aten/src/THC/THCBlas.cu:258

It would be great if someone could help me fix this error. It occurs with num_epochs = 200; when I tried num_epochs = 5, training worked fine.

saruvora commented 5 years ago

I trained the model again and it worked. I still do not know why.
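
For anyone who hits the same intermittent error: the thread does not establish a cause, so the following is only a debugging sketch, not a fix. CUDA kernel launches are asynchronous by default, which means the Python line named in a traceback is not always the call that actually failed. Setting the `CUDA_LAUNCH_BLOCKING=1` environment variable makes launches synchronous, so a rerun may point closer to the failing operation. The helper names `debug_env` and `run_training` are hypothetical, and the script path passed to `run_training` is assumed to be the one from the traceback above.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    `base` defaults to the current process environment; it is never modified.
    """
    env = dict(os.environ if base is None else base)
    # CUDA_LAUNCH_BLOCKING=1 forces each kernel launch to finish before the
    # next Python statement runs, so errors surface at the offending call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training(script, extra_args=()):
    """Run a training script in a child process and return its exit code.

    The variable is set for the child only, so it is in place before that
    process initializes CUDA.
    """
    cmd = [sys.executable, script, *extra_args]
    return subprocess.run(cmd, env=debug_env()).returncode
```

A call such as `run_training("/workspace/code/scripts/train.py", sys.argv[1:])` would rerun training with the same command-line arguments. Expect it to be slower than a normal run, since synchronous launches remove the overlap between CPU and GPU work.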