A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiased Scene Graph Generation from Biased Training CVPR 2020”
MIT License
RuntimeError: CUDA error: device-side assert triggered #188
Happy Chinese New Year!
I tried to train this model on Visual Genome (VG). I followed the README to get started and ran into a problem with mixed precision, so I switched to float32. At iteration 4812, with a batch size of 12, the error below occurred. The full output is as follows:
Traceback (most recent call last):
File "/root/.vscode-server/extensions/ms-python.python-2021.2.633441544/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 3215, in <module>
File "/root/.vscode-server/extensions/ms-python.python-2021.2.633441544/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 3208, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/root/.vscode-server/extensions/ms-python.python-2021.2.633441544/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 2282, in run
return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
File "/root/.vscode-server/extensions/ms-python.python-2021.2.633441544/pythonFiles/lib/python/debugpy/_vendored/pydevd/pydevd.py", line 2289, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/root/.vscode-server/extensions/ms-python.python-2021.2.633441544/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydev_imps/_pydev_execfile.py", line 25, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "tools/relation_train_net.py", line 383, in <module>
main()
File "tools/relation_train_net.py", line 376, in main
model = train(cfg, args.local_rank, args.distributed, logger)
File "tools/relation_train_net.py", line 164, in train
scaled_losses.backward()
File "/root/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
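Following the hint in the last line of the message, here is a minimal sketch of how the variable could be set from Python so that the failing kernel is reported at the call that launched it. The helper name `enable_blocking_cuda_launches` is hypothetical, and the sketch assumes the variable has to be in place before CUDA is initialized, so it is set before `torch` is imported.

```python
import os


def enable_blocking_cuda_launches(env=os.environ):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper: with this variable set, CUDA kernel launches run
    synchronously, so an error is raised at the call that caused it rather
    than at some later, unrelated API call.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the training script, before importing torch
# (assumption: the variable should be set before CUDA is initialized).
enable_blocking_cuda_launches()

# import torch  # import torch only after the variable is set
```

The same effect can be had by prefixing the usual launch command with `CUDA_LAUNCH_BLOCKING=1` in the shell. Training will be slower in this mode, so it is only meant for locating the failing operation.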
I noticed the NOTE in relation_train_net.py line 161, so I commented out:
with amp.scale_loss(losses, optimizer) as scaled_losses:
    scaled_losses.backward()
and used
losses.backward()
That did not work either. The error changed to:
Traceback (most recent call last):
File "tools/relation_train_net.py", line 384, in <module>
main()
File "tools/relation_train_net.py", line 377, in main
model = train(cfg, args.local_rank, args.distributed, logger)
File "tools/relation_train_net.py", line 165, in train
losses.backward()
File "/root/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/root/miniconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
Variable._execution_engine.run_backward(
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Has anyone run into this issue before?
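One thing that may be worth checking: a device-side assert is often (though not always) triggered by an index that is out of range, for example a class label passed to a loss function that is negative or not smaller than the number of classes. Whether that is the cause here is only a guess. Below is a minimal sketch of a CPU-side check; the function name `find_out_of_range_labels` and the class count of 51 are hypothetical, and it works on plain Python integers (for a tensor, something like `labels.tolist()` could be passed in).

```python
def find_out_of_range_labels(labels, num_classes):
    """Return (position, value) pairs for labels outside [0, num_classes).

    Hypothetical diagnostic helper: run it on the labels of a batch before
    they reach the loss, so a bad label shows up as a readable Python-side
    report instead of a CUDA device-side assert.
    """
    bad = []
    for position, value in enumerate(labels):
        value = int(value)
        if not 0 <= value < num_classes:
            bad.append((position, value))
    return bad


# Example with made-up labels and a made-up class count of 51:
# 51 is too large and -1 is negative, so both are reported.
example_labels = [0, 3, 51, -1, 50]
print(find_out_of_range_labels(example_labels, num_classes=51))
# prints [(2, 51), (3, -1)]
```

If the check reports nothing for the batch around iteration 4812, the label range is probably not the cause and the failing operation has to be located some other way.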