Using the tamper config with my own dataset raises RuntimeError: CUDA error: an illegal memory access was encountered, followed by terminate called after throwing an instance of 'c10::Error' #13
I am using the config from tamper and running the command bash tools/dist_train.sh work_configs/tamper/tamper_convx_b_exp.py 2, after switching to my own datasets (nist16, casia, etc.). Training fails with RuntimeError: CUDA error: an illegal memory access was encountered, followed by terminate called after throwing an instance of 'c10::Error'. How can I solve this?
Traceback (most recent call last):
File "tools/train.py", line 181, in <module>
main()
File "tools/train.py", line 177, in main
meta=meta)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/apis/train.py", line 135, in train_segmentor
runner.run(data_loaders, cfg.workflow)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(inputs[0], kwargs[0])
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 138, in train_step
losses = self(data_batch)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(input, kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
output = old_func(new_args, new_kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 108, in forward
return self.forward_train(img, img_metas, kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 144, in forward_train
gt_semantic_seg)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 88, in _decode_head_forward_train
self.train_cfg)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 207, in forward_train
losses = self.losses(seg_logits, gt_semantic_seg)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
output = old_func(new_args, *new_kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 259, in losses
ignore_index=self.ignore_index)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(input, kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 308, in forward
kwargs)
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 219, in lovasz_softmax
flatten_probs(probs, labels, ignore_index),
File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 55, in flatten_probs
vprobs = probs[valid.nonzero().squeeze()]
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9b166518b2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void) + 0xad2 (0x7f9b16a18952 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f9b1663cb7d in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: + 0x5ff66a (0x7f9baadff66a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x5ff716 (0x7f9baadff716 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: /root/anaconda3/envs/open-mmlab/bin/python() [0x4cb472]
frame #6: /root/anaconda3/envs/open-mmlab/bin/python() [0x4a0a87]
frame #7: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #8: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #9: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b0858]
frame #10: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b50]
frame #11: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #12: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #13: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #14: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #15: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #16: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #17: /root/anaconda3/envs/open-mmlab/bin/python() [0x4946f7]
frame #18: PyDict_SetItemString + 0x61 (0x499261 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0x89 (0x56f719 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x67 (0x56b1a7 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: /root/anaconda3/envs/open-mmlab/bin/python() [0x53fc79]
frame #22: _Py_UnixMain + 0x3c (0x53fb3c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: + 0x29d90 (0x7f9bb37e5d90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #24: __libc_start_main + 0x80 (0x7f9bb37e5e40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #25: /root/anaconda3/envs/open-mmlab/bin/python() [0x53f9ee]
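One thing that may be worth checking, since the Python traceback ends at an indexing step inside flatten_probs of the Lovasz loss: illegal memory access errors raised in a loss function are often caused by label values outside the range [0, num_classes) that are not equal to ignore_index. This is only a guess about the cause here, not a confirmed diagnosis. The sketch below is a minimal, framework-free check; the function name find_invalid_labels and the example values are hypothetical and not part of the repository.

```python
from typing import Iterable, List


def find_invalid_labels(label_values: Iterable[int],
                        num_classes: int,
                        ignore_index: int = 255) -> List[int]:
    """Return the sorted label values that are neither a valid class id
    (0 <= value < num_classes) nor equal to ignore_index."""
    invalid = set()
    for value in label_values:
        value = int(value)
        if value == ignore_index:
            # Pixels with ignore_index are skipped by the loss, so they are fine.
            continue
        if not 0 <= value < num_classes:
            invalid.add(value)
    return sorted(invalid)


if __name__ == "__main__":
    # Hypothetical pixel values read from one annotation mask.
    mask_values = [0, 1, 128, 128, 3]
    print(find_invalid_labels(mask_values, num_classes=2))
```

To apply this to a real dataset, one could pass the unique pixel values of each annotation mask (for example, from numpy.unique on the loaded mask) and compare against the num_classes and ignore_index set in the config. Rerunning on a single GPU with the environment variable CUDA_LAUNCH_BLOCKING=1 may also make the traceback point at the operation that actually failed, since CUDA errors are reported asynchronously by default.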