Junjue-Wang / Rank1-Ali-Tianchi-Real-World-Image-Forgery-Localization-Challenge

2022 Alibaba Tianchi Real-World Image Forgery Detection Challenge: first-place solution (1/1149)

Using the tamper config with my own dataset raises RuntimeError: CUDA error: an illegal memory access was encountered; terminate called after throwing an instance of 'c10::Error' #13

Closed: CongYep closed this issue 11 months ago

CongYep commented 12 months ago

I used the config under tamper, ran `bash tools/dist_train.sh work_configs/tamper/tamper_convx_b_exp.py 2`, and switched to my own datasets (NIST16, CASIA, etc.). Training fails with RuntimeError: CUDA error: an illegal memory access was encountered, followed by terminate called after throwing an instance of 'c10::Error'. How can I fix this?

```text
Traceback (most recent call last):
  File "tools/train.py", line 181, in <module>
    main()
  File "tools/train.py", line 177, in main
    meta=meta)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/apis/train.py", line 135, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 138, in train_step
    losses = self(**data_batch)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/base.py", line 108, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 144, in forward_train
    gt_semantic_seg)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/segmentors/encoder_decoder.py", line 88, in _decode_head_forward_train
    self.train_cfg)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 207, in forward_train
    losses = self.losses(seg_logits, gt_semantic_seg)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 214, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/decode_heads/decode_head.py", line 259, in losses
    ignore_index=self.ignore_index)
  File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 308, in forward
    **kwargs)
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 219, in lovasz_softmax
    flatten_probs(probs, labels, ignore_index),
  File "/home/cong/my_project/mmsegmentation-tianchi_tamper/mmseg/models/losses/lovasz_loss.py", line 55, in flatten_probs
    vprobs = probs[valid.nonzero().squeeze()]
RuntimeError: CUDA error: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: an illegal memory access was encountered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:687 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f9b166518b2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0xad2 (0x7f9b16a18952 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f9b1663cb7d in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x5ff66a (0x7f9baadff66a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x5ff716 (0x7f9baadff716 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: /root/anaconda3/envs/open-mmlab/bin/python() [0x4cb472]
frame #6: /root/anaconda3/envs/open-mmlab/bin/python() [0x4a0a87]
frame #7: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #8: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b5cfb]
frame #9: /root/anaconda3/envs/open-mmlab/bin/python() [0x4b0858]
frame #10: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b50]
frame #11: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #12: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #13: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #14: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #15: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #16: /root/anaconda3/envs/open-mmlab/bin/python() [0x4c5b66]
frame #17: /root/anaconda3/envs/open-mmlab/bin/python() [0x4946f7]
frame #18: PyDict_SetItemString + 0x61 (0x499261 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyImport_Cleanup + 0x89 (0x56f719 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: Py_FinalizeEx + 0x67 (0x56b1a7 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: /root/anaconda3/envs/open-mmlab/bin/python() [0x53fc79]
frame #22: _Py_UnixMain + 0x3c (0x53fb3c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: <unknown function> + 0x29d90 (0x7f9bb37e5d90 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #24: __libc_start_main + 0x80 (0x7f9bb37e5e40 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #25: /root/anaconda3/envs/open-mmlab/bin/python() [0x53f9ee]
```
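The thread does not record how this was resolved, so what follows is only a hedged diagnostic sketch. CUDA errors are reported asynchronously, so the line shown in the traceback (`flatten_probs` in `lovasz_loss.py`) is not necessarily where the bad access happened; rerunning with the environment variable `CUDA_LAUNCH_BLOCKING=1` usually gives a more accurate location. One plausible cause when switching to datasets such as NIST16 or CASIA (an assumption, not confirmed in this issue) is ground-truth masks stored as 0/255 images, sometimes with intermediate anti-aliased or compression-artifact values, while the training config expects class indices. The sketch assumes `num_classes=2`, `ignore_index=255`, single-channel mask files, and that NumPy and Pillow are installed; `find_bad_label_values`, `binarize_mask` and `scan_mask_dir` are hypothetical helpers, not part of mmsegmentation or this repository.

```python
import os

import numpy as np


def find_bad_label_values(mask, num_classes=2, ignore_index=255):
    """Return the sorted pixel values in `mask` that are neither a valid
    class index (0 .. num_classes - 1) nor `ignore_index`.

    Hypothetical helper; num_classes=2 and ignore_index=255 are assumptions.
    """
    values = np.unique(np.asarray(mask))  # np.unique already sorts
    return [int(v) for v in values
            if not (0 <= v < num_classes) and v != ignore_index]


def binarize_mask(mask, threshold=128):
    """Map a 0/255-style (possibly anti-aliased) mask to {0, 1} labels.

    Hypothetical helper; the threshold of 128 is an arbitrary choice.
    """
    return (np.asarray(mask) >= threshold).astype(np.uint8)


def scan_mask_dir(mask_dir, suffix=".png", num_classes=2, ignore_index=255):
    """Return {filename: bad_values} for every mask with out-of-range labels.

    Hypothetical helper; assumes Pillow is installed and masks are images.
    """
    from PIL import Image  # imported lazily so the helpers above need only NumPy

    report = {}
    for name in sorted(os.listdir(mask_dir)):
        if not name.endswith(suffix):
            continue
        # Convert to single-channel ("L") so RGB-saved masks are also handled.
        mask = np.array(Image.open(os.path.join(mask_dir, name)).convert("L"))
        bad = find_bad_label_values(mask, num_classes, ignore_index)
        if bad:
            report[name] = bad
    return report
```

If `scan_mask_dir` reports any files, they could be rewritten with `binarize_mask` before training. Note that with `ignore_index=255`, a clean 0/255 mask would not crash but would silently drop every tampered pixel from the loss, which is another reason to convert such masks to 0/1.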