Closed: zhcf closed this issue 6 years ago.
Actual results
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what(): [enforce fail at context_gpu.h:170] . Encountered CUDA error: invalid device function Error from operator:
input: "gpu_0/res2_0_branch2c_bn" input: "gpu_0/res2_0_branch1_bn" output: "gpu_0/res2_0_branch2c_bn" name: "" type: "Sum" device_option { device_type: 1 cuda_gpu_id: 0 }
Aborted at 1518191836 (unix time) try "date -d @1518191836" if you are using GNU date
PC: @ 0x7f18119a01f7 __GI_raise
SIGABRT (@0x2c82) received by PID 11394 (TID 0x7f16caffd700) from PID 11394; stack trace:
    @ 0x7f18124465e0 (unknown)
    @ 0x7f18119a01f7 __GI_raise
    @ 0x7f18119a18e8 __GI_abort
    @ 0x7f180af79ac5 (unknown)
    @ 0x7f180af77a36 (unknown)
    @ 0x7f180af77a63 (unknown)
    @ 0x7f180afce345 (unknown)
    @ 0x7f181243ee25 start_thread
    @ 0x7f1811a6334d __clone
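The "Aborted at" line in the crash log gives a Unix timestamp and suggests `date -d`, which only works with GNU date. As a portable alternative, here is a minimal Python sketch (the helper name `decode_abort_time` is hypothetical) that formats the value in UTC; note that `date -d @...` would print local time instead.

```python
import time


def decode_abort_time(unix_seconds):
    """Format the 'Aborted at <unix time>' value from a crash log as UTC.

    Hypothetical helper: a portable stand-in for `date -d @<seconds>`,
    which needs GNU date and prints local time rather than UTC.
    """
    return time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime(unix_seconds))


if __name__ == "__main__":
    # Value copied from the "Aborted at 1518191836 (unix time)" line above.
    print(decode_abort_time(1518191836))  # 2018-02-09 15:57:16 UTC
```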
Detailed steps to reproduce
cd /opt/project/detectron
python2 tools/train_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
    OUTPUT_DIR work/output/coco_train_with_1gpu
System information
Operating system: redhat7.3
Compiler version: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
CUDA version: 8.0, cuda_8.0.61_375.26_linux-run
cuDNN version: cudnn-8.0-linux-x64-v7
NVIDIA driver version: 375.26
GPU models (for all devices if they are not all the same): both devices are TITAN X (Pascal). nvidia-smi output:
Fri Feb 9 11:11:55 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 0000:1B:00.0     Off |                  N/A |
| 23%   22C    P8     9W / 250W |      2MiB / 12189MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 0000:86:00.0     Off |                  N/A |
| 32%   55C    P2   105W / 250W |    347MiB / 12189MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
PYTHONPATH environment variable: /opt/project/detectron/lib:/opt/caffe2/build
python --version output: Python 2.7.5
I see there have been 2 or 3 posts about this kind of problem in recent days.
I cleaned the Caffe2 build and rebuilt it, and then it worked.
Closing since the issue seems resolved. Please reopen if that is not the case.
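The fix reported above was a clean rebuild of Caffe2. One common cause of `invalid device function` (not confirmed in this thread) is a CUDA build that contains no kernel code the installed GPU can run; both GPUs here are TITAN X (Pascal), compute capability 6.1. The Python sketch below is a rough model of the compatibility rules described in the CUDA C Programming Guide: machine code (sm_XY) runs only on the same major version with an equal or higher minor version, while PTX (compute_XY) can be JIT-compiled for any equal or newer GPU. The function name and the architecture lists are hypothetical, and the model ignores special cases such as Tegra. The architectures actually embedded in a built library can be listed with `cuobjdump --list-elf` and `cuobjdump --list-ptx`.

```python
def kernel_runs_on(device_cc, sass_archs, ptx_archs):
    """Rough model of whether a CUDA fat binary holds code a GPU can run.

    device_cc  -- (major, minor) compute capability of the GPU
    sass_archs -- (major, minor) targets embedded as machine code (sm_XY)
    ptx_archs  -- (major, minor) targets embedded as PTX (compute_XY)
    """
    major, minor = device_cc
    # Machine code built for X.y runs only on devices of capability X.z, z >= y.
    for sass_major, sass_minor in sass_archs:
        if sass_major == major and sass_minor <= minor:
            return True
    # PTX built for X.y can be JIT-compiled by the driver for any device >= X.y.
    for ptx_cc in ptx_archs:
        if tuple(ptx_cc) <= tuple(device_cc):
            return True
    # Nothing usable: kernel launches fail (CUDA 8 reported this as
    # "invalid device function").
    return False


if __name__ == "__main__":
    titan_x_pascal = (6, 1)
    # Hypothetical build with Kepler/Maxwell machine code only and no PTX.
    print(kernel_runs_on(titan_x_pascal, [(3, 5), (5, 2)], []))  # False
    # Hypothetical build that also embeds sm_60 machine code.
    print(kernel_runs_on(titan_x_pascal, [(3, 5), (5, 2), (6, 0)], []))  # True
```

Under this model, a stale or misconfigured build with no Pascal-compatible machine code and no PTX would fail in exactly the way shown in the log, which a clean rebuild targeting the installed GPU would avoid.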