Open alfredox10 opened 7 years ago
What happens when you try to use only GPU 1? GPU selection is per-thread; do you have other threads running in your app?
It's important to call set_device() before set_mode_gpu(), like so:
caffe.set_device(gpu_slot)
caffe.set_mode_gpu()
rcnn_net = caffe.Net(prototxt, caffemodel, caffe.TEST)
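For the multi-process setup described in this issue, the ordering above could be applied inside each child process roughly as follows. This is only a sketch under assumptions: `split_round_robin`, `detect_on_gpu`, and `run_all` are hypothetical helper names, the per-image loop is left as a placeholder, and nothing in this thread confirms that this alone resolves the error.

```python
import multiprocessing as mp


def split_round_robin(items, num_workers):
    """Deal items out to workers in turn, producing one chunk per GPU."""
    return [items[i::num_workers] for i in range(num_workers)]


def detect_on_gpu(gpu_slot, image_paths, prototxt, caffemodel):
    """Hypothetical worker: runs in its own process, bound to one GPU."""
    # Import inside the child so each process initializes Caffe on its own.
    import caffe

    # Select the device first, then switch to GPU mode (the order given above).
    caffe.set_device(gpu_slot)
    caffe.set_mode_gpu()
    rcnn_net = caffe.Net(prototxt, caffemodel, caffe.TEST)

    for path in image_paths:
        # Placeholder: load and preprocess `path`, then call rcnn_net.forward().
        pass


def run_all(image_paths, prototxt, caffemodel, num_gpus=8):
    """Start one child process per GPU and wait for all of them to finish."""
    chunks = split_round_robin(image_paths, num_gpus)
    procs = [
        mp.Process(target=detect_on_gpu, args=(slot, chunk, prototxt, caffemodel))
        for slot, chunk in enumerate(chunks)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Importing caffe inside the worker rather than at module level is a design choice here: it keeps the parent process from touching any GPU before the children pick their own device.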
Why does the order of the two setting calls (set_device() before set_mode_gpu()) matter? @shelhamer @cypof
Full error:
F0407 22:35:23.664752 27364 syncedmem.hpp:22] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
Issue summary
This happens when I run image detection with trained R-CNN models from a Python script that splits a stream of images across multiple Python subprocesses and loads a model for each GPU in its own child process. I always see memory usage go up on GPU 0, but not on the other GPUs available in the system. I am trying to implement parallel GPU detection by splitting the task across the 8 GPUs on a p2.8xlarge AWS EC2 instance.
Has anyone seen this? I know Caffe isn't optimized for multi-GPU training, but I did not expect any issues when splitting the work into independent processes and running detection on each GPU.
I am using these commands in Python to set the GPU in each subprocess:
caffe.set_mode_gpu()
caffe.set_device(gpu_slot)
rcnn_net = caffe.Net(prototxt, caffemodel, caffe.TEST)
Is there something else I should set? Is the Caffe library hard-coded to use only shared memory on GPU 0 for all GPUs? Any information would be helpful.
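One setting sometimes used for this kind of per-process isolation is the CUDA environment variable `CUDA_VISIBLE_DEVICES`, which limits the GPUs a process can see. The sketch below assumes each detection worker is a separate script started by a parent process. `child_env_for_gpu`, `launch_worker`, and `script_path` are hypothetical names, and this thread does not confirm that the approach fixes the error reported here.

```python
import os
import subprocess
import sys


def child_env_for_gpu(gpu_slot):
    """Return a copy of the current environment exposing only one GPU."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_slot)
    return env


def launch_worker(gpu_slot, script_path):
    """Start a hypothetical detection script that can see only `gpu_slot`."""
    # Inside the child, the single visible GPU is enumerated as device 0,
    # so the script would call caffe.set_device(0) regardless of gpu_slot.
    return subprocess.Popen(
        [sys.executable, script_path],
        env=child_env_for_gpu(gpu_slot),
    )
```

With this arrangement a child process has no way to allocate memory on a GPU other than the one it was given, which makes it easier to tell whether the GPU 0 usage comes from device selection or from something else.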
Steps to reproduce
Run Caffe in multiple terminal windows (easier than writing a multiprocess Python application), each assigned to a different GPU, and then attempt to perform detections in parallel as is normally done with the Caffe API, using these commands:
net.set_input_arrays(data4D.astype(np.float32), data4DLabels.astype(np.float32))
prediction = net.forward()
System configuration
Operating system: Ubuntu 14.04 (headless)
Compiler: gcc 4.7
CUDA version (if applicable): 7.5
CUDNN version (if applicable): 4
Python or MATLAB version (for pycaffe and matcaffe respectively): 2.7