I've been trying to follow the "Using DIGITS to train an Object Detection network" tutorial. When I click the Create button to train DetectNet in DIGITS with multiple GPUs (2, 3, or 4), the job status stays at "waiting" forever. When I select only a single GPU, the job status is "running", but the estimated time is about 20 hours. The dataset is the KITTI object dataset, as mentioned in the tutorial. I installed nv-deep-learning-repo-ubuntu1404-ga-cuda8.0-cudnn5.1.10_1-1_amd64.deb. The DIGITS version is 4.0.0 and the Caffe version is 0.5.13. Any comments would be appreciated.
You should probably check in your system BIOS whether IOMMU is enabled or disabled.
I faced a similar issue, and my multi-GPU training started working once I disabled IOMMU.
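Before rebooting into the BIOS, it may be possible to get a hint from the running system. Below is a minimal sketch, assuming a Linux host: it looks for IOMMU-related options on the kernel command line (`/proc/cmdline`) and counts the entries under `/sys/kernel/iommu_groups`, which is normally empty or absent when the IOMMU is not active. The function names are made up for illustration, and the result is only an indication; the BIOS setting itself remains the authoritative place to check.

```python
import os


def iommu_cmdline_flags(cmdline):
    """Return kernel command-line tokens that mention an IOMMU setting."""
    return [tok for tok in cmdline.split() if "iommu" in tok.lower()]


def count_iommu_groups(path="/sys/kernel/iommu_groups"):
    """Count IOMMU groups exposed by the kernel.

    Returns 0 if the directory is missing or unreadable, which usually
    suggests the IOMMU is not active (an assumption, not a guarantee).
    """
    try:
        return len(os.listdir(path))
    except OSError:
        return 0


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""  # not a Linux host, or /proc is unavailable

    flags = iommu_cmdline_flags(cmdline)
    print("IOMMU-related boot flags:", flags if flags else "none")
    print("IOMMU groups found:", count_iommu_groups())
```

If the script reports a non-zero number of groups, the IOMMU is most likely enabled, and disabling it in the BIOS as described above would be the next thing to try.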