swapnilsayansaha closed this issue 3 years ago
Can you please mention the Jupyter notebook you used for your Caffe experiment? Did you code any line for defining which GPUs to use?
I abandoned Caffe for TF/PyTorch. Caffe has too many bugs and poor documentation.
Issue summary
I installed Caffe on a GPU machine with CUDA 10.0 properly configured, Python 3.7 (Anaconda3), and cuDNN 7.6.5. All of Caffe's dependencies installed successfully, and I tested the build with the runtest and pytest commands. Both ran without errors, and Caffe appeared to use the GPU a little during the tests, which is fine. I then tried to run the Pascal VOC Jupyter notebook example that ships with Caffe (I had to edit a couple of lines, such as print and image_resize, for Python 3 compatibility). The notebook runs through all of its sections, and nvidia-smi shows the process occupying about 800 MB of GPU memory. During the training iterations, however, Caffe does not seem to use the GPU: CPU usage goes to 100% on 16 of the 32 cores, while GPU usage stays at 2-3%. This makes training extremely slow.
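To put numbers on the symptom described above, GPU utilization and memory can be sampled from a script while the notebook iterates. The following is only a minimal sketch: it relies on the `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` query mode, the helper names `parse_gpu_stats` and `read_gpu_stats` are hypothetical, and the sample string stands in for real output and was not taken from this machine.

```python
import subprocess

# Query one CSV line per GPU: utilization in percent, used memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn lines such as '3, 812' into (utilization_percent, memory_used_mib) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and parse its output (requires the NVIDIA driver tools)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample resembling the reported state: low utilization, ~800 MB used.
    sample = "3, 812\n"
    for util, mem in parse_gpu_stats(sample):
        print(f"GPU utilization: {util}%  memory used: {mem} MiB")
```

Calling `read_gpu_stats()` in a loop during training would show whether utilization ever rises above a few percent. The parser assumes numeric fields; some devices report values such as `[N/A]`, which this sketch does not handle.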
Steps to reproduce
Install Caffe on a GPU machine using the guides listed in the Tried solutions section, then run the Pascal VOC example.
Tried solutions
I thought it might be a build problem, so I rebuilt Caffe in both Python 2 and Python 3 environments, using the following guides as reference. The installation process is difficult and I ran into a long list of problems, but in the end I got Caffe installed properly every time. The issue, however, remained the same: Caffe occupies GPU memory but does not use the GPU for computation.
I also checked that CUDA was properly configured. TensorFlow could readily use the GPU without any issues.
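One more thing that may be worth ruling out, although nothing in this report confirms it is the cause: as far as I know, pycaffe runs in CPU mode unless GPU mode is requested explicitly, so the notebook needs to call `caffe.set_device` and `caffe.set_mode_gpu` before it creates a solver or net. The sketch below wraps those two calls; the wrapper name `enable_gpu` and the device index 0 are assumptions for illustration, not something taken from the notebook.

```python
def enable_gpu(caffe_module, device_id=0):
    """Select a GPU and switch pycaffe to GPU mode.

    `caffe_module` is the imported `caffe` package; it is passed in so the
    helper can be exercised without Caffe installed. `device_id` is assumed
    to be the index nvidia-smi reports for the target GPU.
    """
    caffe_module.set_device(device_id)  # choose which GPU to use
    caffe_module.set_mode_gpu()         # run subsequent solver/net work on the GPU


if __name__ == "__main__":
    try:
        import caffe
    except ImportError:
        print("pycaffe is not importable in this environment; nothing to configure")
    else:
        enable_gpu(caffe, device_id=0)
        print("Requested GPU mode on device 0")
```

If the notebook already makes these calls, the mode setting is probably not the issue and the bottleneck would have to be somewhere else.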
System configuration
Issue checklist