BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

4 GPUs on Dell Server #4005

Closed smithdir101 closed 8 years ago

smithdir101 commented 8 years ago

Hi All,

We are opening this issue to share our experience using 4 GPUs on a Dell server.

Jhon

smithdir101 commented 8 years ago

The Dell server model we use is a PowerEdge T630: 8-drive bay, GPU support module, 2x 1600 W power supplies, 2x Intel Xeon 3 GHz, 128 GB RAM, Windows Server 2012 R2. Initially we purchased 2x Nvidia K20 but discovered very quickly that the cards' performance was not enough, so we moved to 4x Nvidia Titan X (not officially supported, but it works smoothly). Unfortunately, we were not able to use the 4th card for some reason. It is detected and activated by the operating system, but when you ask caffe to use it, caffe crashes. We started to investigate, and here are the results.

  1. The 4th card is not visible to the CUDA examples, so CUDA does not see this card. As a result, caffe does not know about the card and will crash if you ask it to use it.
  2. GPU-Z or HWiNFO show something wrong with the PCI bus speed of this card, and the CUDA feature is not available for it.
  3. It is not about a particular PCI slot. Take the card out of the first PCI slot and (magic!) the cards in slots 2, 3, 4 will start to work. (The actual numbers of the PCI slots on the motherboard are different; we do not want to bother you with the details.)
  4. The Dell server owner's manual has a section explaining the constraints that apply if one wants to use 4 GPUs in this server: no additional PCI cards are allowed when 4 GPUs are connected. That means no InfiniBand, no hardware RAID (yes, only onboard SATA is allowed), no USB3 card, nothing. This constraint is mentioned neither on the server's page on the Dell site (general info, tech specs) nor in the server's assembly wizard (yes, you can assemble a non-working configuration). The information was confirmed by the Dell official support team. We asked them to add this info to the site because it is a critical piece of information one should have before purchasing.

The bottom line: on the PowerEdge T630, only 3 GPUs are available in normal working configurations. If your server runs on software RAID and does not need any additional PCI cards, you can use 4 GPUs.
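
For anyone who wants to catch the situation from item 1 before launching caffe, here is a minimal sketch, not something we ran on the T630 and not part of caffe. It asks the CUDA driver how many devices it reports (via the driver API calls `cuInit` and `cuDeviceGetCount`, loaded with `ctypes`) and lists the requested GPU ids that CUDA would not see. The helper names `cuda_device_count` and `unusable_gpu_ids` are hypothetical, and the library file names are assumptions about a typical driver install.

```python
import ctypes
import sys


def cuda_device_count():
    """Return the device count reported by the CUDA driver.

    Returns None if the driver library cannot be loaded at all,
    and 0 if the driver loads but initialization or the query fails.
    """
    # Assumed library names for a typical driver install.
    if sys.platform == "win32":
        names = ["nvcuda.dll"]
    else:
        names = ["libcuda.so.1", "libcuda.so"]

    for name in names:
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue
        count = ctypes.c_int(0)
        # Both calls return 0 (CUDA_SUCCESS) on success.
        if lib.cuInit(0) != 0:
            return 0
        if lib.cuDeviceGetCount(ctypes.byref(count)) != 0:
            return 0
        return count.value
    return None


def unusable_gpu_ids(requested, visible_count):
    """Return the requested GPU ids that fall outside 0..visible_count-1."""
    return [gpu for gpu in requested if gpu < 0 or gpu >= visible_count]


if __name__ == "__main__":
    visible = cuda_device_count() or 0
    bad = unusable_gpu_ids([0, 1, 2, 3], visible)
    print("CUDA reports", visible, "device(s); ids not visible to CUDA:", bad)
```

On a machine in the state described above (4 cards installed, 3 seen by CUDA), we would expect the check to flag id 3, which is the id to leave out when starting caffe.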

seanbell commented 8 years ago

Hi @smithdir101 I appreciate the writeup and it will be useful to those building servers, but this post is unrelated to caffe and thus I am closing it. Please continue the discussion on the mailing list.

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

Please do not post usage, installation, or modeling questions, or other requests for help to Issues. Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.