Open msdisme opened 3 weeks ago
The MI100s support both PCIe 3.0 and 4.0. Question for AMD (will @hpdempsey relay this?): "Will we lose performance if we run this card off a PCIe 3.0 bus?"
We don't have anything that can run this on a 4.0 bus. If it's okay to stick to a 3.0 bus my recommendation is to:
If we'd prefer to run these at 4.0 speeds, we'd need to buy something new. I recommend something along the lines of a Dell R760 (the last quote I got for one of these was $12k each), which can run 3 GPUs simultaneously on a 4.0 bus. Or we can check with FLAX to see what whitebox solutions they have, which are probably a lot cheaper.
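For context on the 3.0 vs. 4.0 question, here is a rough sketch of the theoretical link ceilings. It assumes an x16 link (the thread does not state the slot width) and uses the standard per-lane rates of 8 GT/s for PCIe 3.0 and 16 GT/s for PCIe 4.0 with 128b/130b encoding. These are upper bounds on the bus only and do not answer whether a given workload would actually slow down; that is still a question for AMD.

```python
# Theoretical per-direction PCIe bandwidth.
# Assumption: x16 link width (not stated in the thread).
def pcie_bandwidth_gb_s(transfer_rate_gt_s: float, lanes: int = 16) -> float:
    encoding_efficiency = 128 / 130  # 128b/130b encoding, used by PCIe 3.0 and 4.0
    # GT/s per lane * lanes * efficiency gives Gbit/s; divide by 8 for GB/s.
    return transfer_rate_gt_s * lanes * encoding_efficiency / 8

gen3 = pcie_bandwidth_gb_s(8.0)   # PCIe 3.0: 8 GT/s per lane
gen4 = pcie_bandwidth_gb_s(16.0)  # PCIe 4.0: 16 GT/s per lane

print(f"PCIe 3.0 x16: {gen3:.2f} GB/s per direction")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s per direction")
```

Workloads that keep data resident on the card are generally less sensitive to this ceiling than ones that stream data from the host continuously.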
For our purposes the 3.0 is fine. Can they handle 2 per system (300 watts peak for each card)?
@hpdempsey any reason not to double them up?
@hakasapl I am not 100% sure how many cards we are getting yet, but I think it is 4 or 8.
@msdisme I think there are 19 R740xds in that rack (each with 1 V100). I don't see any issues with doubling up on the GPUs. There are several air-cooled racks with even higher GPU density in the facility already.
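A back-of-the-envelope sketch of the numbers discussed above, for either 4 or 8 cards. It assumes "doubling up" means 2 MI100s per host, takes the 300 W peak figure and the count of 19 R740xds from the thread, and ignores the existing V100s and each host's own power budget, which would need to be checked separately. The `plan` helper is hypothetical and only for illustration.

```python
import math

CARD_PEAK_WATTS = 300  # per-card peak, as quoted in the thread
CARDS_PER_HOST = 2     # assumption: two MI100s per host
AVAILABLE_HOSTS = 19   # R740xds reported in the rack

def plan(total_cards: int) -> dict:
    """Hypothetical helper: hosts needed and added peak GPU power."""
    hosts = math.ceil(total_cards / CARDS_PER_HOST)
    return {
        "hosts_needed": hosts,
        "fits_in_rack": hosts <= AVAILABLE_HOSTS,
        "added_peak_watts_per_full_host": CARDS_PER_HOST * CARD_PEAK_WATTS,
        "added_peak_watts_total": total_cards * CARD_PEAK_WATTS,
    }

for n in (4, 8):
    print(n, plan(n))
```

Under these assumptions, 4 cards would occupy 2 hosts and 8 cards would occupy 4, with up to 600 W of added peak GPU draw per fully populated host.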
Are the V100s in OpenStack currently being used by anyone? (Asking because of the impact downtime would have on charging.)
AMD is planning to provide MI100s initially. https://www.amd.com/en/products/accelerators/instinct/mi100.html