cBio / cbio-cluster

MSKCC cBio cluster documentation

Install spare GTX-680s in new sbio nodes? #408

Closed jchodera closed 8 years ago

jchodera commented 8 years ago

I understand the new sbio nodes are GPU-ready but have no GPUs yet, and there is no timeline for procuring additional GPUs.

Since we have 12-16 GTX-680s removed from the recent project that upgraded some nodes to more recent GPUs (e.g. #307), why don't we put those in the GPU-ready nodes for now? That would increase our GPU capacity by ~10% for free.

(This was already mentioned to @juanperin directly, but logging it here for input from others.)

lzamparo commented 8 years ago

What about the GTX-Titan we got as part of the Nvidia centre for gpu research grant? Has that been delivered & incorporated into the GPU queue?

juanperin commented 8 years ago

The card we got was a k40 tesla. It's been installed but not added in yet. We will make that available next week.

Juan

On May 5, 2016, at 5:30 PM, Lee Zamparo notifications@github.com wrote:

What about the GTX-Titan we got as part of the Nvidia centre for gpu research grant? Has that been delivered & incorporated into the GPU queue?


jchodera commented 8 years ago

> The card we got was a k40 tesla. It's been installed but not added in yet. We will make that available next week.

That is a passively cooled card and may not work in the Exxact boxes. Definitely check with Exxact first; you could potentially void some sort of service contract by putting it in a box.

If they give you the thumbs-up, we actually have a K10 and K20 sitting around that aren't being used that we could give you too.

jchodera commented 8 years ago

Correction: We have a K20 and K20X you can have.

J

tatarsky commented 8 years ago

To be clear, the card is a K40c, which, if I recall correctly, has a fan.

We did ask Exxact about it.

But I will make sure they fully understood the question, since it was combined with our check that the power leads were what we expected. They didn't seem to have an issue with it, but again, I will email Glenmar.

It's alone in a test node, so if we need to perform some checks we can do so.

nvidia-smi at least shows the fan spinning at 23%, which I assume is a percentage of some maximum RPM.

+------------------------------------------------------+                       
| NVIDIA-SMI 352.39     Driver Version: 352.39         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:81:00.0     Off |                    0 |
| 23%   40C    P0    66W / 235W |     22MiB / 11519MiB |     72%      Default |
+-------------------------------+----------------------+----------------------+

This link seems to support that being an actual thing:

https://devtalk.nvidia.com/default/topic/789809/difference-between-tesla-k40c-k40m-k40s/

"K40c is a very similar product but includes an active fansink. The card is basically responsible for cooling itself, and so can be plugged into a wider variety of platforms, e.g. workstations."

We can also have one of you "heat it on up" and we can monitor the IPMI thermal sensors of the G3.
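
For anyone who wants to watch the card during such a heat test, the fan and temperature readings in the nvidia-smi table above can also be polled in a script-friendly form. The following is only a minimal sketch: it assumes the installed driver's nvidia-smi supports the `--query-gpu` option with the field names listed, and the helper names (`parse_gpu_csv`, `query_gpus`) and the sample line (including the `66.00` power format) are illustrative, not taken from the cluster.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; they mirror columns of the table above.
# Assumption: this driver's nvidia-smi accepts these --query-gpu field names.
QUERY_FIELDS = ["name", "fan.speed", "temperature.gpu", "power.draw", "utilization.gpu"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [value.strip() for value in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and parse its output (needs the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.check_output(cmd, universal_newlines=True)
    return parse_gpu_csv(output)


# Hypothetical sample line built from the values in the table above,
# so the parser can be tried on a machine without a GPU.
sample = "Tesla K40c, 23, 40, 66.00, 72\n"
reading = parse_gpu_csv(sample)[0]
print(reading["name"], reading["fan.speed"] + "%", reading["temperature.gpu"] + "C")
```

On a GPU node, calling `query_gpus()` in a loop with a short sleep would give a simple log to compare against the IPMI thermal sensors.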

jchodera commented 8 years ago

Thanks! I hadn't realized there was a model with a fan!

tatarsky commented 8 years ago

I had not seen such an animal myself either. But I guess Nvidia figures most of these go into workstations and doesn't want to be responsible for burning them up ;)

We will double check regardless. Better safe than sorry.

tatarsky commented 8 years ago

(That is, the cards from this giveaway program go into workstations, not the whole Tesla line.)

jchodera commented 8 years ago

Any updates here?

jchodera commented 8 years ago

Though, now that the GTX-1080s have dropped and are insanely cheap ($599) for 9 TFLOP/s, it may make much more sense to skip this altogether and fill those sbio nodes with GTX-1080s. If there are 40 nodes and they can take 4 GPUs each, that's less than $100K to add about 1.4 PFLOP/s of theoretical peak computing power.
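
A quick check of that arithmetic, as a sketch only: the node count and GPUs per node are the "if" assumptions from the comment above rather than confirmed hardware facts, and the price and per-card peak are the figures quoted there.

```python
# Figures quoted in the comment above. The node count and GPUs per node
# are that comment's own "if" assumptions, not confirmed hardware facts.
nodes = 40
gpus_per_node = 4
price_per_gpu_usd = 599
peak_tflops_per_gpu = 9

total_gpus = nodes * gpus_per_node
total_cost_usd = total_gpus * price_per_gpu_usd
total_peak_pflops = total_gpus * peak_tflops_per_gpu / 1000.0

print(total_gpus)         # 160
print(total_cost_usd)     # 95840
print(total_peak_pflops)  # 1.44
```

The cost covers the cards only; power, cooling, and installation are not included.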

tatarsky commented 8 years ago

I am waiting for final decisions on how to allow sbio system use alongside regular batch use.

tatarsky commented 8 years ago

I have no particular updates on this. I'm going to suggest this be opened as an email ticket and assigned for discussion at a level higher than myself.

jchodera commented 8 years ago

Done.