cBio / cbio-cluster

MSKCC cBio cluster documentation

GPU queue attempt to add four extra thread-slots per node #399

Closed tatarsky closed 8 years ago

tatarsky commented 8 years ago

Per a discussion in another topic, there is a desire to implement the following item from the original config spec.

I seem to recall that we allowed four overcommitted thread-slots per node for the gpu queue. I am not entirely sure if this was ever correctly implemented by SDSC, but it was definitely in our spec sheet for the configuration.

I do not see signs of said configuration, but I am determining the best way to add it.
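To check what a node currently advertises, something like this would do (gpu-3-9 as an example node):

# What Torque thinks the node has: np is the thread-slot count, gpus the GPU count,
# and "jobs" lists which slots are currently occupied.
pbsnodes gpu-3-9

# The same node attributes via qmgr.
qmgr -c 'list node gpu-3-9'

# Moab's view of the node, including configured vs. dedicated processors.
checknode gpu-3-9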

jchodera commented 8 years ago

Cool!

Our motivation was that the GPU codes typically spin-lock a hyperthread but do not appear to consume resources that could be used by other CPU jobs. We extensively benchmarked this some time ago, and found that the impact was minimal.
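(Not the original benchmark, but as a quick illustration one could spot-check a node that is running a GPU job; gpu-2-17 is just an example node and this assumes sysstat's mpstat is installed.)

# The CUDA host process typically sits near 100% CPU on one hyperthread
# (the spin-lock) while nvidia-smi shows the GPU doing the real work.
ssh gpu-2-17 'nvidia-smi; ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head'

# Per-core view: one logical core pegged, the rest effectively idle.
ssh gpu-2-17 'mpstat -P ALL 1 3'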

jchodera commented 8 years ago

It looks like there would be a lot of advantage in this. Lots of GPUs are not being utilized, but a number of jobs are currently waiting in the gpu queue simply because there are no free thread-slots on the nodes to let them be scheduled.

tatarsky commented 8 years ago

I have not come up with a clean way to do this yet using the existing config. I will attempt to do so next week.

tatarsky commented 8 years ago

I'm trying a few things this morning to see whether this is possible without more elaborate changes, which is my current belief.

tatarsky commented 8 years ago

I have attempted to provide, on just gpu-3-9 for this morning, four extra slots for JUST gpu queue jobs, but I'm trying to determine whether what I've done actually works; I believe I need to wait for a user's 4-gpu job to end to find out. I would do more, but I'm really not comfortable adjusting these settings on the live cluster. We really should have a place to test changes.
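Roughly, and as a sketch rather than the exact change, the slot-count half of this would be bumping the node's np via qmgr (the queue-level limit that keeps the extra four slots for gpu jobs only is not shown here):

# Raise the node's thread-slot count from 32 to 36 (four extra slots).
qmgr -c 'set node gpu-3-9 np = 36'

# Confirm the new count is advertised.
pbsnodes gpu-3-9 | grep 'np ='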

jchodera commented 8 years ago

> We really should have a place to test changes.

Maybe worth raising an issue on the internal issue tracker if @juanperin is still using that?

tatarsky commented 8 years ago

It may be working as I expect on gpu-3-9. I'm watching a combination of a 12-slot gpu job with four gpus on gpu-3-9 and some batch runs totaling 32 slots on that node. However, I need to review a couple of settings to make sure this is doing what I believe (before I add the oversubscription to other nodes).

tatarsky commented 8 years ago

I am watching for a node that has 32 batch slots in use but no gpu slots in use. Some gpu queue jobs are 32 hours into their 48-hour walltime; when they exit, I hope to oversubscribe a node that meets that criterion and see whether queued gpu jobs go to it. Hopefully that makes some sense.
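Something along these lines (just a sketch; the node names are examples and the qstat column parsing is approximate) would surface such a node:

# For each GPU node, count occupied thread-slots (pbsnodes) and running
# gpu-queue jobs on it (qstat). A node with 32 slots busy and zero
# gpu-queue jobs is a candidate for the oversubscription test.
for n in gpu-2-17 gpu-3-9; do   # example node names
  used=$(pbsnodes "$n" | sed -n 's/.*jobs = //p' | tr ',' '\n' | grep -c .)
  gpuq=$(qstat -rn1 | awk -v node="$n" '$3 == "gpu" && $0 ~ node' | wc -l)
  echo "$n: $used slots in use, $gpuq gpu-queue jobs"
done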

jchodera commented 8 years ago

Thanks so much!

tatarsky commented 8 years ago

The condition I was looking for (all 32 slots in batch, no gpu) just happened on gpu-2-17. I bumped the slot count, but the queued job I wanted to see go there did not, for reasons still unclear to me:

7132046                5001481      1.3 pr   (removed) (removed)      4    20:10:00       gpu   Tue May  3 04:16:08

However, my single-gpu job did start while 32 of the batch slots were in use:

qsub -I -l gpus=1:nodes=1:ppn=4:mem=1gb -q gpu
qsub: waiting for job 7132047.hal-sched1.local to start
qsub: job 7132047.hal-sched1.local ready

[gpu-2-17 ~]$ 

So did a four-gpu job. And none of the queued batch jobs went there, implying to me that batch is still correctly limited to 32 slots per node.

checkjob and the logs seem to imply some additional policy was preventing 7132046 from being scheduled. So I may have this right but I'm still looking at it.
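For reference, the sort of Moab diagnostics involved here (7132046 is the job above; flags as I understand the Moab CLI):

# Full scheduler detail on the job, including any policy, limit, or
# reservation blocking it from starting.
checkjob -v 7132046

# Blocked/deferred jobs and the reason Moab reports for each.
mdiag -b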

I will likely add a few more nodes in this "oversubscribed for gpu only" mode and continue to watch.
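The rollout itself would just repeat the per-node bump and verify each node (again a sketch; the node names are examples and 36 is the slot count assumed above):

# Bump and verify a short list of nodes, one at a time.
for n in gpu-1-4 gpu-1-5; do    # example node names
  qmgr -c "set node $n np = 36"
  pbsnodes "$n" | grep 'np ='
done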

jchodera commented 8 years ago

Woohoo! Sounds like great progress!

tatarsky commented 8 years ago

Your enthusiasm is appreciated. Basically I'm waiting on nodes that each have a 48-hour, 4-gpu job from a user in your group. Those jobs are close to that walltime, so I hope to get full confirmation that I have this correct shortly.

jchodera commented 8 years ago

Sorry about that. Mehtap has a rapidly approaching deadline and managed to claim over 100 GPUs before anyone noticed!

tatarsky commented 8 years ago

Ok, a batch of his jobs has started to exit, and while there isn't much batch queue load at the moment, I am seeing what I believe is correct behavior. I am continuing to add nodes to the oversubscription config slowly, being careful and observing. Another member of your group just got a 24-gpu job scheduled as his 12-gpu jobs wrap up. I will continue down this path until all nodes are configured for it and announce when done.

jchodera commented 8 years ago

Woot!

tatarsky commented 8 years ago

All GPU-containing nodes in batch are now in this configuration.
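A quick way to confirm the rollout, assuming the GPU nodes are all named gpu-* and the oversubscribed slot count is 36:

# Print each GPU node's advertised thread-slot count; everything should
# now read 36 rather than 32.
pbsnodes -a | awk '/^gpu-/ {node=$1} / np = / {print node, $NF}'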