Simon-Harris-IBM / ChallengeWorkflowTemplatesSimon

Assign submissions to specific GPUs #18

Open Simon-Harris-IBM opened 4 years ago

Simon-Harris-IBM commented 4 years ago

Incoming submissions need to be assigned to specific GPUs, which cannot be shared. Setting "gpus=all" on multiple submissions running at the same time causes at least one of them to hang.

thomasyu888 commented 4 years ago

Are you running multiple submissions on one instance? This will become a tiny bit tricky as we don't have a good mechanism of tracking the GPUs in use.

I do have someone working on a prototype of sorts to create a lock file per GPU in use, but it's not an easy solution as it requires asynchronous updates.
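For illustration only, here is a rough sketch of what a per-GPU lock file could look like. It is not the prototype mentioned above. It assumes a POSIX host (Python's `fcntl.flock`), and the names `acquire_gpu`, `release_gpu`, `docker_gpu_flag` and the lock directory are hypothetical. The `--gpus=device=N` string follows Docker's `--gpus` option syntax. How it would be passed through the workflow is not shown here.

```python
import fcntl
import os
import tempfile


def acquire_gpu(num_gpus, lock_dir):
    """Try to claim one free GPU. Returns (index, lock_file) or None.

    Hypothetical helper: one lock file per GPU index, locked with a
    non-blocking exclusive flock so a busy GPU is skipped, not waited on.
    """
    os.makedirs(lock_dir, exist_ok=True)
    for index in range(num_gpus):
        lock_file = open(os.path.join(lock_dir, f"gpu{index}.lock"), "w")
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Another submission already holds this GPU; try the next one.
            lock_file.close()
            continue
        return index, lock_file
    return None


def release_gpu(lock_file):
    """Release the GPU claimed by acquire_gpu."""
    fcntl.flock(lock_file, fcntl.LOCK_UN)
    lock_file.close()


def docker_gpu_flag(index):
    """Build a Docker option that exposes a single GPU instead of all."""
    return f"--gpus=device={index}"


if __name__ == "__main__":
    # Demo with a throwaway lock directory and an assumed 2-GPU machine.
    demo_dir = tempfile.mkdtemp()
    claimed = acquire_gpu(2, demo_dir)
    if claimed is not None:
        gpu_index, handle = claimed
        print(docker_gpu_flag(gpu_index))
        release_gpu(handle)
```

Because the kernel drops an `flock` when the holding process exits, a crashed submission would not leave a GPU permanently marked as busy. That only helps if the process holding the lock lives as long as the submission does, which is an assumption about how the workflow is launched.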

Simon-Harris-IBM commented 4 years ago

We had intended to use 16-core, 2x GPU machines, running 2 concurrent submissions per machine. But if this is going to be tricky, I think we should use 8-core, 1x GPU machines and run just 1 submission per machine.