dsschult opened 7 years ago
I believe there are many possibilities for how to do this matching. One option I could see: start by assuming that our higher-value resources are GPUs; then, for the example above, the pilot would know that the slot it is managing has 8 GPUs with an average of 5 GB of memory per GPU.
In the matching algorithm, I think the pilot should be reluctant to carve slots with much more memory than the average. The first carved slot could be allowed to go above average, but not the rest. Pretty much what you have displayed in your example above.
The tricky part I see here is that we probably want some capability for "configuring" this partitioning policy... if we want to ensure, for instance, that high-memory jobs do get executed because we decide they are important... or the opposite...
With this "partitionable iceprod pilot" route you have chosen to take on the matchmaking job of the negotiator... tough job! :-)
Yeah. I didn't really want to do the matchmaking job in iceprod. But I need to get around the limitations of condor, which led to partitionable pilots.
I'm currently thinking of implementing a variation on fair queueing, like the Linux scheduler uses. This is probably the easiest general way to solve the problem.
Whiteboard solution:
multi-level queues:
scheduling priority within each queue (lowest priority wins):
priority = max(0, req_cpu + req_gpu + req_memory + req_disk + req_time - wait_sec//600)
where wait_sec is the time in seconds since a task was queued
So, when a new resource is available, choose tasks from queues like:
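The whiteboard priority above can be sketched in a few lines of Python. This is a minimal illustration, assuming the resource requests are already normalized so they can be summed directly; the function name and sample numbers are hypothetical, not from the actual iceprod code.

```python
# Sketch of the whiteboard scheduling priority (lowest priority wins).
# Assumes req_* values are pre-normalized so a plain sum is meaningful.

def priority(req_cpu, req_gpu, req_memory, req_disk, req_time, wait_sec):
    """Lower wins; every 600 s of waiting shaves one point off, floored at 0."""
    return max(0, req_cpu + req_gpu + req_memory + req_disk + req_time
                  - wait_sec // 600)

# A small task queued 20 minutes ago beats a large, freshly queued one:
small = priority(1, 0, 2, 1, 1, wait_sec=1200)  # max(0, 5 - 2) = 3
large = priority(4, 1, 8, 4, 4, wait_sec=0)     # 21
```

The `wait_sec // 600` term is what makes this a fair-queueing variant: a task cannot starve forever, because its priority value drifts toward 0 (the best possible) the longer it waits.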
This seems to perform "better", though the problem is still not completely solved. Memory is still a limiting factor, especially memory fragmentation.
We're leaving a lot of resources idle right now because of packing inefficiency. In the GPU case, where it matters most, we have too many high-memory jobs.
Example:
We just wasted 6 GPUs.
A better packing might be:
This packing uses all GPUs and all the memory, so it is much more efficient. The 25GB task doesn't run then, though, so there are still problems with it.
One thing that may help significantly is if we can allocate multiple tasks at once, instead of one by one: #237. This would allow proper packing of a single pilot, though packing over multiple pilots and over time would still need work.
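To illustrate the idea behind allocating multiple tasks at once, here is a hypothetical greedy sketch that packs a batch of queued tasks into one pilot. The function name and task list are made up for illustration; the pilot size (8 GPUs, 40 GB) follows the averages in this discussion. Packing smallest-memory-first fills every GPU, but reproduces the trade-off noted above: the 25GB task starves.

```python
# Hypothetical sketch: pack several queued tasks into one pilot at once,
# rather than matching one by one. Greedy, smallest-memory-first.

def pack_pilot(tasks, gpus, memory):
    """tasks: list of (name, req_gpus, req_memory_gb).
    Returns the names of tasks that fit in the pilot."""
    chosen = []
    for name, g, m in sorted(tasks, key=lambda t: t[2]):
        if g <= gpus and m <= memory:
            chosen.append(name)
            gpus -= g
            memory -= m
    return chosen

# One 25 GB task plus eight 5 GB tasks, offered an 8-GPU / 40 GB pilot:
tasks = [("big", 1, 25)] + [("t%d" % i, 1, 5) for i in range(8)]
chosen = pack_pilot(tasks, gpus=8, memory=40)
# All 8 GPUs and all 40 GB get used, but "big" is left waiting.
```

A real scheduler would need to balance packing efficiency against starvation, e.g. by feeding this step only tasks already selected by the queue priority, which is where the `wait_sec` aging term earns its keep.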