okofish opened 3 years ago
I found a section in the documentation that advises against setting `requiredSlots` > `taskSlotsPerNode` and gives a rationale for not enforcing this at submission time:

> Be sure you don't specify a task's `requiredSlots` to be greater than the pool's `taskSlotsPerNode`. This will result in the task never being able to run. The Batch Service doesn't currently validate this conflict when you submit tasks because a job may not have a pool bound at submission time, or it could be changed to a different pool by disabling/re-enabling.
This rationale is sensible; I had not considered that a task could be reassigned to a different pool. However, barring some other solution I haven't considered, I believe the current behavior's conflict with autoscaling is severe enough to necessitate some way of preventing the state where Batch scales up a node that remains idle forever.
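Until such validation exists, a submitter can enforce the documented constraint on their own side before calling the add-task operations. A minimal sketch, assuming REST-shaped task and pool dicts and that both fields default to 1 when unset; the helper name is mine:

```python
def assert_schedulable(task: dict, pool: dict) -> None:
    """Fail fast if a task's requiredSlots exceeds the pool's taskSlotsPerNode.

    Field names come from the documentation quoted above; treating unset
    values as 1 is an assumption about the defaults.
    """
    required = task.get("requiredSlots", 1)
    per_node = pool.get("taskSlotsPerNode", 1)
    if required > per_node:
        raise ValueError(
            f"task {task.get('id')!r} requires {required} slots, but pool "
            f"{pool.get('id')!r} provides only {per_node} per node; "
            "it would never be scheduled"
        )


# The situation described in this issue:
try:
    assert_schedulable(
        task={"id": "wide-task", "requiredSlots": 8},
        pool={"id": "autoscale-pool", "taskSlotsPerNode": 4},
    )
except ValueError as err:
    print(err)
```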
Two possible ideas:

1. A flag on the `Task_Add` and `Task_AddCollection` operations that, when enabled, causes the operation to fail if `requiredSlots` is greater than `taskSlotsPerNode`. I would opine that this flag should be enabled by default, but either way works.
2. The task-counting autoscale formula variables (`$PendingTasks`, `$ActiveTasks`, etc.) could completely ignore tasks where `requiredSlots` is greater than `taskSlotsPerNode`; a sketch of what that counting would look like follows this list. I think this seems like a rather clunky solution, but it would solve the main problem with autoscaling.
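The second idea can at least be approximated client-side today, for example to sanity-check what a pending-task formula is about to count. A rough sketch over REST-shaped task dicts; the function name and the treatment of "active" and "running" states as pending are my own approximations of the autoscale variables' semantics:

```python
def schedulable_pending_tasks(tasks: list[dict], task_slots_per_node: int) -> int:
    """Count pending tasks that could actually fit on a node of this pool.

    This mirrors what $PendingTasks and friends could do server-side:
    ignore tasks whose requiredSlots exceed taskSlotsPerNode.
    """
    return sum(
        1
        for t in tasks
        if t.get("state") in ("active", "running")
        and t.get("requiredSlots", 1) <= task_slots_per_node
    )


tasks = [
    {"id": "ok-task", "state": "active", "requiredSlots": 2},
    {"id": "wide-task", "state": "active", "requiredSlots": 8},  # never schedulable
]
print(schedulable_pending_tasks(tasks, task_slots_per_node=4))  # -> 1
```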
### Problem Description
It's possible to submit a task that requires more task slots than the number of task slots per node configured at the pool level. If the pool uses an autoscale formula based on the number of pending tasks, the pool will scale up and remain scaled up indefinitely, without scheduling the task to a node.
### Steps to Reproduce

1. Create a pool with `taskSlotsPerNode` set to 4, a pending-task-based autoscale formula, and the `Standard_D4_v3` VM size.
2. Submit a task with its required slots (`requiredSlots`) set to 8 (sketched below).
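For concreteness, a sketch of roughly what this setup looks like as REST-shaped request bodies. Only `Standard_D4_v3`, `taskSlotsPerNode: 4`, `requiredSlots: 8`, and the pending-task-based formula come from this report; the IDs, scale cap, command line, and exact formula are illustrative, and the image configuration and authenticated submission are omitted:

```python
import json

# Pool: 4 task slots per node, scaled purely from the pending-task count.
pool = {
    "id": "repro-pool",
    "vmSize": "Standard_D4_v3",
    "taskSlotsPerNode": 4,
    "enableAutoScale": True,
    "autoScaleFormula": (
        "$TargetDedicatedNodes = "
        "min(5, max($PendingTasks.GetSample(TimeInterval_Minute * 5)));"
    ),
    # virtualMachineConfiguration (image + node agent SKU) omitted for brevity.
}

# Task: requests more slots than any node in this pool will ever have.
task = {
    "id": "unschedulable-task",
    "commandLine": "/bin/bash -c 'echo this never runs'",
    "requiredSlots": 8,
}

print(json.dumps(pool, indent=2))
print(json.dumps(task, indent=2))
```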
### Expected Results

The pool does not scale up (because doing so would be pointless; an 8-slot task cannot be scheduled onto a 4-slot node).
### Actual Results
The pool scales up to 1 node, which remains idle indefinitely.
### Additional Comments
This behavior can also be demonstrated using the autoscaling formula simulation API, but it's more fun to try it for real and watch the pool burn money before your eyes 😄
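For reference, the simulation mentioned above is presumably the pool's Evaluate Auto Scale operation; a sketch of that request, with authentication omitted and a placeholder API version:

```python
import json

# Evaluate Auto Scale tests a formula against a pool's current metrics
# without changing the pool: POST {batchUrl}/pools/{poolId}/evaluateautoscale
pool_id = "repro-pool"
path = f"/pools/{pool_id}/evaluateautoscale?api-version=<api-version>"
body = {
    "autoScaleFormula": (
        "$TargetDedicatedNodes = "
        "min(5, max($PendingTasks.GetSample(TimeInterval_Minute * 5)));"
    ),
}

print("POST", path)
print(json.dumps(body, indent=2))
```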
I think the core issue here isn't actually with autoscaling, but that seems to be where the behavior is most visibly problematic. Since the `taskSlotsPerNode` pool setting is immutable, a (non-multi-instance) task with `requiredSlots` greater than its pool's `taskSlotsPerNode` is unschedulable and will never be schedulable. I think the `Task_Add` and `Task_AddCollection` operations should simply fail if `requiredSlots` is > `taskSlotsPerNode`. As mentioned above, the Batch documentation already recognizes this implicit constraint and makes it explicit.