Scale checks run regularly to validate that the number of running
instances matches the number of instances expected in the runSpec.
There are two cases:
- Instances are missing: in that case Marathon starts new instances.
- There are too many instances: in that case Marathon kills the overdue
  instances.
In the second case, the lock is supposed to be released only once the
overdue instances are dead. We encountered issues where the lock was never
released because the KillStreamWatcher.watchForKilledTasks() future never
completed. This was caused by a typo in the code that made it wait for ALL
instances to die, instead of just the overdue subset.
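
To make the failure mode concrete, here is a minimal, language-neutral model of it in Python. It is only a sketch and not Marathon's actual code: the function name `kill_watch_is_done`, the instance ids, and the way the overdue instances are picked (a simple slice) are all assumptions made for illustration.

```python
def kill_watch_is_done(watched_ids, terminated_ids):
    """Hypothetical stand-in for the watcher: it completes only once
    every watched instance has been seen terminating."""
    return set(watched_ids) <= set(terminated_ids)


# Five instances are running, but the runSpec expects three.
running = ["app.1", "app.2", "app.3", "app.4", "app.5"]
expected_count = 3

# Assumption: the surplus instances are simply the tail of the list.
overdue = running[expected_count:]

# Only the overdue instances are ever killed, so only they terminate.
terminated = set(overdue)

# Buggy behaviour: watch ALL instances, so the wait can never finish.
buggy_done = kill_watch_is_done(running, terminated)

# Fixed behaviour: watch only the overdue subset.
fixed_done = kill_watch_is_done(overdue, terminated)

print(buggy_done, fixed_done)
```

With the full instance list the completion condition can never hold, because the instances that are meant to keep running never terminate. That matches the lock never being released.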
What do you think?