For a framework based on Fenzo, what are the guidelines for scheduling service-style tasks?
I am looking to schedule a mix of service and batch jobs. The queueable task input to Fenzo makes no distinction between service and batch jobs, which implies that the framework itself must restart a service job when it finishes or fails. One way is to push the finished/failed service job back into the pending queue and wait for Fenzo to schedule it again. However, this may interrupt the service until the jobs already pending ahead of it in the queue have been scheduled.
Is there a recommended way to handle restarts of service-style tasks while minimizing the interruption to the service?
Use Fenzo's tiered queues to define priority tiers for service versus batch tasks. We created two tiers: one for "critical" tasks that need to be launched right away (most service-style tasks fit into this tier), and one for "flex" tasks that are flexible about how quickly they get launched. Fenzo assigns resources to tasks in the critical tier before considering assignments for tasks in the lower tier, flex. Note that tiers in Fenzo are numbered 0 to N-1 for N tiers. For us, critical is tier 0 and flex is tier 1.
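To make the ordering concrete, here is a minimal plain-Java model of the idea, not Fenzo's API: tiers are drained in order 0..N-1 during each scheduling pass, and a finished or failed service task is simply re-enqueued into its own (critical) tier, so it only waits behind other tier-0 work rather than behind the whole batch backlog. The class, the `Task` record, the single-number `cpus` resource model, and `onTaskEnded` are all hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of tiered scheduling; it does not use the Fenzo library.
public class TieredQueueSketch {
    static final int CRITICAL = 0; // tier 0: service-style tasks
    static final int FLEX = 1;     // tier 1: batch tasks

    // Hypothetical task shape: resources reduced to a single CPU count.
    record Task(String id, boolean service, int tier, int cpus) {}

    // One FIFO queue per tier, indexed 0..N-1.
    private final List<Deque<Task>> tiers = new ArrayList<>();

    public TieredQueueSketch(int numTiers) {
        for (int i = 0; i < numTiers; i++) {
            tiers.add(new ArrayDeque<>());
        }
    }

    public void enqueue(Task task) {
        tiers.get(task.tier()).addLast(task);
    }

    // Framework-side restart policy: service tasks go back into their own tier;
    // batch tasks are not restarted.
    public void onTaskEnded(Task task) {
        if (task.service()) {
            enqueue(task);
        }
    }

    // One scheduling pass: every task in a higher tier is considered before
    // any task in a lower tier; tasks that do not fit stay queued.
    public List<Task> assign(int freeCpus) {
        List<Task> launched = new ArrayList<>();
        for (Deque<Task> queue : tiers) {
            Iterator<Task> it = queue.iterator();
            while (it.hasNext()) {
                Task task = it.next();
                if (task.cpus() <= freeCpus) {
                    freeCpus -= task.cpus();
                    launched.add(task);
                    it.remove();
                }
            }
        }
        return launched;
    }
}
```

For example, with a 4-CPU batch task queued in tier 1 before a 2-CPU service task in tier 0, a pass with 4 free CPUs launches the service task first, and the batch task waits for the next pass. The model also shows the limitation discussed next: if lower-tier tasks already hold all the capacity, a re-enqueued service task still has to wait for something to finish.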
This does not, however, prevent the cluster from becoming saturated with lower-tier tasks. In that case, a new higher-tier task has to wait until resources are freed by some tasks completing (e.g., batch tasks eventually finish).
In the future, we will be introducing preemption to ensure that higher-tier tasks can get resources immediately by terminating some lower-tier tasks.
Currently, we guarantee resources for each tier by using a different set of agents per tier: one set of agents is "earmarked" for tier 0 and a different set is earmarked for tier 1. We do this by setting a Constraint that ensures tasks of a given tier go only to that tier's agents. This also lets us ensure there is sufficient capacity for each tier separately.
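The earmarking idea can be sketched as a hard constraint keyed on an agent attribute. The plain-Java model below is an assumption-laden illustration, not Fenzo code: in a real framework this check would live in a Fenzo constraint implementation, whereas here the attribute name `tier`, the `Agent` record, and the helper methods are hypothetical. It filters agents whose `tier` attribute matches the task's tier and sums free CPUs per earmarked group, which is the per-tier capacity the answer says must be kept sufficient.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of tier earmarking; it does not use the Fenzo library.
public class TierEarmarkSketch {
    // Hypothetical agent attribute naming the tier an agent is earmarked for.
    static final String TIER_ATTR = "tier";

    record Agent(String hostname, Map<String, String> attributes, int freeCpus) {}

    // Hard constraint: a task of tier N may only run on agents whose
    // "tier" attribute equals N. Agents without the attribute never match.
    static boolean satisfies(int taskTier, Agent agent) {
        return String.valueOf(taskTier).equals(agent.attributes().get(TIER_ATTR));
    }

    static List<Agent> eligibleAgents(int taskTier, List<Agent> agents) {
        return agents.stream()
                .filter(a -> satisfies(taskTier, a))
                .collect(Collectors.toList());
    }

    // Free CPUs per earmarked group, so each tier's headroom can be
    // checked (and scaled) independently of the others.
    static Map<String, Integer> freeCpusByTier(List<Agent> agents) {
        return agents.stream()
                .filter(a -> a.attributes().containsKey(TIER_ATTR))
                .collect(Collectors.groupingBy(
                        a -> a.attributes().get(TIER_ATTR),
                        Collectors.summingInt(Agent::freeCpus)));
    }
}
```

Because batch tasks can never be placed on tier-0 agents, a saturated flex group leaves the critical group's capacity untouched; the trade-off is that idle tier-0 capacity cannot be lent to batch work.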
We create the separate sets of agents using Fenzo's autoscale groups. See the AutoScaleByAttribute settings in the TaskScheduler's Builder class.
I spoke about capacity guarantees recently at QCon San Francisco. The slides are available here. The video should be available later from QCon.