apache / aurora

Apache Aurora - A Mesos framework for long-running services, cron jobs, and ad-hoc jobs
https://aurora.apache.org
Apache License 2.0

Flag for enabling SLA Aware killing for non-prod tasks #62

Closed ridv closed 5 years ago

ridv commented 5 years ago

Currently, SLA-aware killing is only possible for prod-tier tasks. Since SLA-aware killing is intended for only a limited subset of jobs in the cluster, it is understandable that it was approached this way.

However, for existing clusters that don't use tiering, this presents a significant challenge for enabling SLA-aware killing. All jobs in the cluster would have to be recreated with a production tier attached to them, and a quota would have to be added for every single role within the cluster. Furthermore, any task that needs to use a new role would require a quota to be set for that role.

Given the issues outlined above, I propose we add a flag that allows operators to enable SLA-aware killing for non-production tasks. The flag would be disabled by default.
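
To make the proposed behavior concrete, here is a rough sketch of the gating logic. It is not taken from the Aurora codebase or from the POC: the class name `SlaAwareKillGate`, the `Tier` enum, and the `allowNonProd` field are all hypothetical stand-ins for whatever the real flag and tier types end up being.

```java
// Hypothetical sketch only: these names are illustrative and do not
// correspond to actual Aurora classes or scheduler flags.
public final class SlaAwareKillGate {

    /** Simplified tier model, assumed for illustration. */
    public enum Tier { PROD, NON_PROD }

    // Stands in for the proposed operator flag.
    private final boolean allowNonProd;

    /** Default construction keeps the flag disabled, as proposed. */
    public SlaAwareKillGate() {
        this(false);
    }

    public SlaAwareKillGate(boolean allowNonProd) {
        this.allowNonProd = allowNonProd;
    }

    /**
     * Prod-tier tasks are always eligible for SLA-aware killing;
     * other tasks are eligible only when the operator opts in.
     */
    public boolean isEligible(Tier tier) {
        return tier == Tier.PROD || allowNonProd;
    }
}
```

With the default constructor, behavior matches what exists today (prod tier only), so clusters that do not opt in should see no change.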

@shanmugh would be great to get your thoughts on this if you have some time.

I have a POC ready to be reviewed if no one is opposed to this idea: https://github.com/rdelval/aurora/commit/31bc9b4622220f360a812c7b8b66cf5c95578bfd