Open KrisWilliamson opened 10 months ago
Proposed implementation
---
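kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: openmpp-uat
  name: mpi-cleanup
rules:
  # The cleanup command lists and deletes MPIJobs, so the role needs those
  # verbs on mpijobs (assumed to be in the kubeflow.org API group).
  - apiGroups:
      - kubeflow.org
    resources:
      - mpijobs
    verbs:
      - get
      - list
      - delete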
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mpi-cleanup
  namespace: openmpp-uat
subjects:
  - kind: ServiceAccount
    name: sa-mpi-cleanup
    namespace: openmpp-uat
roleRef:
  kind: Role
  name: mpi-cleanup
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-mpi-cleanup
  namespace: openmpp-uat
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mpi-cleanup  # resource names must be lowercase
  namespace: openmpp-uat
spec:
  schedule: "0 6 * * 0"  # 06:00 every Sunday ("* 6 * * 0" would fire every minute of that hour)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-mpi-cleanup
          containers:
            - name: mpi-cleanup
              image: busybox:1.28  # placeholder; the image must provide kubectl and GNU date
              imagePullPolicy: IfNotPresent
              command:
                - /bin/sh
                - -c
                # Delete every MPIJob whose creationTimestamp is at least 24 hours old.
                - kubectl get mpijobs -o go-template --template '{{range .items}}{{.metadata.name}} {{.metadata.creationTimestamp}}{{"\n"}}{{end}}' | awk -v cutoff="$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" '$2 <= cutoff { print $1 }' | xargs --no-run-if-empty kubectl delete mpijob
          restartPolicy: OnFailure
The example above contains placeholders, such as the service account sa-mpi-cleanup, the roles, etc.
A decision is also needed on when the cron job should run and how old jobs must be before they are cleaned up (currently once a week and 24 hours old).
This is good, but we don't want this solution to be limited to a single namespace; it should be cluster-wide.
Will do.
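For the cluster-wide variant requested above, the namespaced Role and RoleBinding would likely become a ClusterRole and a ClusterRoleBinding. The following is a minimal sketch, not a final implementation. It assumes MPIJobs are served from the kubeflow.org API group (as with the Kubeflow MPI Operator), reuses the placeholder service account from the proposal, and uses a hypothetical name, mpi-cleanup, for the cluster-scoped objects.

```yaml
# Sketch only: cluster-wide RBAC for the cleanup job.
# Assumption: MPIJobs belong to the kubeflow.org API group.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mpi-cleanup            # hypothetical name; ClusterRoles are not namespaced
rules:
  - apiGroups:
      - kubeflow.org
    resources:
      - mpijobs
    verbs:
      - get
      - list
      - delete
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mpi-cleanup            # hypothetical name
subjects:
  - kind: ServiceAccount
    name: sa-mpi-cleanup       # placeholder from the proposal
    namespace: openmpp-uat     # namespace where the ServiceAccount lives
roleRef:
  kind: ClusterRole
  name: mpi-cleanup
  apiGroup: rbac.authorization.k8s.io
```

The ServiceAccount itself stays namespaced; only the role and its binding move to cluster scope, so the job can list and delete MPIJobs in every namespace.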
Also, can I get feedback on how often this should be run (daily, weekly?) and how old the jobs should be before they are deleted (1 day, 1 week, something else?)
We can run the job at midnight and delete any MPI jobs older than 7 days. We can start with that and adjust later if needed.
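The starting point agreed above (run at midnight, delete MPI jobs older than 7 days, across all namespaces) might translate into a CronJob spec along these lines. This is a sketch under stated assumptions: "midnight" is taken to mean a daily run, the image value is a placeholder for any image that provides both kubectl and GNU date, and the service account is assumed to have cluster-wide list and delete rights on MPIJobs.

```yaml
# Sketch only: the CronJob fields that would change.
spec:
  schedule: "0 0 * * *"   # every day at 00:00 (assumes a daily run)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: sa-mpi-cleanup   # placeholder from the proposal
          restartPolicy: OnFailure
          containers:
            - name: mpi-cleanup
              image: REPLACE_ME   # placeholder: needs kubectl and GNU date
              command:
                - /bin/sh
                - -c
                - |
                  # Cutoff as a UTC timestamp in the same format as
                  # .metadata.creationTimestamp, so a string comparison works.
                  cutoff="$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)"
                  # Print "namespace name creationTimestamp" for every MPIJob,
                  # keep the ones at or before the cutoff, and delete each one
                  # in its own namespace.
                  kubectl get mpijobs --all-namespaces -o go-template \
                    --template '{{range .items}}{{.metadata.namespace}} {{.metadata.name}} {{.metadata.creationTimestamp}}{{"\n"}}{{end}}' |
                    awk -v cutoff="$cutoff" '$3 <= cutoff { print $1, $2 }' |
                    while read -r ns name; do
                      kubectl delete mpijob "$name" -n "$ns"
                    done
```

Carrying the namespace through the pipeline matters once the listing is cluster-wide, because `kubectl delete mpijob <name>` on its own only looks in the current namespace.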
Continuation of https://github.com/StatCan/openmpp/issues/37