[Request] Delay deletion of k8s jobs for cf tasks until logs can be tailed

JSchuenke commented 4 years ago

Description

When a cf task is run in cf-for-k8s, a corresponding k8s job is created. To get the logs from this job into the log stream, the fluentd sidecar picks up the log file of the new container spun up to run the task. After the task completes, Eirini immediately deletes the job and its logs. This is unfortunate because it causes us to lose logs for very short tasks: the container and its logs are deleted before fluentd can tail the log.

Suggested fix

Is there a way we could implement a mandatory time to live for containers we need logs from? Waiting even 30 seconds would be a huge help. There is a concept like this in k8s we could lean on, but it's still in alpha: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
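
To make the idea concrete, here is a minimal sketch of the check I have in mind, written in Python purely for illustration (it is not Eirini code). The names `MIN_TTL` and `can_delete_job` are hypothetical, and the 30-second value is just the figure suggested above. The k8s mechanism linked above expresses a similar idea through the `ttlSecondsAfterFinished` field of the Job spec.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default: the 30 seconds suggested in this issue.
MIN_TTL = timedelta(seconds=30)


def can_delete_job(finished_at, now, ttl=MIN_TTL):
    """Return True once a finished job has been kept for at least `ttl`."""
    if finished_at is None:
        # The job has not completed yet, so it must not be cleaned up.
        return False
    return now - finished_at >= ttl
```

With a check like this, the cleanup path would skip any job that finished less than `ttl` ago and retry later, which should give fluentd a window to tail the log file.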

We might have to make this configurable as well, because there might still be cases where a short-running task produces a ton of logs. Allowing an operator to extend or override the delay would be useful in such cases.
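
One possible shape for the operator override, again sketched in Python for illustration only: read an operator-supplied value and fall back to a default when it is unset. The environment variable name `TASK_JOB_TTL_SECONDS` and the function `job_ttl_seconds` are made up for this example; where the real setting would live is an open question.

```python
import os

# Hypothetical default, taken from the 30 seconds suggested in this issue.
DEFAULT_TTL_SECONDS = 30


def job_ttl_seconds(env=None):
    """Return the configured TTL in seconds, or the default if unset."""
    env = os.environ if env is None else env
    raw = env.get("TASK_JOB_TTL_SECONDS")  # hypothetical setting name
    if raw is None or raw.strip() == "":
        return DEFAULT_TTL_SECONDS
    ttl = int(raw)  # raises ValueError on non-numeric input
    if ttl < 0:
        raise ValueError("TTL must not be negative")
    return ttl
```

An operator who knows their tasks are log-heavy could then raise the value (for example to 120) without a code change.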