cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend
Apache License 2.0

[Request] Delay deletion of k8s jobs for cf tasks until logs can be tailed #115

Closed JSchuenke closed 3 years ago

JSchuenke commented 4 years ago

Description

When a cf task is run in cf-for-k8s, a corresponding k8s job is created. To get the logs from this job into the log stream, the fluentd sidecar picks up the log file of the new container spun up to run the task. After the task completes, Eirini immediately deletes the job and its logs. This is unfortunate because we lose logs for very short tasks: the container and its logs are deleted before fluentd can tail the log file.

Suggested fix

Is there a way we could implement a mandatory time to live for containers we need logs from? Waiting even 30 seconds would be a huge help. There is a concept like this in k8s we could lean on, but it's still in alpha: https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
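For reference, the k8s mechanism linked above is driven by the `ttlSecondsAfterFinished` field on the Job spec. Below is a rough sketch of what a task job with a 30-second TTL might look like, written as a Python dict so it can be dumped to JSON. The helper name, job name, image, and command are placeholders for illustration, and this is not how Eirini builds its jobs today.

```python
import json


def task_job_manifest(name, image, command, ttl_seconds=30):
    """Build a batch/v1 Job manifest (hypothetical helper).

    ttlSecondsAfterFinished asks the k8s TTL controller to delete the
    job, and its pods, that many seconds after the job finishes.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        # Placeholder container; a real task job carries more settings.
                        {"name": "task", "image": image, "command": command},
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = task_job_manifest("my-task", "busybox", ["echo", "hello"])
    print(json.dumps(manifest, indent=2))
```

While the feature is alpha, the cluster would also need the `TTLAfterFinished` feature gate enabled, so this would only help on clusters where operators can turn that on.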

We might have to make this configurable as well, because there might still be cases where a short-running task produces a ton of logs. Allowing an operator to extend or override the delay would be useful in such cases.
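To illustrate the kind of override we have in mind, here is a small sketch that resolves the delay from an operator-provided setting and falls back to 30 seconds. The setting name `TASK_JOB_TTL_SECONDS` and the function name are made up for this example; Eirini has no such option as far as we know.

```python
import os

DEFAULT_TTL_SECONDS = 30  # the delay suggested in this issue


def resolve_task_ttl(env=None):
    """Return the job TTL in seconds, preferring an operator override.

    TASK_JOB_TTL_SECONDS is a hypothetical setting name used only for
    illustration.
    """
    env = os.environ if env is None else env
    raw = env.get("TASK_JOB_TTL_SECONDS")
    if raw is None or raw.strip() == "":
        return DEFAULT_TTL_SECONDS
    ttl = int(raw)  # raises ValueError on non-numeric input
    if ttl < 0:
        raise ValueError("TTL must not be negative")
    return ttl
```

An operator running log-heavy short tasks could then set the value to something like 300, while everyone else keeps the default.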

Steps to reproduce

You can find the reproduction steps and original issue here: https://github.com/cloudfoundry/cf-k8s-logging/issues/35

cf-gitbot commented 4 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175025404

The labels on this github issue will be updated when the story is started.

jimmykarily commented 4 years ago

Linking to a similar issue in kubecf: https://github.com/cloudfoundry-incubator/kubecf/issues/1323

Some of the suggestions may solve both issues.

heycait commented 3 years ago

Hi, was this issue resolved somewhere? I see it was closed but the merged commits mentioning this issue don't seem related.