eraser-dev / eraser

🧹 Cleaning up images from Kubernetes nodes
https://eraser-dev.github.io/eraser/
Apache License 2.0
480 stars 60 forks

[REQ] CronJob/Crontab based image cleanup #1055

Open justusbunsi opened 1 month ago

justusbunsi commented 1 month ago

What kind of request is this?

New feature

What is your request or suggestion?

Hi. First things first: This project saves me a ton of time and headache. Thank you for maintaining it. 👍

Would you be open to supporting cronjob/crontab based image cleanup? This would decouple the controller behavior from its maintenance.

To give some context to my request:

Right now, the image cleanup execution relies on two things:

Configured correctly and started outside of main working hours, this concept works quite well to remove unused/outdated CI images from worker nodes. When the controller-manager is updated or restarted (preferably during working hours), the interval and delay get out of sync with the desired cleanup time. Depending on the defined interval, this can lead to unnecessary additional image pulls during working hours and increase build times.
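To illustrate the drift described above, here is a minimal sketch of an interval schedule anchored at controller start. It is a simplified model and not Eraser's actual code: the function name `next_interval_run` and its `delay` parameter are hypothetical, and the dates are made up for the example.

```python
from datetime import datetime, timedelta


def next_interval_run(controller_start: datetime, delay: timedelta,
                      interval: timedelta, now: datetime) -> datetime:
    """Return the next run time for a schedule anchored at controller start.

    Hypothetical model: the first run happens at start + delay, and every
    later run follows at a fixed interval after that first run.
    """
    first = controller_start + delay
    if now <= first:
        return first
    elapsed = now - first
    # Ceiling division: number of whole intervals needed to reach or pass `now`.
    periods = -(-elapsed // interval)
    return first + periods * interval


if __name__ == "__main__":
    daily = timedelta(hours=24)
    no_delay = timedelta(0)

    # Controller started at 02:00, so cleanups land at 02:00 every day.
    started = datetime(2024, 1, 1, 2, 0)
    print(next_interval_run(started, no_delay, daily, datetime(2024, 1, 2, 10, 0)))
    # 2024-01-03 02:00:00

    # Controller restarted at 10:30 during working hours: the anchor moves,
    # and every following cleanup now lands at 10:30.
    restarted = datetime(2024, 1, 2, 10, 30)
    print(next_interval_run(restarted, no_delay, daily, datetime(2024, 1, 2, 11, 0)))
    # 2024-01-03 10:30:00
```

In this model the schedule depends only on when the controller last started, which is why a restart during the day moves all later cleanups into the day as well.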

Are you willing to submit PRs to contribute to this feature request?

ashnamehrotra commented 1 month ago

@justusbunsi Do you mean using a CronJob to schedule Eraser's ImageJob, or having an option to eliminate ImageJobs? Without ImageJobs we will not be able to schedule an eraser pod on each node and watch eraser resources. We treat ImageJobs as a combination of DaemonSet and CronJob, which lets us schedule a pod on each node while ensuring we can repeat jobs and run them to completion.

justusbunsi commented 1 month ago

Hi @ashnamehrotra. I mean having something like manager.scheduling.cron: "0 2 * * *" as an alternative to manager.scheduling.repeatInterval. The current repeat interval is not predictable, as each controller restart resets it.
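As a rough sketch of why a cron-style setting would be predictable, the next run can be computed from the wall clock alone, independent of when the controller started. The helper `next_daily_cron_run` below is hypothetical and deliberately handles only the `MINUTE HOUR * * *` form used in the example above; it is not a full cron parser, and it is not how Eraser implements scheduling.

```python
from datetime import datetime, timedelta


def next_daily_cron_run(expr: str, now: datetime) -> datetime:
    """Return the next run for a daily cron expression such as "0 2 * * *".

    Sketch only: supports exactly the 'MINUTE HOUR * * *' form.
    """
    fields = expr.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError("only 'MINUTE HOUR * * *' is supported in this sketch")
    minute, hour = int(fields[0]), int(fields[1])
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute or hour out of range")

    # Today's occurrence of the scheduled time; use tomorrow's if it has passed.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # A restart at 10:30 does not matter: the next cleanup is still at 02:00.
    print(next_daily_cron_run("0 2 * * *", datetime(2024, 1, 2, 10, 30)))
    # 2024-01-03 02:00:00
```

Because the result depends only on the expression and the current time, restarting the controller at any point leaves the cleanup time unchanged.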

ashnamehrotra commented 1 month ago

@justusbunsi got it, yes, I think that would be very useful!

justusbunsi commented 1 month ago

I'll give it a try. 🙂

stevekuznetsov commented 3 weeks ago

For our use-case, it would be incredibly useful to be able to use PodFailurePolicy to allow image jobs to be resilient to pod phases.