eraser-dev / eraser

🧹 Cleaning up images from Kubernetes nodes
https://eraser-dev.github.io/eraser/
Apache License 2.0
493 stars, 62 forks

[REQ] Record removed images periodically and easily accessible to operators #929

Open ritazh opened 10 months ago

ritazh commented 10 months ago

What kind of request is this?

Improvement of existing experience

What is your request or suggestion?

Currently, Eraser records removed images in its logs and exposes the total count of removed images as a metric. If the pod log is gone, the list of removed images is gone with it. To make troubleshooting easier for operators, it would be good to consider another mechanism that periodically records the removed images somewhere in the cluster.

For example, record the removed images in the status field of an Eraser custom resource, along with a timestamp. If etcd object size is a concern, we can consider a configurable field for the maximum number of recorded removed images.

WDYT?

Are you willing to submit PRs to contribute to this feature request?

sozercan commented 10 months ago

I want to understand the use case for this.

In past reports, I have seen folks who wanted to know why an image did not get removed, because the Trivy scan used in Eraser did not match the results of another scanning tool. A list of removed images cannot tell that story, even more so if it is constrained to a certain number of results. It says nothing about non-removed images, and it is not actionable.

Possible investigations:

pmengelbert commented 9 months ago

I think a large part of the problem is that observing Eraser's behavior (or even that it worked) is currently too difficult. IMO the best thing to do is provide a report with all of the relevant information. How we provide that report is unclear, but what I would like to see is:

IMO having this information will not only benefit the end user but will also make Eraser more robust as a whole. As a developer, it's unwieldy to get this information (currently it is only available via debug logs, not aggregated in any way, etc.). I'm 90% confident that gathering the above information into a report will reveal bugs we haven't noticed before. There are probably still scanning and removal issues caused by the ImageID vs Manifest Digest issue that have been overlooked.

Finally, a report will make testing a lot easier. We can set up the initial state and define the exact end result we require from Eraser.