chainguard-dev / rumble

Data collection for base image CVEs etc.
Apache License 2.0

Blog Post: Container Image CVE accumulation rates, Stale Image policy, and the relative benefits of minimal images #26

Closed amdawson closed 1 year ago

amdawson commented 1 year ago

I'm envisioning image age on the X-axis and number of vulnerabilities on the Y-axis, to see whether there is a correlation that can suggest an optimal build horizon policy.

In other words, "The data shows that when your image build is N days old, it is very likely to have a critical/high/whatever vulnerability, therefore you should be rebuilding your containers before N days" Signed, John Speed-Meyers Ph.D.

I would also like to write a blog post about this once we have some data ready.

cc: @mattmoor I had this idea that might inform our build horizon policies, rather than an arbitrary time period like 14 or 30 days

amdawson commented 1 year ago

related https://github.com/chainguard-dev/rumble/issues/1

jspeed-meyers commented 1 year ago

I think data and analysis on the relationship between image build age and number of CVEs (especially high and critical CVEs) could be useful for both Chainguard Images and Enforce. For Enforce, as Adam suggests, it helps determine the relative merits of different build horizons for CVE management. For Images, it helps to highlight, in a quantitative fashion, the benefits of frequently rebuilding Chainguard Images and the reduction in CVEs.

There are at least two approaches to answering this question.

No. 1: Measure the CVE accumulation rate for a set (or sets) of pinned images. At least some data is already available for this analysis today. @jdolitsky and I have been collecting daily data on a number of Chainguard Images and comparable non-Chainguard images for around six months. I could whip up a graph and a few stats in four hours. This analysis is admittedly somewhat different from the build horizon question. It asks "What is the effect of time on CVE growth for a pinned image?" rather than "What is the effect of image build age on number of CVEs?" So consider this the quick and dirty approach.

No. 2: Actually measure image build age and track CVEs for a set of images. If implemented, #1 would enable this analysis using daily logged data. I would need @jdolitsky's help to create that feature. Alternatively, another approach (not mutually exclusive) is to collect image build date and CVE counts for a large set of images at a single point in time and do the analysis on that snapshot.
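The two approaches above could be sketched roughly as follows. This is a minimal illustration, not the rumble implementation: the function names and input shapes are assumptions made up for this sketch. `daily_growth_rate` fits a least-squares line to the daily CVE counts of one pinned image (approach No. 1), and `pearson` measures how strongly build age and CVE count move together across a one-time snapshot of many images (approach No. 2).

```python
from datetime import date
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sxx, syy, sxy): sums of squared and cross deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxx, syy, sxy


def daily_growth_rate(observations):
    """Approach No. 1: CVEs gained per day for one pinned image.

    `observations` is a list of (scan_date, cve_count) pairs (hypothetical
    shape). The rate is the least-squares slope of count against days elapsed
    since the first scan.
    """
    ordered = sorted(observations)
    first_day = ordered[0][0]
    days = [(scan_date - first_day).days for scan_date, _ in ordered]
    counts = [count for _, count in ordered]
    sxx, _, sxy = _centered_sums(days, counts)
    if sxx == 0:
        raise ValueError("need scans from at least two different days")
    return sxy / sxx


def pearson(build_ages_days, cve_counts):
    """Approach No. 2: correlation between build age and CVE count.

    Both arguments are equal-length lists, one entry per image in the snapshot.
    """
    sxx, syy, sxy = _centered_sums(build_ages_days, cve_counts)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)
```

For example, a pinned image scanned at 10, 12, and 14 CVEs on three consecutive days has a growth rate of 2 CVEs per day. A straight-line fit is itself an assumption; real CVE counts may grow in steps as advisories are published.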

And I agree a blog post on this topic would be worthwhile. It could be a nice opportunity to tie Enforce and Images together in addition to highlighting vulnerability management. I'm glad to assist, but I do want to be careful about overloading myself with tasks. I'm going to unassign myself for now since I'm not sure when exactly I can begin this task. When I actually begin work, I will re-assign myself.

Thanks, @amdawson, for conceiving and explaining this idea.

cc @tracymiranda, @luhring, @amouat, @patflynn, @kaniini, @ktrychon

amdawson commented 1 year ago

I will leave it to you all to determine priority and timeline here and how much effort you want to put in vs. your other priorities. Thanks for thinking through it.

The pinned image idea is probably the easiest. We could also consider using the dataset of images in the enforce evidence lake.

One thought we discussed that I forgot to include at first is that it might also be useful to inspect one or two levels deeper. CVE accumulation over time might correlate differently depending on the language ecosystem or type of application, e.g., Java, Go, Rust, databases, app servers, Linux, or Windows. This level 2 analysis could inform different build horizon policies for different technologies. There could even be a level 3 analysis that goes down to a specific image; maybe mysql or nginx or rabbitmq is uniquely terrible or uniquely excellent.
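As a rough sketch of the level 2 idea, per-image growth rates could be averaged by category. The function name, the input dictionaries, and the "unknown" fallback label are assumptions for illustration only; how images would actually be labeled by ecosystem is not settled in this thread.

```python
from collections import defaultdict


def mean_rate_by_group(rates, groups):
    """Average CVE growth rate per category.

    rates:  {image_name: cves_per_day}   (hypothetical input shape)
    groups: {image_name: category label} (e.g. "java", "go", "database")
    Images missing from `groups` are bucketed under "unknown".
    """
    buckets = defaultdict(list)
    for image, rate in rates.items():
        buckets[groups.get(image, "unknown")].append(rate)
    return {label: sum(values) / len(values) for label, values in buckets.items()}
```

A level 3 analysis would skip the grouping and look at each image's rate directly.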

mattmoor commented 1 year ago

> In other words, "The data shows that when your image build is N days old, it is very likely to have a critical/high/whatever vulnerability, therefore you should be rebuilding your containers before N days" Signed, John Speed-Meyers Ph.D.

Generally, I like it! If we have a blog post with the analysis, then we can link to that from "learn more" as well to justify not just WHY, but the actual number 👍

amdawson commented 1 year ago

Update: something like this could also be extended to consider the "scan horizon" as well

https://github.com/chainguard-dev/policy-catalog/pull/120/files

jspeed-meyers commented 1 year ago

Making progress. Just to whet the appetite of anybody reading this, here's a simple table and graphic of the average daily growth in the overall CVE count for a number of public Docker Hub images and the corresponding Chainguard Images.

[Image: Screenshot 2023-03-27 at 4:28:22 PM, table and graphic of average daily CVE count growth]

TL;DR - Some images have 1 new reported CVE per day. You can see why build horizon and image staleness are important :) Also, Chainguard Images, on average, have slower CVE growth rates.

Note: All from the rumble v1 database :)

jspeed-meyers commented 1 year ago

Here's a first draft of a blog post.

While doing the research, I realized that determining an optimal build horizon isn't possible simply via this analysis. What this analysis can do is illuminate CVE accumulation rates for a range of images. Figuring out what rate is acceptable for what amount of time requires judgement calls beyond the scope of this data.

I still think this analysis makes a data-backed argument for the importance of build horizon. But, alas, it only helps inform a decision on the optimal build horizon policy; it does not determine one.
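To make that distinction concrete, here is a small sketch of the arithmetic. The data can supply the accumulation rate, but the tolerated number of new CVEs is exactly the judgement call the analysis cannot make. The function name and the assumption of roughly linear growth are mine, not something the data establishes.

```python
import math


def max_build_age_days(cves_per_day, tolerated_new_cves):
    """Largest whole number of days before an image exceeds the tolerance.

    cves_per_day:       measured accumulation rate (comes from the data)
    tolerated_new_cves: how many new CVEs are acceptable (a policy choice)
    Assumes CVEs accumulate linearly; returns math.inf if the rate is not positive.
    """
    if cves_per_day <= 0:
        return math.inf
    return math.floor(tolerated_new_cves / cves_per_day)
```

With a rate of 1 CVE per day, a tolerance of 14 new CVEs gives a 14-day horizon, while a tolerance of 30 gives 30 days. The same data supports either policy.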

ktrychon commented 1 year ago

This is freakin incredible! I just took a spin through and have some questions/comments. @jspeed-meyers, can we get access to the data charts so we can get them designed up?

jspeed-meyers commented 1 year ago

Great! I'll quickly respond to the best of my ability to all comments and questions.

@ktrychon, I can certainly get you access to the data charts. One dweeby issue though: I did these charts in a Jupyter notebook on my local machine. Would you like me to put the data (a CSV) and notebook in a zip file and email that to you? Something else? I can also discuss directly with the design team if you think they might have a preference.

jspeed-meyers commented 1 year ago

@ktrychon and @tracymiranda: here is a zipped file of the Jupyter notebook from which the figures come.

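stale-image-analysis.zip (https://github.com/chainguard-dev/rumble/files/11149703/stale-image-analysis.zip)

I also put the table data in this Google spreadsheet (https://docs.google.com/spreadsheets/d/1cTfNb5LkdFhLsJ5VlgqqhPQjI2b134PrBKSl9IE1Tvs/edit?usp=sharing).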

Getting the line graph data out of the notebook is doable but would probably take me an hour. If you need that done, just let me know and I can prioritize it.

@amdawson and @ktrychon, LMK if and when you want further actions from me related to this blog post, analysis, and data.

amdawson commented 1 year ago

Looks good to me. I made some suggestions in the blog doc. Ready to go when Kaylin is done with media embeds.



jspeed-meyers commented 1 year ago

I submitted this to the MARCOMM queue via the Slack blog form.

ktrychon commented 1 year ago

This blog was published on 5/3: https://www.chainguard.dev/unchained/enforce-against-vulnerability-sprawl-with-up-to-date-images