Open Dentrax opened 2 years ago
Hi @Dentrax, thanks for the issue!
I saw you ran this command:
grype golang:1.17 --output cyclonedx --file result1
The CycloneDX output contains data that's known to be nondeterministic, like a timestamp. Because of this, there's no way to expect the digests of two scans to be identical.
I see you ran Trivy with a template specified. You can do the same thing with Grype, and this gives you enough control of Grype's output to ensure that results are reproducible (and that you'd get the same digest between multiple scans).
Does that make sense?
I tried to pass the --output json flag, as you can see in the issue, but it produces non-deterministic digests too. I think it's related to what you said about cyclonedx (timestamps etc.).
@luhring Ability to pass custom templates would make sense!
Cool!
For how to use templates with Grype, see: https://github.com/anchore/grype#using-templates
For the JSON output format (and possibly others), I think it's worth a discussion about whether we want to modify the format to become deterministic. This would mean that we lose metadata like timestamps, but maybe that's okay. 🤔
Another thought... in the name of reproducible results, even with code changes to Grype's output formats, I think we should document the additional steps needed to be performed by the user in order to guarantee a reproducible result, such as:
* obtaining the vulnerability database ahead of time, and telling Grype not to update the database at execution time
* ensuring that the scan target itself is referenced in a deterministic way (e.g. an image **digest**)
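The two steps above could be wrapped in a small pre-flight helper. This is only a sketch, not part of Grype: `build_scan_invocation` is a hypothetical name, and it assumes Grype honors the `GRYPE_DB_AUTO_UPDATE` environment variable to skip the database update at scan time. It builds the command without running it, and refuses image references that are not pinned by digest.

```python
import re

# An image reference pinned by digest ends in "@sha256:<64 hex chars>".
_DIGEST_REF = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def build_scan_invocation(image_ref, output_file):
    """Return (argv, env_overrides) for a scan meant to be repeatable.

    Hypothetical helper: it only builds the command, it does not run Grype.
    """
    if not _DIGEST_REF.match(image_ref):
        raise ValueError(f"image must be referenced by digest, got: {image_ref!r}")
    argv = ["grype", image_ref, "--output", "json", "--file", output_file]
    # Assumption: this variable disables the DB update at execution time,
    # so the scan uses whatever database was obtained ahead of time.
    env_overrides = {"GRYPE_DB_AUTO_UPDATE": "false"}
    return argv, env_overrides
```

The result could then be passed to something like `subprocess.run(argv, env={**os.environ, **env_overrides}, check=True)` once the database has been fetched separately.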
Sounds so cool! Moreover, by performing these actions, maybe we can upload the deterministic scan result digest to ~fulcio~ Rekor. 🤔
So we can ensure any image foo@sha256:bar in this case, produces exactly baz scan result digest. Not so sure what we can do with it later, but it would be a cool idea. cc: @dlorenc
That's interesting. Would we want to upload the scan signature+digest to Rekor? I'm not familiar with how this would fit into Fulcio yet.
we can ensure any image foo@sha256:bar in this case, produces exactly baz scan result digest
There's another important point about reproducibility here: A given fixed image digest should be scanned frequently, and with the latest vulnerability data available at the time, because new vulnerabilities are discovered every day (and, even previously discovered vulnerabilities have their data in upstream data sources updated from time to time).
With this recommended approach of scanning repeatedly, with new vulnerability data, we wouldn't want to assert that all scan results have the same digest. We'd want to allow for new vulnerability matches to be discovered, reported, and used as input to policies wherever appropriate.
^ This point might be obvious, but I wanted to make it explicit just in case, since we're talking about having an image scan produce consistent results. 😃
I'm not familiar with how this would fit into Fulcio yet.
My bad, I meant Rekor. 🙈
we wouldn't want to assert that all scan results have the same digest.
Oh, now I clearly see the concern and why we should not assert the digests. But what if we are using the same vuln-db version? Let's assume we have the vuln-db versioned v1. And 2 same images with the same digests. In this case, would it make sense to assert that all scan results have the same digest?
So we can push a tlog to Rekor such as: _I scanned the image foo@sha256:bar against vuln-db v1 using grype v0.26.1 and I expect a JSON output that has digest qux._
But still not so sure whether it makes sense since we update the vuln-db every X hour. 🤷
But what if we are using the same vuln-db version? Let's assume we have the vuln-db versioned v1. And 2 same images with the same digests. In this case, would it make sense to assert that all scan results have the same digest?
Yup, exactly! We would be able to expect reproducible scan results in this particular scenario.
So we can push a tlog to Rekor such as: I scanned the image foo@sha256:bar against vuln-db v1 using grype v0.26.1 and I expect a JSON output that has digest qux.
Yeah, I like this. And IMHO we should also provide more information about the vulnerability database, including its digest.
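To make the idea concrete, here is a rough sketch of what such a statement could carry, including the database digest. Everything here is hypothetical: `ScanStatement` and its field names are made up for illustration, and this is not the Rekor entry format or any in-toto schema. It only shows that a statement serialized canonically yields a stable digest.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScanStatement:
    """Hypothetical record of one scan; field names are illustrative only."""
    image: str          # e.g. "foo@sha256:bar"
    scanner: str        # e.g. "grype v0.26.1"
    db_version: str     # e.g. "v1"
    db_checksum: str    # digest of the vulnerability database itself
    output_format: str  # e.g. "json"
    output_digest: str  # digest of the scan result, e.g. "qux"


def canonical_bytes(statement):
    """Serialize with sorted keys and fixed separators so that equal
    statements always produce identical bytes."""
    text = json.dumps(asdict(statement), sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def statement_digest(statement):
    """SHA-256 over the canonical serialization of the statement."""
    return "sha256:" + hashlib.sha256(canonical_bytes(statement)).hexdigest()
```

Two scans that agree on every field produce the same statement digest, while a different database checksum produces a different one.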
But still not so sure whether it makes sense since we update the vuln-db every X hour.
I think we should strive for reproducibility 💯 under the right circumstances. And we should think about how people will consume these kinds of vulnerability scan attestations and Rekor entries to make informed decisions about the security of their artifacts.
How should we proceed here? :)
Not all output formats are guaranteed to be reproducible. For instance, CycloneDX can never be reproducible given that IDs are recommended to be random.
That being said, there is a chance to make grype JSON documents reproducible:
❯ grype golang:1.17 --output json --file result1.json
✔ Vulnerability DB [no update available]
✔ Loaded image golang:1.17
✔ Parsed image sha256:8685b3216ef4a80742c4d5f29f547838997cc0c7cca68222cfdab7c6821ccf5b
✔ Scanned for vulnerabilities [1130 vulnerability matches]
├── by severity: 36 critical, 288 high, 308 medium, 32 low, 448 negligible (18 unknown)
└── by status: 443 fixed, 687 not-fixed, 0 ignored
A newer version of grype is available for download: 0.74.2 (installed version is 0.74.0)
❯ grype golang:1.17 --output json --file result2.json
✔ Vulnerability DB [no update available]
✔ Loaded image golang:1.17
✔ Parsed image sha256:8685b3216ef4a80742c4d5f29f547838997cc0c7cca68222cfdab7c6821ccf5b
✔ Scanned for vulnerabilities [1130 vulnerability matches]
├── by severity: 36 critical, 288 high, 308 medium, 32 low, 448 negligible (18 unknown)
└── by status: 443 fixed, 687 not-fixed, 0 ignored
A newer version of grype is available for download: 0.74.2 (installed version is 0.74.0)
❯ diff result1.json result2.json
134982c134982
< "file": "result1.json",
---
> "file": "result2.json",
135062c135062
< "timestamp": "2024-01-25T16:31:22.174899-05:00"
---
> "timestamp": "2024-01-25T16:31:36.511252-05:00"
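A textual diff shows line numbers but not where in the document the fields live. As a rough alternative, the sketch below walks two parsed JSON documents and reports jq-style paths that differ. `differing_paths` is a hypothetical helper, not something Grype ships.

```python
def differing_paths(a, b, path=""):
    """Yield jq-style paths at which two parsed JSON values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}"
            if key not in a or key not in b:
                yield child  # key present on one side only
            else:
                yield from differing_paths(a[key], b[key], child)
    elif isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            yield path or "."  # report the list itself when lengths differ
        else:
            for i, (x, y) in enumerate(zip(a, b)):
                yield from differing_paths(x, y, f"{path}[{i}]")
    elif a != b:
        yield path or "."
```

Loading both result files with `json.load` and passing them to `differing_paths` lists only the fields that changed between the two runs.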
Keeping a time element is critical to vulnerability scans, but there are two time elements in the JSON output:
cat result2.json | jq '.descriptor'
{
"name": "grype",
"version": "0.74.0",
"configuration": {
...
},
"db": {
"built": "2024-01-25T01:27:56Z",
"schemaVersion": 5,
"location": ".../Library/Caches/grype/db/5",
"checksum": "sha256:0e70dc967985e5a56678500b60aefb9442183c03301261252c7abd7dfae92784",
"error": null
},
"timestamp": "2024-01-25T16:31:36.511252-05:00"
}
Note:
* .descriptor.timestamp: when grype was invoked
* .descriptor.db.built: the time the data was sourced and built into the DB

We could add an option that would remove the .descriptor.timestamp from the grype output, which would make results reproducible when the same configuration/DB is being used. For use cases where you are using different DBs or configurations, you need to select the subsection of the grype document you care about and hash only that:
❯ cat result1.json | jq '.matches' | sha256sum
d149e542ee35687266abd6cef70b0038131ee854eb0750d98244acf2c3d760b6 -
❯ cat result2.json | jq '.matches' | sha256sum
d149e542ee35687266abd6cef70b0038131ee854eb0750d98244acf2c3d760b6 -
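The same subselection can be done without jq. This is a minimal sketch, assuming the result file has already been parsed with `json.load`; `matches_digest` is a hypothetical name. Because the serialization differs from jq's pretty-printed output, the hex value will not equal the one printed above, but it is stable across runs in the same way.

```python
import hashlib
import json


def matches_digest(document):
    """Hash only the "matches" section of a parsed grype JSON document."""
    # Sorted keys and fixed separators give one canonical byte sequence.
    # List order is preserved, so matches must be emitted in the same order.
    canonical = json.dumps(document["matches"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```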
This could be something like GRYPE_TIMESTAMP=false (env), but probably not a CLI flag.
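Until such an option exists, a consumer could drop the volatile field before hashing the whole document. The sketch below is an assumption-laden workaround, not Grype behavior: `without_path` and `reproducible_digest` are hypothetical helpers, and only `descriptor.timestamp` is removed by default. Other run-specific fields, such as the output file name seen in the diff above, would need their own paths passed in.

```python
import copy
import hashlib
import json


def without_path(document, dotted_path):
    """Return a deep copy of document with one dotted key path removed."""
    result = copy.deepcopy(document)
    *parents, leaf = dotted_path.split(".")
    node = result
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return result  # path not present; nothing to remove
    node.pop(leaf, None)
    return result


def reproducible_digest(document, volatile_paths=("descriptor.timestamp",)):
    """Hash a parsed grype JSON document after removing volatile fields."""
    for path in volatile_paths:
        document = without_path(document, path)
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Unlike hashing only the matches, this keeps the database checksum and configuration inside the digest, so a scan against a different DB yields a different value.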
What happened:
grype generates different output content for the same image, which breaks the reproducibility.
Motivation comes from https://github.com/in-toto/attestation/issues/58: putting the output result digest in the vuln spec. cc: @developer-guy
Not sure whether this is intentional or time/map object related.
What you expected to happen:
All output results for exactly the same IMAGE@sha256:digest should generate the same digest.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
I tried the same commands with trivy, and both the SARIF and JSON output formats produced the same digest:
Maybe we can get help from the trivy team so cc'ing @knqyf263.
trivy: 0.21.2
Environment:
grype version: 0.26.1
OS (cat /etc/os-release or similar): macOS 11