anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Discovery of SBOMs on the Rekor transparency log #1159

Closed mdeicas closed 3 months ago

mdeicas commented 2 years ago

Motivation

It is not always possible to look inside executables and report accurate information on their contents and dependencies. This information is accessible at the build time of executables, but there has been no general way to propagate this data to a later stage in the software supply chain.

With the development of the Sigstore supply chain security infrastructure, it is now possible to access information from the build time of artifacts. This issue and related PRs propose a way to incorporate this information into Syft.

This PR is part of the broader picture to allow Syft to handle finding SBOMs (https://github.com/anchore/syft/issues/737) and to enable the use of external sources (https://github.com/anchore/syft/issues/1115).

The rekor-cataloger

https://github.com/anchore/syft/pull/1157 contributes a package which can search the Rekor transparency log for information about SBOMs of executables, and the rekor-cataloger, the integration point between the package and Syft.

Demo

To demo the rekor-cataloger, run Syft on an image containing binaries that have SBOMs on Rekor. One such image is available at https://hub.docker.com/r/mdeicas/sample-golang-prov.

syft packages sample-golang-prov.tar -o spdx-json --file spdx.json --catalogers all

This is the diff between an execution of syft with and without this PR:

+[0000] DEBUG cataloging with "rekor-cataloger"
+[0000] DEBUG rekor is being queried for 
+               Location: /sample-golang-prov 
+               SHA256: f2e59e0e82c6a1b2c18ceea1dcb739f680f50ad588759217fc564b6aa5234791
+[0000] DEBUG rekor entry 2790629 was retrieved
+[0000] DEBUG verification of rekor entry 2790629 complete
+[0000] DEBUG SBOM (798688 bytes) retrieved
+[0001] DEBUG rekor entry 2790625 was retrieved
+[0001] DEBUG verification of rekor entry 2790625 complete
+[0001] DEBUG error parsing or validating attestation associated with rekor entry 2790625: 
+               the attestation predicate type (https://slsa.dev/provenance/v0.2) is not the accepted type (google.com/sbom)
+[0001] DEBUG relationship created for SBOM found on rekor
+[0001]  WARN 
+                       [EXPERIMENTAL FEATURE: Rekor-cataloger] 
+                       
+                       This SBOM contains a relationship that references an external document. This 
+                       document is not present in the cataloged image or directory; rather it has 
+                       been found by searching the Rekor transparency log (https://www.sigstore.dev/).  
+                       
+                       Trusting this external document relationship requires trusting several entities: 
+                               - the user or CI/CD action that uploaded an entry to Rekor
+                               - Rekor transparency log
+                               - Fulcio CA
+
+                       The Rekor entry(s) that were used to create the external document relationship(s)
+                       are listed below by UUID. See https://github.com/sigstore/rekor for 
+                       information on how to query Rekor. 
+                               [362f8ecba72f432677c5a08384c08e85445632ae4078b94fff43651770e12eb1d3ca43e45fae3a15]
+
+[0001] DEBUG discovered 0 packages

How the rekor-cataloger works

Upon finding an executable, Rekor is searched by hash. The log entries and associated SBOMs are retrieved and verified, and relationships are created. The SBOM information is obtained from an in-toto attestation (https://github.com/in-toto/attestation) associated with the Rekor entry. Here is an example:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "predicateType": "google.com/sbom",
  "subject": [
    {
      "name": "binary-linux-amd64",
      "digest": {
        "sha256": "f2e59e0e82c6a1b2c18ceea1dcb739f680f50ad588759217fc564b6aa5234791"
      }
    }
  ],
  "predicate": {
    "sboms": [
      {
        "format": "SPDX",
        "digest": {
          "sha256": "02948ad50464ee57fe237b09054c45b1bff6c7d18729eea1eb740d89d9563209"
        },
        "uri": "https://github.com/user/repo/releases/download/v1.3/binary.spdx"
      }
    ]
  }
}
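The following is a minimal sketch of how such a statement could be decoded and filtered by predicate type, mirroring the rejection of the SLSA provenance entry in the log output above. It is not the code from the PR: the type names and the helpers `parseSBOMStatement` and `subjectMatches` are hypothetical, and only the JSON field names are taken from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// acceptedPredicateType is the predicate type shown in the example statement.
const acceptedPredicateType = "google.com/sbom"

// Statement mirrors only the fields of the example in-toto statement.
// The Go type names are assumptions made for this sketch.
type Statement struct {
	Type          string    `json:"_type"`
	PredicateType string    `json:"predicateType"`
	Subject       []Subject `json:"subject"`
	Predicate     Predicate `json:"predicate"`
}

type Subject struct {
	Name   string            `json:"name"`
	Digest map[string]string `json:"digest"`
}

type Predicate struct {
	SBOMs []SBOMRef `json:"sboms"`
}

type SBOMRef struct {
	Format string            `json:"format"`
	Digest map[string]string `json:"digest"`
	URI    string            `json:"uri"`
}

// parseSBOMStatement (hypothetical helper) decodes a statement and rejects
// any predicate type other than the accepted one.
func parseSBOMStatement(raw []byte) (*Statement, error) {
	var s Statement
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("decoding statement: %w", err)
	}
	if s.PredicateType != acceptedPredicateType {
		return nil, fmt.Errorf("predicate type %q is not the accepted type %q",
			s.PredicateType, acceptedPredicateType)
	}
	return &s, nil
}

// subjectMatches reports whether any subject carries the given SHA-256 digest,
// i.e. whether the statement is about the binary that was hashed.
func subjectMatches(s *Statement, sha256Hex string) bool {
	for _, sub := range s.Subject {
		if sub.Digest["sha256"] == sha256Hex {
			return true
		}
	}
	return false
}

func main() {
	raw := []byte(`{
	  "_type": "https://in-toto.io/Statement/v0.1",
	  "predicateType": "google.com/sbom",
	  "subject": [{"name": "binary-linux-amd64",
	    "digest": {"sha256": "f2e59e0e82c6a1b2c18ceea1dcb739f680f50ad588759217fc564b6aa5234791"}}],
	  "predicate": {"sboms": [{"format": "SPDX",
	    "digest": {"sha256": "02948ad50464ee57fe237b09054c45b1bff6c7d18729eea1eb740d89d9563209"},
	    "uri": "https://github.com/user/repo/releases/download/v1.3/binary.spdx"}]}
	}`)

	stmt, err := parseSBOMStatement(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	for _, ref := range stmt.Predicate.SBOMs {
		fmt.Println(ref.Format, ref.URI)
	}
}
```

Checking the subject digest against the hash of the scanned binary, as `subjectMatches` does, is one plausible way to confirm that an attestation really describes the file that triggered the search.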

The SBOM that is output by Syft uses external reference relationships to refer to the SBOMs discovered by the rekor-cataloger. Merging the SBOMs was considered to be an optional follow-up feature, and is still under investigation (https://github.com/anchore/syft/issues/617).

The rekor package exports an ExternalRef type that represents information about an external SBOM. It is an identifiable, so it can be placed into a Syft relationship that carries the information through to the output. When mapping the Syft SBOM format to other formats, relationships with ExternalRefs are handled in accordance with each format’s specification. In SPDX, they appear in the external document references section in addition to being referenced in a relationship. Here is an example (edited):

...
"externalDocumentRefs": [
  {
   "externalDocumentId": "DocumentRef-24a791393ed162b5",
   "checksum": {
    "algorithm": "SHA1",
    "checksumValue": "eb141a8a026322e2ff6a1ec851af5268dfe59b20"
   },
   "spdxDocument": "http://www.example.com/binary.spdx"
  }
 ]
...
 "files": [
  {
   "SPDXID": "SPDXRef-9dc5bd9a21b3b63c",
   "comment": "layerID: sha256:be555362a16f0f6b27f194ed8fc0fd5b640a300f809eafe5799676a53bbcfc7b",
   "licenseConcluded": "NOASSERTION",
   "fileName": "/sample-golang-prov"
  }
 ]
...
 "relationships": [
  {
   "spdxElementId":"SPDXRef-9dc5bd9a21b3b63c",
   "relationshipType": "DESCRIBED_BY",
   "relatedSpdxElement": "DocumentRef-24a791393ed162b5"
  }
]
...
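As a rough illustration of the SPDX mapping shown above, the sketch below turns one external reference into an `externalDocumentRefs` entry and a `DESCRIBED_BY` relationship. The `ExternalRef` fields and the `toSPDX` helper are assumptions made for this example and do not reflect the actual type in the rekor package; only the JSON field names come from the SPDX excerpt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExternalRef is a hypothetical stand-in for the type exported by the rekor
// package; its real fields may differ.
type ExternalRef struct {
	ID   string // identifier used to build the DocumentRef-... ID
	URI  string // location of the external SPDX document
	SHA1 string // checksum of the external document
}

type Checksum struct {
	Algorithm     string `json:"algorithm"`
	ChecksumValue string `json:"checksumValue"`
}

type ExternalDocumentRef struct {
	ExternalDocumentID string   `json:"externalDocumentId"`
	Checksum           Checksum `json:"checksum"`
	SPDXDocument       string   `json:"spdxDocument"`
}

type Relationship struct {
	SPDXElementID      string `json:"spdxElementId"`
	RelationshipType   string `json:"relationshipType"`
	RelatedSPDXElement string `json:"relatedSpdxElement"`
}

// toSPDX (hypothetical helper) builds the two SPDX entries that together
// express "this file is described by that external document".
func toSPDX(fileID string, ref ExternalRef) (ExternalDocumentRef, Relationship) {
	docID := "DocumentRef-" + ref.ID
	doc := ExternalDocumentRef{
		ExternalDocumentID: docID,
		Checksum:           Checksum{Algorithm: "SHA1", ChecksumValue: ref.SHA1},
		SPDXDocument:       ref.URI,
	}
	rel := Relationship{
		SPDXElementID:      "SPDXRef-" + fileID,
		RelationshipType:   "DESCRIBED_BY",
		RelatedSPDXElement: docID,
	}
	return doc, rel
}

func main() {
	doc, rel := toSPDX("9dc5bd9a21b3b63c", ExternalRef{
		ID:   "24a791393ed162b5",
		URI:  "http://www.example.com/binary.spdx",
		SHA1: "eb141a8a026322e2ff6a1ec851af5268dfe59b20",
	})
	out, err := json.MarshalIndent(map[string]any{
		"externalDocumentRefs": []ExternalDocumentRef{doc},
		"relationships":        []Relationship{rel},
	}, "", " ")
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Println(string(out))
}
```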

The rekor package can only read log entries that are associated with in-toto attestations. The content of the SBOM that is referenced in the attestation must successfully be retrieved to continue execution, and only SPDX SBOMs can be read.

Managing external sources

The use of external sources is new to Syft, and they should be managed carefully: their use should be configurable, and it should be clear to users which sources were used and how. Accordingly, https://github.com/anchore/syft/pull/1158 introduces a new external sources configuration, an additional function that catalogers must implement, and a CLI flag to turn off the use of external sources. This approach assumes that external sources will only come into Syft through catalogers.
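To make the idea concrete, here is a minimal sketch of how a per-cataloger function could gate external sources. The interface, the method name `UsesExternalSources`, the `ExternalSourcesConfig` struct, and `filterCatalogers` are all hypothetical names chosen for this example; the actual definitions are in the linked PR.

```go
package main

import "fmt"

// Cataloger is a simplified, hypothetical interface. The extra method lets
// each cataloger declare whether it reaches outside the scanned source.
type Cataloger interface {
	Name() string
	UsesExternalSources() bool
}

// ExternalSourcesConfig is a hypothetical stand-in for the new configuration.
type ExternalSourcesConfig struct {
	Enabled bool
}

type simpleCataloger struct {
	name     string
	external bool
}

func (c simpleCataloger) Name() string              { return c.name }
func (c simpleCataloger) UsesExternalSources() bool { return c.external }

// filterCatalogers drops every cataloger that uses external sources when
// external sources are disabled, and returns the list unchanged otherwise.
func filterCatalogers(cfg ExternalSourcesConfig, all []Cataloger) []Cataloger {
	if cfg.Enabled {
		return all
	}
	var kept []Cataloger
	for _, c := range all {
		if !c.UsesExternalSources() {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	all := []Cataloger{
		simpleCataloger{name: "golang-binary-cataloger", external: false},
		simpleCataloger{name: "rekor-cataloger", external: true},
	}
	for _, c := range filterCatalogers(ExternalSourcesConfig{Enabled: false}, all) {
		fmt.Println("running", c.Name())
	}
}
```

Putting the decision on the cataloger itself matches the stated assumption that external sources only enter Syft through catalogers: a single flag can then switch them all off without knowing which catalogers exist.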

Separate from that PR, rekor-cataloger logs a warning indicating what was used to create the output SBOM (see the log output above).

Verification of data

The use of external sources requires verification of the data that is found. Absent the inconsistencies outlined below, the rekor-cataloger currently accepts all Rekor entries that have certificates issued by Fulcio. In the future, the rekor-cataloger can be extended to limit accepted entries to ones that match specific identities.

To explain the verification actions that are taken, simplified depictions of the Rekor log entry and in-toto attestation data formats are shown here:

Attestation:
    subject:
        hash (this is the hash of the binary)
    predicate:
        sbom-hash  
        sbom-uri

Rekor log entry:
    timestamp 
    attestation-hash
    certificate

The rekor package retrieves the Rekor log entry, the associated in-toto attestation, and the SBOM. It performs verification to ensure that the retrieved data has not been tampered with. It verifies that:

These steps ensure that the retrieved information, and the upstream external document reference that is produced, can be trusted if Rekor, Fulcio, and the certificate subject are trusted.
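One kind of tamper check implied by the data formats above is comparing the SHA-256 of retrieved bytes against a recorded hash, for example the SBOM content against the attestation's sbom-hash. The sketch below shows only that comparison; `verifySHA256` is a hypothetical helper, and certificate and log-entry verification are out of its scope.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySHA256 (hypothetical helper) returns an error when the SHA-256 of
// data does not equal the expected hex-encoded digest.
func verifySHA256(data []byte, expectedHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	// Hex digests may be written in either case, so compare case-insensitively.
	if !strings.EqualFold(got, expectedHex) {
		return fmt.Errorf("digest mismatch: got %s, expected %s", got, expectedHex)
	}
	return nil
}

func main() {
	// Placeholder bytes standing in for a downloaded SBOM; the expected
	// digest here is deliberately wrong to show the failure path.
	sbom := []byte("placeholder SBOM content")
	if err := verifySHA256(sbom, strings.Repeat("0", 64)); err != nil {
		fmt.Println("SBOM rejected:", err)
		return
	}
	fmt.Println("SBOM digest matches the attestation")
}
```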

A current limitation of Rekor entries for in-toto attestations prevents verification of the certificate subject’s signature over the attestation (https://github.com/sigstore/rekor/issues/582). Once this verification is possible, Rekor will not need to be trusted.

When a builder, such as the slsa-github-generator (https://github.com/slsa-framework/slsa-github-generator), generates the SBOM and uploads it to Rekor, a path from source code to SBOM is created. In this case, the only entities that must be trusted are the builder and Fulcio.

Surfacing packages versus surfacing binaries

Edit: I realized that Syft can create files, not just packages. Binaries can be represented using files, and the below doesn't apply anymore :smiley:.

External document references that the rekor-cataloger produces must be related to SBOM entries for executables, not to entries for the packages they contain (in-toto attestation subjects are executables, not packages). Currently, Syft only surfaces packages. Binaries that are found but cannot be inspected do not appear in the SBOMs output by Syft.

This PR includes a temporary solution to allow the use of the rekor-cataloger for golang binaries. It involves a change (see commit titled “surface external relationships”) to the golang-binary-cataloger to create SBOM entries not only for the packages that executables contain, but also for the executables themselves. This allows the rekor-cataloger to create external reference relationships using the entries for golang executables. Since no entries are created for binaries that are not golang-compiled, the results from the rekor-cataloger for them will not appear in output SBOMs. Another implication is that the rekor-cataloger cannot be run without the golang-binary-cataloger, as the rekor-cataloger does not itself create packages.

This also raises the larger question of whether Syft should only surface an executable when it can provide meaningful information for it. The current design limits the rekor-cataloger’s ability to report information in the output SBOM, and should also raise wider questions about how the completeness of SBOMs output by Syft is perceived. This topic is out of the scope of this issue.

Follow up work

popey commented 3 months ago

Closing as this is superseded by #1291 🙏