docker / sbom-cli-plugin

Plugin for Docker CLI to support SBOM creation using Syft
Apache License 2.0

layer results? #15

Open rmccarth opened 2 years ago

rmccarth commented 2 years ago

Will the output describe the layer in which the software was first introduced?

wagoodman commented 2 years ago

Indeed it does!

Short answer: take a look at the .locations field on each package in the syft-json output.

A little more detail: take an image built from the following Dockerfile:

# tag: localhost/example:latest
FROM ubuntu:latest
RUN apt update -y && apt install -y python3-pip
RUN pip install click

and generate an SBOM for it with:

$ docker sbom localhost/example --format syft-json > sbom.json

The SBOM records information about the image that was cataloged, including its layers:

$ cat sbom.json | jq '.source.target.layers'
[
  {
    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "digest": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7",
    "size": 72756862
  },
  {
    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "digest": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c",
    "size": 356087239
  },
  {
    "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
    "digest": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96",
    "size": 746744
  }
]

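These layers are listed in the same order as the Dockerfile instructions that created them:

1. sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7: the ubuntu:latest base image (FROM ubuntu:latest)
2. sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c: RUN apt update -y && apt install -y python3-pip
3. sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96: RUN pip install click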

By default, docker sbom analyzes the squashed representation of the image. This is the same view you would get by creating a container from the image and looking at its mounted filesystem. For each package found, every file location that serves as evidence of the package's existence is listed in the package's .locations field, together with the ID of the layer that file came from.
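Each layerID in .locations is a layer digest. The minimal sketch below converts a digest into a 1-based layer number using its position in .source.target.layers. It is written in Python rather than jq so it runs without Docker, and it assumes the syft-json shape shown above. The hard-coded JSON stands in for sbom.json, and the digests are shortened to 12 hex characters only for readability:

```python
import json

# Trimmed-down stand-in for sbom.json: only .source.target.layers is kept,
# and digests are shortened to 12 hex characters for readability.
SBOM_TEXT = """
{"source": {"target": {"layers": [
  {"digest": "sha256:867d0767a47c"},
  {"digest": "sha256:c64d7db66350"},
  {"digest": "sha256:2ab470a20aeb"}
]}}}
"""

def layer_numbers(sbom):
    """Map each layer digest to its 1-based position, base image first."""
    layers = sbom["source"]["target"]["layers"]
    return {layer["digest"]: index for index, layer in enumerate(layers, start=1)}

sbom = json.loads(SBOM_TEXT)
numbers = layer_numbers(sbom)
print(numbers["sha256:2ab470a20aeb"])  # prints 3
```

To use a real SBOM, replace json.loads(SBOM_TEXT) with json.load(open("sbom.json")). Treating list position as the layer number assumes the layers are listed base image first, as they are in the example above.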

So, back to our example... let's find the click package in the SBOM:

$ cat sbom.json | jq '.artifacts[] | select(.name == "click")'
{
  "id": "22eab18eb19f9d64",
  "name": "click",
  "version": "8.1.2",
  "type": "python",
  "foundBy": "python-package-cataloger",
  "locations": [
    {
      "path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/METADATA",
      "layerID": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"
    },
    {
      "path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/RECORD",
      "layerID": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"
    },
    {
      "path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/top_level.txt",
      "layerID": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"
    }
  ],
  "licenses": [
    "BSD-3-Clause"
  ],
... (truncated)...

Note that layer information is stored for every location path. Here the earliest (and only) layer is sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96 (layer 3).
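Finding the earliest layer for a package means taking the minimum layer position across all of its locations. Here is a minimal Python sketch of that lookup, assuming the syft-json fields shown above. The package data is copied from the click output, and the digests are shortened to 12 hex characters for readability. The earliest_layer helper is hypothetical and not part of docker sbom or Syft:

```python
# Layer digests in image order (base image first), shortened to 12 hex
# characters; taken from .source.target.layers in the example.
LAYERS = ["sha256:867d0767a47c", "sha256:c64d7db66350", "sha256:2ab470a20aeb"]

# The click package's .locations entries from the example output.
CLICK = {
    "name": "click",
    "locations": [
        {"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/METADATA",
         "layerID": "sha256:2ab470a20aeb"},
        {"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/RECORD",
         "layerID": "sha256:2ab470a20aeb"},
        {"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/top_level.txt",
         "layerID": "sha256:2ab470a20aeb"},
    ],
}

def earliest_layer(package, layers):
    """Return (1-based layer number, digest) of the earliest layer with any evidence."""
    order = {digest: i for i, digest in enumerate(layers, start=1)}
    first = min(order[loc["layerID"]] for loc in package["locations"])
    return first, layers[first - 1]

print(earliest_layer(CLICK, LAYERS))  # prints (3, 'sha256:2ab470a20aeb')
```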

Let's look at another package, bash, which should be in layer 1 (the base image):

$ cat sbom.json | jq '.artifacts[] | select(.name == "bash") | .locations'
[
  {
    "path": "/usr/share/doc/bash/copyright",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/info/bash.conffiles",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/info/bash.md5sums",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/status",
    "layerID": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c"
  }
]

It looks like most files indicate layer 1, but /var/lib/dpkg/status was found in layer 2. This is because we're looking at the squashed representation of the image, and in layer 2 the apt command updated the DPKG database.
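A simplified model shows why the squashed view attributes /var/lib/dpkg/status to layer 2. When layers are stacked, a later layer's copy of a path hides any earlier copies, so each path keeps only the layer that wrote it last. This sketch is hypothetical and is not Syft's code. It models each layer as a list of paths (only a few paths from the bash example), uses shortened digests, and ignores the whiteout files that real layers use to record deletions:

```python
# Hypothetical per-layer file listings in image order, keyed by shortened
# layer digest. Only a few paths from the bash example are modeled.
LAYER_FILES = [
    ("sha256:867d0767a47c", ["/usr/share/doc/bash/copyright",
                             "/var/lib/dpkg/info/bash.md5sums",
                             "/var/lib/dpkg/status"]),
    # apt rewrote the DPKG database, so this layer carries a new copy of it.
    ("sha256:c64d7db66350", ["/var/lib/dpkg/status"]),
    ("sha256:2ab470a20aeb", ["/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/METADATA"]),
]

def squash(layer_files):
    """Return {path: layer digest}, letting later layers override earlier ones."""
    owner = {}
    for digest, paths in layer_files:
        for path in paths:
            owner[path] = digest  # last writer wins, as in the merged filesystem
    return owner

squashed = squash(LAYER_FILES)
print(squashed["/usr/share/doc/bash/copyright"])  # prints sha256:867d0767a47c
print(squashed["/var/lib/dpkg/status"])           # prints sha256:c64d7db66350
```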

What about the earliest evidence of bash in /var/lib/dpkg/status?

We can tell docker sbom to consider all layers, not just the squashed representation, by generating the SBOM the same way with the additional flag --layers all:

$ docker sbom --layers all localhost/example --format syft-json | jq '.artifacts[] | select(.name == "bash") | .locations'
Syft v0.43.0
 ✔ Loaded image
 ✔ Parsed image
 ✔ Cataloged packages      [301 packages]

[
  {
    "path": "/usr/share/doc/bash/copyright",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/info/bash.conffiles",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/info/bash.md5sums",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/status",
    "layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
  },
  {
    "path": "/var/lib/dpkg/status",
    "layerID": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c"
  }
]

Now we get the "full picture": the first evidence of the bash package in /var/lib/dpkg/status is conclusively in layer 1.
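With --layers all, one path can appear once per layer that contains a copy of it. The minimal Python sketch below groups the bash locations from the output above by path and reports the earliest layer for each. It assumes the layers are listed base image first and shortens digests to 12 hex characters. The helper name first_evidence_by_path is hypothetical:

```python
from collections import defaultdict

# Layer digests in image order (base image first), shortened to 12 hex characters.
LAYERS = ["sha256:867d0767a47c", "sha256:c64d7db66350", "sha256:2ab470a20aeb"]

# bash's .locations from the --layers all output (digests shortened).
BASH_LOCATIONS = [
    {"path": "/usr/share/doc/bash/copyright", "layerID": "sha256:867d0767a47c"},
    {"path": "/var/lib/dpkg/info/bash.conffiles", "layerID": "sha256:867d0767a47c"},
    {"path": "/var/lib/dpkg/info/bash.md5sums", "layerID": "sha256:867d0767a47c"},
    {"path": "/var/lib/dpkg/status", "layerID": "sha256:867d0767a47c"},
    {"path": "/var/lib/dpkg/status", "layerID": "sha256:c64d7db66350"},
]

def first_evidence_by_path(locations, layers):
    """Map each path to (earliest 1-based layer, sorted list of all its layers)."""
    order = {digest: i for i, digest in enumerate(layers, start=1)}
    seen = defaultdict(list)
    for loc in locations:
        seen[loc["path"]].append(order[loc["layerID"]])
    return {path: (min(nums), sorted(nums)) for path, nums in seen.items()}

for path, (first, all_layers) in first_evidence_by_path(BASH_LOCATIONS, LAYERS).items():
    print(f"{path}: first seen in layer {first}, present in layers {all_layers}")
```

For this data, /var/lib/dpkg/status is reported as first seen in layer 1 and present in layers [1, 2]. That matches the reading of the output above.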

Why is it also found in layer 2? Image builders and containers use copy-on-write semantics, and DPKG keeps all package entries in a single file. So when python3-pip was installed, the entire database file (including the bash entry) was copied into layer 2 and updated there. Syft, the underlying tool that drives docker sbom, is smart enough to detect this and deduplicate the packages, while still keeping the extra location evidence on the deduplicated package.
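The deduplication described above can be pictured with a simplified, hypothetical model (not Syft's actual implementation). Each layer's findings are treated as separate records, records with the same name, version, and type are counted as one package, and the union of their locations is kept. The bash version string below is a placeholder because the example output does not show it, and the digests are shortened to 12 hex characters:

```python
# Hypothetical per-layer findings for bash: the same package record is found
# via layer 1's DPKG database and again via layer 2's rewritten copy.
# The version string is a placeholder; the example output does not show it.
PER_LAYER_FINDINGS = [
    {"name": "bash", "version": "0.0-placeholder", "type": "deb",
     "locations": [{"path": "/var/lib/dpkg/status", "layerID": "sha256:867d0767a47c"}]},
    {"name": "bash", "version": "0.0-placeholder", "type": "deb",
     "locations": [{"path": "/var/lib/dpkg/status", "layerID": "sha256:c64d7db66350"}]},
]

def deduplicate(findings):
    """Merge findings sharing (name, version, type), keeping every location once."""
    merged = {}
    for pkg in findings:
        key = (pkg["name"], pkg["version"], pkg["type"])
        if key not in merged:
            # Copy the package record but start with an empty location list.
            merged[key] = {**pkg, "locations": []}
        for loc in pkg["locations"]:
            if loc not in merged[key]["locations"]:
                merged[key]["locations"].append(loc)
    return list(merged.values())

packages = deduplicate(PER_LAYER_FINDINGS)
print(len(packages))                  # prints 1
print(len(packages[0]["locations"]))  # prints 2
```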

This was a little verbose, but @rmccarth let me know if you have any more questions about this 👍