Open rmccarth opened 2 years ago
Indeed it does!
Short answer: take a look at the .locations field on each package in the syft-json output.
A little more detail: take an image built from the following Dockerfile:
# tag: localhost/example:latest
FROM ubuntu:latest
RUN apt update -y && apt install -y python3-pip
RUN pip install click
And the SBOM output from:
$ docker sbom localhost/example --format syft-json > sbom.json
The SBOM has some information about the image that was cataloged:
$ cat sbom.json | jq '.source.target.layers'
[
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"digest": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7",
"size": 72756862
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"digest": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c",
"size": 356087239
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"digest": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96",
"size": 746744
}
]
These layers are in the same order they appear in the Dockerfile:
from...: sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7
run apt...: sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c
run pip...: sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96
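Since the layer list is ordered, a digest can be turned into a layer number by its position in that list. A minimal Python sketch, assuming the layer entries look like the .source.target.layers output above (the function name layer_numbers is just for illustration):

```python
# Layer digests copied from the `.source.target.layers` output above.
LAYERS = [
    {"digest": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"},
    {"digest": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c"},
    {"digest": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"},
]


def layer_numbers(layers):
    """Map each layer digest to its 1-based position in the image."""
    return {layer["digest"]: number for number, layer in enumerate(layers, start=1)}


numbers = layer_numbers(LAYERS)
for digest, number in numbers.items():
    print(f"layer {number}: {digest}")
```

With a real SBOM file, the same list could be loaded with json.load() and read from the "source" / "target" / "layers" keys instead of being hard-coded.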
The default analysis looks at the squashed representation of the image, which is what you would see if you created a container from the image and looked at the mounted filesystem. Taking a closer look at the packages found in the image, every location that provides evidence of a package's existence is listed in the .locations field of that package.
So, back to our example... let's find the click package in the SBOM:
$ cat sbom.json | jq '.artifacts[] | select(.name == "click")'
{
"id": "22eab18eb19f9d64",
"name": "click",
"version": "8.1.2",
"type": "python",
"foundBy": "python-package-cataloger",
"locations": [
{
"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/METADATA",
"layerID": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"
},
{
"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/RECORD",
"layerID": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"
},
{
"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/top_level.txt",
"layerID": "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96"
}
],
"licenses": [
"BSD-3-Clause"
],
... (truncated)...
Note that layer information is stored for every location path, and the earliest layer here is sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96 (layer 3).
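Finding the layer that introduced a package comes down to taking the lowest layer position among its locations. Here is a rough Python sketch of that lookup, using the click locations and layer digests shown above; earliest_layer is a hypothetical helper, not part of Syft or docker sbom:

```python
# Layer digests in image order, taken from the layer list above.
LAYER_ORDER = [
    "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7",
    "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c",
    "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96",
]

# The `.locations` entries of the click package from the output above.
CLICK_LOCATIONS = [
    {"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/METADATA",
     "layerID": LAYER_ORDER[2]},
    {"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/RECORD",
     "layerID": LAYER_ORDER[2]},
    {"path": "/usr/local/lib/python3.8/dist-packages/click-8.1.2.dist-info/top_level.txt",
     "layerID": LAYER_ORDER[2]},
]


def earliest_layer(locations, layer_order):
    """Return (1-based layer number, digest) of the earliest layer in locations."""
    index = min(layer_order.index(loc["layerID"]) for loc in locations)
    return index + 1, layer_order[index]


number, digest = earliest_layer(CLICK_LOCATIONS, LAYER_ORDER)
print(f"click first appears in layer {number} ({digest})")
```

Keep in mind that with the default squashed view this only reports the earliest layer among the files that are visible in the final filesystem.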
Let's look at another package, bash, which should be in layer 1 (the base image):
$ cat sbom.json | jq '.artifacts[] | select(.name == "bash") | .locations'
[
{
"path": "/usr/share/doc/bash/copyright",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/info/bash.conffiles",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/info/bash.md5sums",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/status",
"layerID": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c"
}
]
It looks like most files indicate layer 1, but /var/lib/dpkg/status was found in layer 2. This is because we're looking at the squashed representation of the image, and in layer 2 we updated the DPKG database via the apt command.
What about the first evidence in /var/lib/dpkg/status?
We can tell docker sbom to consider all layers, not just the squashed representation, by generating the SBOM in the same way but with the additional flag --layers all:
$ docker sbom --layers all localhost/example --format syft-json | jq '.artifacts[] | select(.name == "bash") | .locations'
Syft v0.43.0
✔ Loaded image
✔ Parsed image
✔ Cataloged packages [301 packages]
[
{
"path": "/usr/share/doc/bash/copyright",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/info/bash.conffiles",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/info/bash.md5sums",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/status",
"layerID": "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
},
{
"path": "/var/lib/dpkg/status",
"layerID": "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c"
}
]
Now we get the "full picture": the first evidence of the bash package in the /var/lib/dpkg/status file is conclusively in layer 1.
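To read the all-layers output at a glance, the locations can be grouped by path so that each file lists every layer it was seen in. A small Python sketch using the bash locations shown above (layers_by_path is a hypothetical helper name):

```python
from collections import defaultdict

# Layer digests in image order, taken from the layer list earlier in the thread.
LAYER_ORDER = [
    "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7",
    "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c",
    "sha256:2ab470a20aeb5ed65801c513633e7df8d88ec5c44a429b70ccf2cf9c117ebf96",
]

# The bash `.locations` entries from the `--layers all` output above.
BASH_LOCATIONS = [
    {"path": "/usr/share/doc/bash/copyright", "layerID": LAYER_ORDER[0]},
    {"path": "/var/lib/dpkg/info/bash.conffiles", "layerID": LAYER_ORDER[0]},
    {"path": "/var/lib/dpkg/info/bash.md5sums", "layerID": LAYER_ORDER[0]},
    {"path": "/var/lib/dpkg/status", "layerID": LAYER_ORDER[0]},
    {"path": "/var/lib/dpkg/status", "layerID": LAYER_ORDER[1]},
]


def layers_by_path(locations, layer_order):
    """Map each path to the sorted 1-based layer numbers in which it was seen."""
    seen = defaultdict(set)
    for loc in locations:
        seen[loc["path"]].add(layer_order.index(loc["layerID"]) + 1)
    return {path: sorted(layer_nums) for path, layer_nums in seen.items()}


by_path = layers_by_path(BASH_LOCATIONS, LAYER_ORDER)
for path, layer_nums in by_path.items():
    print(f"{path}: layers {layer_nums}")
```

The first number in each list is the layer where that file first provided evidence of the package.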
Why is it also found in layer 2? Image builders and containers use copy-on-write semantics, and DPKG keeps all package entries in a single file, so when python3-pip was installed the full DB file (which contained the bash entry) was copied into layer 2. Syft, the underlying tool that drives docker sbom, is smart enough to detect this and deduplicate the packages, while still leaving the extra location evidence on the deduplicated package.
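To illustrate the idea of deduplication that keeps location evidence, here is a simplified Python sketch. It is not Syft's actual implementation: it assumes a package is identified by its name, version and type, and the per-layer findings, the "deb" type value and the version placeholder are made up for the example.

```python
LAYER_1 = "sha256:867d0767a47c392f80acb51572851923d6d3e55289828b0cd84a96ba342660c7"
LAYER_2 = "sha256:c64d7db663506b5aeaa667ef426c5a2416a653796ce3a035c9facc27be445b3c"

# Hypothetical per-layer findings: the same bash entry read from the dpkg
# status file in two layers. The version string is a placeholder.
FINDINGS = [
    {"name": "bash", "version": "<version>", "type": "deb",
     "locations": [{"path": "/var/lib/dpkg/status", "layerID": LAYER_1}]},
    {"name": "bash", "version": "<version>", "type": "deb",
     "locations": [{"path": "/var/lib/dpkg/status", "layerID": LAYER_2}]},
]


def merge_findings(findings):
    """Merge findings that share (name, version, type), keeping every location."""
    merged = {}
    for finding in findings:
        key = (finding["name"], finding["version"], finding["type"])
        package = merged.setdefault(key, {
            "name": finding["name"],
            "version": finding["version"],
            "type": finding["type"],
            "locations": [],
        })
        for location in finding["locations"]:
            # Skip exact duplicates, but keep the same path from another layer.
            if location not in package["locations"]:
                package["locations"].append(location)
    return list(merged.values())


packages = merge_findings(FINDINGS)
print(len(packages), "package with", len(packages[0]["locations"]), "locations")
```

The result is a single bash package that still carries both /var/lib/dpkg/status locations, which matches the shape of the output shown above.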
This was a little verbose, but @rmccarth let me know if you have any more questions about this 👍
will the output describe the layer in which the software was first introduced?