anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

dpkg packages that are in `deinstalled` state should not be in SBOM #3063

Open mephinet opened 1 month ago

mephinet commented 1 month ago

What happened: Many official Docker images (provided by https://github.com/docker-library) are derived from buildpack-deps, which includes a large number of build-related packages such as gcc, git, subversion, perl, and imagemagick. These programs are useful for building the application (and possibly further dependencies) but are not used at runtime.

This poses a problem for meeting security regulations, because many of these dependencies have vulnerabilities, e.g. at the time of writing CVE-2024-32002, CVE-2016-10144, CVE-2016-10145, CVE-2023-47100, CVE-2023-31486.

To get these vulnerabilities off the audit list, I run `apt-get remove` on these packages once all RUN steps that depend on any of those programs/libraries are done, so perl, git, ... are no longer part of the final container.

However, when scanning such an image with syft, these packages are still reported as found.

What you expected to happen: If a package is installed in an earlier layer of the image and removed in a later layer, I expect the package to no longer be part of the BOM.

Steps to reproduce the issue:

Anything else we need to know?:

Environment:

westonsteimel commented 1 month ago

So the Debian packaging cataloger works by looking at the `/var/lib/dpkg/status` file (among other location variations) to understand which dpkg packages are installed. In this case, because `--purge` wasn't used in the remove command, the config files were kept, so the package's dpkg entry is not fully removed and stays in that file; however, the status changes from `install ok installed` to `deinstall ok config-files`.
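To illustrate the status change described above, here is a minimal Python sketch that reads dpkg status stanzas and keeps only the entries whose state is `installed`. It is not syft's implementation (syft is written in Go); the function names `parse_dpkg_status` and `is_installed` are hypothetical, and the sample stanzas, including the version strings, are made up for the example.

```python
import re


def parse_dpkg_status(text):
    """Split dpkg status text into stanzas and return one field dict per package."""
    packages = []
    # Stanzas are separated by one or more blank lines.
    for stanza in re.split(r"\n\s*\n", text.strip()):
        fields = {}
        key = None
        for line in stanza.splitlines():
            if line.startswith((" ", "\t")) and key is not None:
                # Continuation line of a multi-line field (e.g. Description).
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        if "Package" in fields:
            packages.append(fields)
    return packages


def is_installed(fields):
    """The Status field has three words: want, flag, state. Check the state."""
    parts = fields.get("Status", "").split()
    return len(parts) == 3 and parts[2] == "installed"


# Illustrative sample only; versions are invented for this sketch.
SAMPLE = """\
Package: perl
Status: deinstall ok config-files
Version: 5.36.0-7

Package: bash
Status: install ok installed
Version: 5.2.15-2
"""

all_packages = parse_dpkg_status(SAMPLE)
installed_names = [p["Package"] for p in all_packages if is_installed(p)]
print(installed_names)
```

With this filter, the `perl` entry left behind by a non-purging remove is still present in the parsed data, but it is not counted as installed.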

So changing the command to `apt-get remove --purge -y perl git imagemagick gnupg && apt-get -y autoremove` should get the behaviour you want.

Whether partially installed dpkg packages, particularly config-file-only installs, should be included in the SBOM probably needs a larger discussion.

wagoodman commented 1 month ago

I think the short-term answer is to remove these items from the SBOM -- they should not be represented as installed packages.

However, I agree that this opens up a longer-term question on how to deal with "deinstalled" or "config only" cases... I do agree that there should be some representation in the SBOM about this, but we'd need to be careful to not imply they are installed.

One way is to not create a relationship from the source to the package that was deinstalled; from an SPDX perspective there is a convention here that this would imply that it is not installed. I'm not a fan of this approach though, since I feel that the common SBOM consumer is probably unaware of this convention, so the package would still appear to them to be installed. We could do a syft-json only solution for this, but that would leave out supporting SPDX and CycloneDX, which also isn't great.
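The concern with the relationship-based approach can be sketched in Python. This is a toy model only: the dictionary layout, the `"source"` identifier, and the use of a `CONTAINS` relationship type are assumptions made for illustration and do not reproduce the actual SPDX, CycloneDX, or syft-json schemas.

```python
def build_document(packages):
    """Build a toy SBOM from (name, dpkg_state) pairs.

    Every package is listed, but only installed ones get a relationship
    from the source. Field names here are hypothetical.
    """
    doc = {"packages": [], "relationships": []}
    for name, state in packages:
        doc["packages"].append({"name": name})
        if state == "installed":
            doc["relationships"].append(
                {"from": "source", "type": "CONTAINS", "to": name}
            )
    return doc


def naive_consumer(doc):
    """A consumer unaware of the convention: reads the package list only."""
    return [p["name"] for p in doc["packages"]]


def relationship_aware_consumer(doc):
    """A consumer that treats only related packages as installed."""
    contained = {
        r["to"] for r in doc["relationships"] if r["type"] == "CONTAINS"
    }
    return [p["name"] for p in doc["packages"] if p["name"] in contained]


doc = build_document([("perl", "config-files"), ("bash", "installed")])
print(naive_consumer(doc))
print(relationship_aware_consumer(doc))
```

The two consumers disagree about `perl`: the one that ignores relationships still reports it, which is the failure mode described above.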

I'm open to more suggestions here!

mephinet commented 1 month ago

Thanks for shining a light on this issue! I wasn't aware that without adding `--purge`, the package was still part of Debian's status files. Now that I've added `--purge` to the `apt-get remove` and `apt-get autoremove` commands, the packages are no longer part of the BOM, so my issue is fixed. Thanks a lot for your quick and helpful replies! :heart_eyes:

willmurphyscode commented 1 month ago

Hi @mephinet, thanks for the issue. We will keep this open to track the request that @wagoodman made above.