anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Support Bitnami embedded SBOMs #3065

Open willmurphyscode opened 1 month ago

willmurphyscode commented 1 month ago

What would you like to be added:

As part of anchore/grype#1609, Syft should pick up the SBOMs located under /opt/bitnami in containers, because this is how Bitnami records what's in an image.

The SBOM cataloger would probably do this already, but it is off by default.

There are a few open questions here:

  1. How should packages discovered by other catalogers interact with these SBOMs? For example, the binary cataloger might find Redis or MariaDB executables.
  2. What if someone is building something FROM a Bitnami image? How do we know we can trust the SBOM?
  3. If we are special-casing Bitnami images, e.g. turning the SBOM cataloger on by default only for certain images or certain paths, how do we detect this situation and what configuration options are available?

Why is this needed:

This is primarily needed so that running grype on a Bitnami image (see anchore/grype#1609) is as accurate as possible.

Additional context:

There are a few open requests for more accurate Bitnami classification. Ideally this work might also fix those.

kzantow commented 1 month ago

Is there another way to scan these artifacts? Are these container images in some format other than OCI? If the only way to identify what is installed is by scanning an SBOM, there could probably just be a Bitnami cataloger that looks for specific SBOMs in these known Bitnami locations, instead of enabling the SBOM cataloger itself. It's pretty easy to just pass a reader to the SBOM decoder. We'd then probably want a way to prevent SBOMs from getting scanned twice if a user does enable the SBOM cataloger.
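A minimal sketch of the idea above, in Python for brevity rather than Syft's Go: a dedicated Bitnami pass claims SBOM files under the known location first, and the generic SBOM pass skips anything already claimed. The glob `/opt/bitnami/*.spdx`, the generic suffix list, and every function and class name here are assumptions made for illustration; none of them are Syft APIs, and the issue itself only names the `/opt/bitnami` directory.

```python
from fnmatch import fnmatchcase

# Assumption: Bitnami SBOMs are SPDX files somewhere under /opt/bitnami.
# fnmatch's "*" also matches "/" so nested component directories are covered.
BITNAMI_SBOM_GLOB = "/opt/bitnami/*.spdx"

# Assumption: a stand-in for whatever the generic SBOM cataloger matches.
GENERIC_SBOM_SUFFIXES = (".spdx", ".spdx.json", ".cdx.json")


def is_bitnami_sbom(path: str) -> bool:
    """True if the path matches the (hypothetical) narrow Bitnami glob."""
    return fnmatchcase(path, BITNAMI_SBOM_GLOB)


def is_generic_sbom(path: str) -> bool:
    """True if the path looks like an SBOM to the (hypothetical) generic pass."""
    return path.endswith(GENERIC_SBOM_SUFFIXES)


class SBOMPathClaims:
    """Remembers which SBOM files have already been handed to a cataloger."""

    def __init__(self) -> None:
        self._claimed: set[str] = set()

    def claim(self, path: str) -> bool:
        """Return True the first time a path is claimed, False afterwards."""
        if path in self._claimed:
            return False
        self._claimed.add(path)
        return True


def plan_scans(paths: list[str], generic_sbom_enabled: bool) -> list[tuple[str, str]]:
    """Assign each SBOM file to at most one cataloger; the Bitnami pass goes first."""
    claims = SBOMPathClaims()
    plan = []
    for p in paths:
        if is_bitnami_sbom(p) and claims.claim(p):
            plan.append(("bitnami", p))
    if generic_sbom_enabled:
        for p in paths:
            if is_generic_sbom(p) and claims.claim(p):
                plan.append(("sbom", p))
    return plan
```

With both passes enabled, a file under `/opt/bitnami` is decoded once by the Bitnami pass, while SBOMs elsewhere in the image still reach the generic pass.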

willmurphyscode commented 1 month ago

Two questions for investigation:

  1. If we add a Bitnami cataloger, and turn both it and the SBOM cataloger on, do we get duplicates?
  2. Do we, and should we, surface all information from the Bitnami SPDX in the Syft output SBOM? It might be that the interface for a cataloger is too specific; it only returns packages and relationships. SPDX can express more than this.

The easy path to implement this is essentially a copy of the SBOM cataloger with a much narrower file glob, assuming it doesn't cause duplicates or miss critical information.
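One way to reason about the duplicate question is to look at what happens when two catalogers decode the same file and their results are merged. The sketch below, again in Python and not based on Syft's actual package model, treats two packages as duplicates when name, version, and source location all match, and keeps the first one seen. The `Package` fields and the `merge_packages` helper are hypothetical; whether Syft's real merge logic already collapses such entries is exactly what the first investigation question would need to establish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical, simplified package record (not Syft's pkg.Package)."""
    name: str
    version: str
    found_by: str   # name of the cataloger that produced the record
    location: str   # path of the file the record was read from


def merge_packages(*groups: list[Package]) -> list[Package]:
    """Merge cataloger results, keeping the first record for each identity.

    Identity is (lower-cased name, version, location); found_by is ignored,
    so the same SBOM entry decoded by two catalogers collapses into one.
    """
    merged: dict[tuple[str, str, str], Package] = {}
    for group in groups:
        for pkg in group:
            key = (pkg.name.lower(), pkg.version, pkg.location)
            merged.setdefault(key, pkg)
    return list(merged.values())
```

Passing the Bitnami results as the first group makes them win over identical entries from the generic SBOM pass. Note that a package found by a different route, for example a Redis executable found by the binary cataloger at another path, has a different location and would not be collapsed by this rule.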