anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

--exclude-pkgs option #1229

Open rchincha opened 2 years ago

rchincha commented 2 years ago

What would you like to be added:

Once packages are discovered using the cataloger, can I specify a list of packages to be excluded?

Why is this needed:

Generate an SBOM for only a subset of packages.
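Until an option like this exists, one stopgap is to post-process Syft's JSON output after cataloging. This is a minimal sketch, not a Syft feature. It assumes the JSON produced by `syft <source> -o json` has a top-level `artifacts` array whose entries carry `id` and `name`, and an `artifactRelationships` array whose entries reference artifacts by `parent`/`child` id. The function name `exclude_packages` and the glob-pattern matching are hypothetical choices.

```python
import fnmatch


def exclude_packages(sbom, patterns):
    """Return a copy of a Syft-JSON-style SBOM without the artifacts whose
    name matches any glob pattern (hypothetical post-processing step)."""
    kept, dropped_ids = [], set()
    for artifact in sbom.get("artifacts", []):
        if any(fnmatch.fnmatchcase(artifact["name"], p) for p in patterns):
            dropped_ids.add(artifact.get("id"))
        else:
            kept.append(artifact)

    result = dict(sbom, artifacts=kept)  # shallow copy; the input is not modified
    if "artifactRelationships" in sbom:
        # Drop relationships that mention a removed artifact so no ids dangle
        result["artifactRelationships"] = [
            rel for rel in sbom["artifactRelationships"]
            if rel.get("parent") not in dropped_ids and rel.get("child") not in dropped_ids
        ]
    return result
```

For example, load a file written by `syft <image> -o json > sbom.json` with `json.load` and call `exclude_packages(sbom, ["lib*", "curl"])`. Filtering after cataloging leaves Syft's own output complete, so the unfiltered SBOM is still available if needed.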

Additional context:

spiffcs commented 2 years ago

Thanks for the issue @rchincha!

Can you walk us through the reasoning for excluding packages? We want syft to be as close to the truth as possible when generating an SBOM.

Allowing users to exclude or omit packages that are present and cataloged seems a little outside of that goal.

Definitely happy to talk through how you would use it!

spiffcs commented 2 years ago

We've also added this to the agenda for tomorrow's community meeting for syft and grype. Feel free to join there as well, and we'll get other feedback from the community!

https://twitter.com/GrypeProject/status/1574431163799801856?cxt=HHwWgMC8maizwNkrAAAA

rchincha commented 2 years ago

Thanks @spiffcs.

We have a situation where we build a chroot without a package database of any sort. So, to use Syft's SBOM capability, we set up a separate environment with a base distro install, install the required packages on top of it, and would now like to exclude the packages that came from the base install. Alternatively, instead of a blacklist (--exclude), perhaps a whitelist (--include) would work better. I hope the problem statement is clear.
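The workflow above can be approximated today by cataloging the base environment and the full environment separately and diffing the two results. This is a minimal sketch under stated assumptions: both SBOMs are Syft JSON (`-o json`) with a top-level `artifacts` array whose entries have `type`, `name`, and `version`; the helper names are hypothetical. It shows both the exclude-style (derived from the base install) and the include-style (allowlist) variants.

```python
def _key(artifact):
    # Identify a package by type, name, and version (assumed Syft JSON fields)
    return (artifact.get("type"), artifact["name"], artifact.get("version"))


def added_on_top_of_base(base_sbom, full_sbom):
    """Exclude-style: keep artifacts of full_sbom not already present in base_sbom."""
    base = {_key(a) for a in base_sbom.get("artifacts", [])}
    return [a for a in full_sbom.get("artifacts", []) if _key(a) not in base]


def only_included(full_sbom, include_names):
    """Include-style (allowlist): keep only the explicitly named artifacts."""
    wanted = set(include_names)
    return [a for a in full_sbom.get("artifacts", []) if a["name"] in wanted]
```

Because the key includes the version, a base package that gets upgraded while installing the extra packages shows up as "added", which is arguably what you want, since that package changed.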

rchincha commented 2 years ago

Any additional thoughts/updates on this?

kzantow commented 2 years ago

Hi @rchincha, we discussed this at the community meeting last week (see the notes here). If I understand correctly, your use case is less about excluding packages and more about including only user-defined packages (and excluding the base image packages); is that correct? If so, this has been asked for before and is something we'd like to do. We have the concept of scopes, but currently only squashed and all-layers. We would add another scope, something like user-layers, which I suspect would be easier for you to use than an explicit list of packages to exclude. What do you think?

rchincha commented 2 years ago

@kzantow, yes, that's spot-on for our requirement.

Your suggestion about user-layers could also work. I assume you will work out how one would determine which layers are the user layers. For us, given a set of base layers, we can install all our packages in a new layer; scanning and reporting from that new layer alone could then work.

Also, could you expand a bit on squashed and all-layers (https://github.com/anchore/syft#sbom)? What is the difference? An example or two would help our understanding.

kzantow commented 2 years ago

@rchincha the idea is we would just exclude the layers from the base image, so any layers you add from your own Dockerfile would be included. I'm not sure we've worked out every detail here, but that's the gist.
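A rough sketch of the "exclude the base image layers" idea, assuming the image was built `FROM` the base image so that the base image's layers appear, unchanged and in order, at the bottom of the derived image. Layer identifiers could come from the ordered `rootfs.diff_ids` list in each image's OCI config. The helper `user_layers` is hypothetical; it is not stereoscope's API, and the thread notes that the details were not yet worked out.

```python
def user_layers(image_layers, base_layers):
    """Return the layers added on top of the base image.

    Both arguments are ordered lists of layer digests, bottom layer first.
    Assumes the image starts with exactly the base image's layers.
    """
    n = len(base_layers)
    if image_layers[:n] != base_layers:
        # A rebuilt or different base would break the prefix assumption
        raise ValueError("image does not start with the base image's layers")
    return image_layers[n:]
```

Comparing digests as a prefix is cheap and needs no file contents, but it only works when the exact base image is known; a base that was rebuilt with different layers would need another way to identify it.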

As for squashed vs all-layers:

The difference is that all-layers would also find things that were present at one point but removed before the final filesystem.
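A toy model of that difference: each layer is a dict mapping a path to the package that owns it, and an OCI-style whiteout entry (`.wh.<name>`) marks a deletion of `<name>` from lower layers. This is a simplification for illustration only (no opaque whiteouts, and real catalogers read package databases rather than individual files); it is not how stereoscope or Syft are implemented.

```python
WHITEOUT = ".wh."  # OCI whiteout prefix: "dir/.wh.name" deletes "dir/name" from lower layers


def _is_whiteout(path):
    return path.rpartition("/")[2].startswith(WHITEOUT)


def squashed_packages(layers):
    """Packages visible in the final filesystem after applying every layer in order."""
    fs = {}
    for layer in layers:
        for path, pkg in layer.items():
            if _is_whiteout(path):
                directory, _, name = path.rpartition("/")
                fs.pop(f"{directory}/{name[len(WHITEOUT):]}", None)
            else:
                fs[path] = pkg
    return set(fs.values())


def all_layers_packages(layers):
    """Packages that appear in any layer, including ones deleted later."""
    return {pkg for layer in layers for path, pkg in layer.items() if not _is_whiteout(path)}
```

With layers `[{"/usr/bin/bash": "bash"}, {"/usr/bin/curl": "curl"}, {"/usr/bin/.wh.curl": None}]`, the squashed view reports only `bash`, while the all-layers view reports both `bash` and `curl`, because `curl` existed in an intermediate layer.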

rchincha commented 2 years ago

@kzantow thanks for the clarification, it is the deletions that make the two options different.

Regarding user-layers, is there an ETA? We don't mind pitching in if it helps expedite things.

kzantow commented 2 years ago

@rchincha we do not currently have an ETA for this, but of course PRs are welcome! FYI - I believe this change would probably need to be done predominantly in the stereoscope library, which Syft relies on for processing images.

rchincha commented 2 years ago

@kzantow after thinking about this some more, I'm also wondering whether an --offline option is feasible.

Most package managers, given a package name/version, can also list files included in the package and files to be installed.

$ dpkg -l curl
ii  curl           7.81.0-1ubuntu1.4 amd64        command line tool for transferring data with URL syntax

$ dpkg-query -L curl
/.
/usr
/usr/bin
/usr/bin/curl
...
etc

So the question is: can one simply pass the package name/version and its list of constituent files and generate an SPDX document? This would, of course, be orthogonal to grokking container images.
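As a sketch of what that could look like outside Syft: given each package's name, version, and file list (for example from `dpkg-query -L`), plus the chroot the files live in, a minimal SPDX 2.3 JSON document can be assembled directly. Files are hashed from the chroot because SPDX 2.x expects a SHA1 checksum for each File entry, and the package verification code is the SHA1 of the sorted, concatenated file SHA1s. The document name, the `https://example.com/spdx/...` namespace, and the creator string are placeholders, and `build_spdx` is a hypothetical helper, not a Syft API.

```python
import hashlib
import os
import uuid
from datetime import datetime, timezone


def build_spdx(packages, root):
    """Build a minimal SPDX 2.3 JSON-style dict.

    packages: list of (name, version, paths), e.g. paths from `dpkg-query -L`.
    root: directory the paths are resolved against (e.g. the chroot).
    """
    doc = {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": "package-list-sbom",  # placeholder name
        "documentNamespace": f"https://example.com/spdx/{uuid.uuid4()}",  # placeholder URI
        "creationInfo": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "creators": ["Tool: package-list-sketch"],  # placeholder creator
        },
        "packages": [],
        "files": [],
        "relationships": [],
    }
    file_count = 0
    for p_idx, (name, version, paths) in enumerate(packages):
        pkg_id = f"SPDXRef-Package-{p_idx}"
        file_sha1s = []
        for path in paths:
            full = os.path.join(root, path.lstrip("/"))
            if not os.path.isfile(full):
                continue  # skip directories such as /usr and entries missing from root
            with open(full, "rb") as fh:
                sha1 = hashlib.sha1(fh.read()).hexdigest()
            file_id = f"SPDXRef-File-{file_count}"
            file_count += 1
            file_sha1s.append(sha1)
            doc["files"].append({
                "fileName": "./" + path.lstrip("/"),
                "SPDXID": file_id,
                "checksums": [{"algorithm": "SHA1", "checksumValue": sha1}],
            })
            doc["relationships"].append({
                "spdxElementId": pkg_id,
                "relationshipType": "CONTAINS",
                "relatedSpdxElement": file_id,
            })
        package = {
            "name": name,
            "SPDXID": pkg_id,
            "versionInfo": version,
            "downloadLocation": "NOASSERTION",
            "filesAnalyzed": bool(file_sha1s),
        }
        if file_sha1s:
            # Package verification code: SHA1 over the sorted, concatenated file SHA1s
            code = hashlib.sha1("".join(sorted(file_sha1s)).encode()).hexdigest()
            package["packageVerificationCode"] = {"packageVerificationCodeValue": code}
        doc["packages"].append(package)
        doc["relationships"].append({
            "spdxElementId": "SPDXRef-DOCUMENT",
            "relationshipType": "DESCRIBES",
            "relatedSpdxElement": pkg_id,
        })
    return doc
```

The result can be serialized with `json.dumps(doc, indent=2)`. A package whose files are all missing from `root` is emitted with `filesAnalyzed` set to false, since SPDX does not allow a package with unanalyzed files to contain files.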

kzantow commented 2 years ago

@rchincha I don't quite follow what --offline means in this context, but if you're thinking about providing Syft with a list of packages and/or files, this might be a feasible way to do things. We are working on a way to catalog SBOMs we find on the filesystem, and we could potentially add a "simple" SBOM format, something like a CSV or text file.
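To make the "simple" format idea concrete, here is a parser for one possible layout. The layout is purely hypothetical (no such Syft format is defined in this thread): one CSV row per `name,version[,path]`, where rows with the same name and version are merged and lines starting with `#` are comments.

```python
import csv
import io


def parse_simple_sbom(text):
    """Parse a hypothetical CSV 'simple SBOM' into {(name, version): [paths]}."""
    packages = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(row) < 2:
            raise ValueError(f"expected at least name,version: {row!r}")
        name, version = row[0].strip(), row[1].strip()
        files = packages.setdefault((name, version), [])
        if len(row) > 2 and row[2].strip():
            files.append(row[2].strip())  # optional third column: one owned file
    return packages
```

Keeping one file per row makes the format easy to generate from `dpkg-query -L` output with a shell loop, at the cost of repeating the name and version on every line.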

rchincha commented 2 years ago

"...if you're thinking about providing Syft with a list of packages and/or files..." Exactly this.