rchincha opened this issue 2 years ago.
Thanks for the issue @rchincha!
Can you walk us through the reasoning for excluding packages? We want syft to be as close to the truth as possible when generating an SBOM.
Allowing users to exclude or omit packages that are present and cataloged seems a little outside of that goal.
Definitely happy to talk through how you would use it!
We've also added this to the agenda for tomorrow's community meeting for syft and grype. Feel free to join there as well; we'll get other feedback from the community, too!
https://twitter.com/GrypeProject/status/1574431163799801856?cxt=HHwWgMC8maizwNkrAAAA
Thanks @spiffcs.
We have a situation where we build a chroot without a package database of any sort. So in order to use syft's SBOM capability, we set up a separate environment with a base distro install, install the required packages on top of it, and would now like to exclude the packages from the base install where appropriate. Alternatively, instead of a blacklist (--exclude), perhaps a whitelist (--include) would work better. Hope the problem statement is clear.
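A minimal sketch of the "subtract the base install" workflow described above, done as post-processing rather than inside syft. It assumes both environments were scanned with syft using its JSON output format, and that packages appear under a top-level `artifacts` key with `name`, `version` and `type` fields. The helper names (`load_sbom`, `package_keys`, `user_packages`) are hypothetical.

```python
import json
from typing import Dict, List, Set, Tuple

# (name, version, type) identifies a package for comparison purposes.
PackageKey = Tuple[str, str, str]


def load_sbom(path: str) -> Dict:
    """Load a syft JSON SBOM from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def package_keys(sbom: Dict) -> Set[PackageKey]:
    """Collect (name, version, type) for every package in the SBOM.

    Assumption: packages are listed under a top-level "artifacts" key.
    """
    return {
        (a.get("name", ""), a.get("version", ""), a.get("type", ""))
        for a in sbom.get("artifacts", [])
    }


def user_packages(base_sbom: Dict, full_sbom: Dict) -> List[PackageKey]:
    """Packages present in the full environment but not in the base install."""
    return sorted(package_keys(full_sbom) - package_keys(base_sbom))


if __name__ == "__main__":
    base = {"artifacts": [{"name": "libc6", "version": "2.35", "type": "deb"}]}
    full = {"artifacts": [
        {"name": "libc6", "version": "2.35", "type": "deb"},
        {"name": "curl", "version": "7.81.0-1ubuntu1.4", "type": "deb"},
    ]}
    print(user_packages(base, full))  # [('curl', '7.81.0-1ubuntu1.4', 'deb')]
```

Because the comparison key includes the version, a base package that was upgraded in the user environment shows up as a user package, which is usually what you want for this use case.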
Any additional thoughts/updates on this?
Hi @rchincha, we discussed this at the community meeting last week (see the notes here). If I understand the use case you're talking about, it's less about excluding packages and more about only including user-defined packages (but excluding the base image packages); is this correct? If so, this is something that has been asked for before and something we'd like to do. We have the concept of scopes, but currently only `squashed` and `all-layers`. We would add another scope, something like `user-layers`, and I suspect that would be easier for you to use than an explicit exclude list of packages. What do you think?
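To make the `user-layers` idea concrete, here is one possible rule for picking the user layers: treat the base image's layer digests as a prefix of the derived image's layer list and keep whatever comes after it. This is an illustrative assumption, not how syft or stereoscope work; how to identify the base layers is exactly the open question in this thread, and `user_layers` is a hypothetical helper.

```python
from typing import List, Sequence


def user_layers(base_layers: Sequence[str], image_layers: Sequence[str]) -> List[str]:
    """Return the layers an image adds on top of its base image.

    Hypothetical rule: both lists are ordered bottom-to-top layer digests, and
    the image was built FROM the base, so the base layers form a prefix of the
    image layers. Anything after that prefix is treated as a user layer.
    """
    n = len(base_layers)
    if list(image_layers[:n]) != list(base_layers):
        raise ValueError("base image layers are not a prefix of the image layers")
    return list(image_layers[n:])


if __name__ == "__main__":
    base = ["sha256:aaa", "sha256:bbb"]
    image = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
    print(user_layers(base, image))  # ['sha256:ccc']
```

The prefix check fails loudly when the base image does not match, for example if the image was rebuilt on a newer base, rather than silently scanning the wrong layers.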
@kzantow, yes, that's spot-on for our requirement.
Your suggestion about `user-layers` could also work. I assume you will work out how one would determine which layers are the user layers. For us, given a base set of layers, we can install all our packages in a new layer; scanning and reporting from that new layer alone could then work.
https://github.com/anchore/syft#sbom
^ Also, could you expand a bit more on `squashed` and `all-layers`? What is the difference? An example or two would help our understanding.
@rchincha the idea is we would just exclude the layers from the base image, so any layers you add from your own Dockerfile would be included. I'm not sure we've worked out every detail here, but that's the gist.
As for `squashed` vs `all-layers`:

- `squashed`: only scans the final (squashed) filesystem
- `all-layers`: scans each layer in the image individually

The difference here is that `all-layers` would find things that were present at one point but removed before the final filesystem.
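The difference can be illustrated with a simplified model. Each layer is a mapping from path to content, and `None` marks a deletion (roughly what an OCI whiteout expresses). This is an assumed toy model of the behavior described above, not how syft or stereoscope actually represent layers.

```python
from typing import Dict, List, Optional, Set

# Toy model: a layer maps a path to its content; None means "deleted here".
Layer = Dict[str, Optional[str]]


def squashed(layers: List[Layer]) -> Set[str]:
    """Paths visible in the final filesystem after applying layers in order."""
    fs: Dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                fs.pop(path, None)  # a later layer removed this path
            else:
                fs[path] = content
    return set(fs)


def all_layers(layers: List[Layer]) -> Set[str]:
    """Paths added by any layer, including ones deleted by a later layer."""
    return {
        path
        for layer in layers
        for path, content in layer.items()
        if content is not None
    }


if __name__ == "__main__":
    layers = [
        {"/usr/bin/curl": "curl", "/usr/bin/wget": "wget"},
        {"/usr/bin/wget": None},  # wget removed in a later layer
    ]
    print(sorted(squashed(layers)))    # ['/usr/bin/curl']
    print(sorted(all_layers(layers)))  # ['/usr/bin/curl', '/usr/bin/wget']
```

Here `wget` only shows up in the `all-layers` view, because it existed in the first layer but was removed before the final filesystem.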
@kzantow, thanks for the clarification; it is the deletions that make the two options different.
About `user-layers`: is there an ETA we can expect? We don't mind pitching in if it helps expedite.
@rchincha we do not currently have an ETA for this, but of course PRs are welcome! FYI, I believe this change would probably need to be done predominantly in the stereoscope library, which Syft relies on for processing images.
@kzantow, after thinking about this some more, we're also wondering if an `--offline` option is feasible.
Most package managers, given a package name/version, can also list the files included in the package and the files to be installed.
```
$ dpkg -l curl
ii curl 7.81.0-1ubuntu1.4 amd64 command line tool for transferring data with URL syntax
$ dpkg-query -L curl
/.
/usr
/usr/bin
/usr/bin/curl
...
```
etc.
So the question is: can one simply pass the package name/version and its constituent list of files, and generate an SPDX document? This of course would be orthogonal to grokking container images.
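As a rough illustration of the "name/version plus file list" idea, the sketch below parses `dpkg-query -L` output and builds a dictionary that loosely follows the shape of SPDX 2.3 JSON (a package, its files, and `CONTAINS` relationships). It is an assumption-laden sketch, not a valid SPDX document: required fields such as file checksums, `creationInfo` and `documentNamespace` are deliberately omitted, and the helper names are hypothetical.

```python
import json
import re
from typing import Dict, List


def parse_dpkg_file_list(output: str) -> List[str]:
    """Parse `dpkg-query -L <pkg>` output into a list of paths.

    Skips blank lines and the "/." root entry. Directories are not told apart
    from regular files; that would need access to the filesystem.
    """
    paths = []
    for line in output.splitlines():
        path = line.strip()
        if path and path != "/.":
            paths.append(path)
    return paths


def spdx_id(prefix: str, value: str) -> str:
    """Build an SPDX identifier, replacing characters SPDX IDs do not allow."""
    return f"SPDXRef-{prefix}-" + re.sub(r"[^A-Za-z0-9.-]", "-", value)


def spdx_like_document(name: str, version: str, files: List[str]) -> Dict:
    """Build an SPDX-2.3-shaped dict for one package and its files (incomplete)."""
    pkg_id = spdx_id("Package", name)
    doc: Dict = {
        "spdxVersion": "SPDX-2.3",
        "dataLicense": "CC0-1.0",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{name}-{version}",
        "packages": [{
            "name": name,
            "SPDXID": pkg_id,
            "versionInfo": version,
            "downloadLocation": "NOASSERTION",
        }],
        "files": [],
        "relationships": [],
    }
    for i, path in enumerate(files):
        file_id = f"SPDXRef-File-{i}"
        doc["files"].append({"fileName": path, "SPDXID": file_id})
        # The package CONTAINS each of its listed files.
        doc["relationships"].append({
            "spdxElementId": pkg_id,
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": file_id,
        })
    return doc


if __name__ == "__main__":
    sample = "/.\n/usr\n/usr/bin\n/usr/bin/curl\n"
    doc = spdx_like_document("curl", "7.81.0-1ubuntu1.4", parse_dpkg_file_list(sample))
    print(json.dumps(doc, indent=2))
```

In practice the file list would come from running `dpkg-query -L` against the target root; a real generator would also need to hash each file to produce the checksums SPDX requires.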
@rchincha I don't quite follow `--offline` in this context, but if you're thinking about providing Syft with a list of packages and/or files, this might be a feasible way to do things. We are working on having a way to catalog SBOMs we find on the file system, and we could potentially add a "simple" SBOM format that's like a CSV or text file.
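To show what such a "simple" format might look like, here is a sketch that parses a hypothetical CSV with one `name,version[,type]` row per package. The format, the `#` comment convention, and the `parse_simple_sbom` helper are all assumptions for illustration; no such format is defined in this thread.

```python
import csv
import io
from typing import List, NamedTuple


class SimplePackage(NamedTuple):
    name: str
    version: str
    type: str


def parse_simple_sbom(text: str, default_type: str = "unknown") -> List[SimplePackage]:
    """Parse a hypothetical 'simple SBOM' CSV with rows of name,version[,type].

    Blank lines and lines whose first field starts with '#' are skipped.
    """
    packages = []
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip() or row[0].lstrip().startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"expected at least name,version: {row!r}")
        name, version = row[0].strip(), row[1].strip()
        pkg_type = row[2].strip() if len(row) > 2 and row[2].strip() else default_type
        packages.append(SimplePackage(name, version, pkg_type))
    return packages


if __name__ == "__main__":
    text = "# name,version,type\ncurl,7.81.0-1ubuntu1.4,deb\n\nmylib,1.2.3\n"
    for pkg in parse_simple_sbom(text):
        print(pkg)
```

Using the `csv` module rather than splitting on commas keeps quoted fields (for example a version string containing a comma) intact.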
"but if you're thinking about providing Syft with a list of packages and/or files, this might be feasible way to do things." exactly this ^
What would you like to be added:
Once packages are discovered using the cataloger, can I specify a list of packages to be excluded?
Why is this needed:
Generate an SBOM only for a subset of packages.
Additional context:
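Until something like this exists in syft, the exclude/include behavior requested in this issue could be approximated by post-processing syft's JSON output. The sketch below filters packages by name using shell-style glob patterns. It assumes packages live under a top-level `artifacts` key with a `name` field; `filter_artifacts` is a hypothetical helper and not a syft option. As the maintainers point out, dropping cataloged packages moves the SBOM away from the full picture, so this is best kept as a separate, explicit step.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional


def filter_artifacts(
    artifacts: Iterable[Dict],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[Dict]:
    """Filter syft JSON artifacts by package name.

    A package is kept if it matches at least one include pattern (when any
    are given) and matches no exclude pattern. Patterns are case-sensitive
    shell-style globs such as "lib*".
    """
    kept = []
    for artifact in artifacts:
        name = artifact.get("name", "")
        if include and not any(fnmatchcase(name, p) for p in include):
            continue
        if exclude and any(fnmatchcase(name, p) for p in exclude):
            continue
        kept.append(artifact)
    return kept


if __name__ == "__main__":
    artifacts = [{"name": "curl"}, {"name": "libc6"}, {"name": "libssl3"}]
    print([a["name"] for a in filter_artifacts(artifacts, exclude=["lib*"])])
    print([a["name"] for a in filter_artifacts(artifacts, include=["lib*"], exclude=["libc6"])])
```

Exclusion is applied after inclusion, so an explicit exclude always wins over a broader include pattern.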