anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

append additional data - or plugins? #1380

Open deitch opened 1 year ago

deitch commented 1 year ago

What would you like to be added:

How can I tell syft, "add this data to the SBoM you generate"?

Scenario: I have data that I know syft will not find. Perhaps it is obscure, our own inputs, etc. I still want to use syft to scan everything, but I also want the resultant SBoM to include the additional data.

As an example, in one project, we enforce all docker builds with --network=none, so that the only way to bring things in from the network is via ADD. This, in turn, lets us scan Dockerfiles and capture all ADD commands.

What would be the right way to augment the complete syft output with this data? I could see one of the following (but open to others):

Why is this needed:

syft will never capture 100% of everything; a mechanism for telling syft, "add this data" would expand its capabilities.

spiffcs commented 1 year ago

@deitch do you have a good sample of the data you're trying to add?

The reason I ask this is syft has a way of adding packages to an SBOM being generated from other SBOM files: https://github.com/anchore/syft/blob/main/syft/pkg/cataloger/sbom/cataloger.go

If you include an SBOM within the directory or image in the format of: https://github.com/anchore/syft/blob/17aa8287e6265e5ea65c8946bd790c2cbf444172/syft/pkg/cataloger/sbom/cataloger.go#L14-L30

Then syft will pick it up and add the packages into the generated SBOM.
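For illustration, here is a minimal Python sketch of checking whether a file name is likely to be picked up by that cataloger. The pattern list is an assumption: it is written from memory of the globs in the linked cataloger.go and reduced to base-name patterns, so verify it against the linked source before relying on it. The function name `looks_like_embedded_sbom` is hypothetical and is not part of syft.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# ASSUMPTION: base-name patterns recalled from the globs in the linked
# cataloger.go. Check them against the source; they may differ by version.
ASSUMED_SBOM_NAME_PATTERNS = [
    "*.syft.json",
    "*.bom.*", "*.bom", "bom",
    "*.sbom.*", "*.sbom", "sbom",
    "*.cdx.*", "*.cdx",
    "*.spdx.*", "*.spdx",
]


def looks_like_embedded_sbom(path: str) -> bool:
    """Return True if the file's base name matches one of the assumed patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in ASSUMED_SBOM_NAME_PATTERNS)


if __name__ == "__main__":
    for candidate in ("extras/dockerfile-adds.spdx.json", "extras/notes.txt"):
        print(candidate, looks_like_embedded_sbom(candidate))
```

Under these assumptions, an SBOM saved as something like dockerfile-adds.spdx.json inside the scanned tree would be a candidate for pickup, while an arbitrarily named JSON file would not.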

However, this doesn't cover things like Docker information, which is what your question seemed to be about.

Apologies for the indirect answer, but if you want to add data to the SBOM generated then this is the best way for now.

Happy to take other suggestions or hear if this meets your use case!

deitch commented 1 year ago

No apologies, this is pretty good.

So are you saying, if I add an SBoM in one of the above formats - which looks like it covers syft, syft-json, spdx, spdx-json, cyclonedx, cyclonedx-json - it should include it?

Two questions:

  1. Is this in the docs somewhere?
  2. How would I do it without adding it to my source? I don't necessarily include my syft output in my source (I would more likely bundle the SBoMs together separately, point them at each other, use OCI artifacts, etc.), and I don't want to include the other one either.

do you have a good sample of the data you're trying to add?

Just what I listed above, and it isn't in anything I can show you yet:

we enforce all docker builds with --network=none, so that the only way to bring things in from the network is via ADD. This, in turn, lets us scan Dockerfiles and capture all ADD commands

I wrote a simple scanner that can be given one or more Dockerfiles; it looks for ADD commands and emits them as packages. I think I made it support spdx, spdx-json and maybe one or two other formats. These Dockerfiles make up lots of OCI images, which are then unpacked and bundled together. syft does a pretty good job scanning the final output, but it sometimes misses things that have been ADDed. There are ~20 Dockerfiles, so I want to:

  1. scan the Dockerfiles (with my tool) to generate an SBoM just for those
  2. scan the final layout (tar or dir) via syft to get the real SBoM
  3. add my outputs to syft's to get it all
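Step 1 above could look roughly like the following Python sketch. This is not the actual tool mentioned in this comment (which is not public); the function names and the SPDXID naming scheme are hypothetical. It only handles the shell form of ADD with http(s) sources, and it emits just the minimal SPDX package fields (SPDXID, name, downloadLocation).

```python
import re


def find_add_sources(dockerfile_text: str) -> list[str]:
    """Return remote URLs used as sources in shell-form ADD instructions."""
    # Join backslash-continued lines so each instruction sits on one line.
    joined = re.sub(r"\\[ \t]*\r?\n", " ", dockerfile_text)
    urls = []
    for line in joined.splitlines():
        parts = line.strip().split()
        if not parts or parts[0].upper() != "ADD":
            continue
        # Drop flags such as --chown=...; the last argument is the destination.
        args = [p for p in parts[1:] if not p.startswith("--")]
        for src in args[:-1]:
            if src.startswith(("http://", "https://")):
                urls.append(src)
    return urls


def to_spdx_packages(urls: list[str]) -> list[dict]:
    """Turn each URL into a minimal SPDX-style package entry (hypothetical IDs)."""
    packages = []
    for index, url in enumerate(urls):
        packages.append({
            "SPDXID": f"SPDXRef-Package-dockerfile-add-{index}",
            "name": url.rstrip("/").rsplit("/", 1)[-1],
            "downloadLocation": url,
        })
    return packages


if __name__ == "__main__":
    sample = (
        "FROM alpine\n"
        "ADD --chown=1:1 https://example.com/tool.tar.gz \\\n"
        "    /opt/\n"
        "ADD local.txt /app/\n"
    )
    print(to_spdx_packages(find_add_sources(sample)))
```

JSON-form ADD instructions and build-argument substitution are deliberately left out; a real scanner would need to handle both.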

I would be happy to throw my tool out if syft could replace it. That would mean:

  1. syft would need to be able to scan Dockerfiles and find ADD commands
  2. syft would need to be able to scan multiple things at once - the various Dockerfiles, the final dir/tar, etc. - to generate a single SBoM

But even if it did, these two ideas enhance Syft's usability:

tgerla commented 1 year ago

Hi @deitch, we're looking at some older issues here, and we're wondering which packages Syft is not picking up in the above example? We want to start looking at ways to improve Syft so that users don't have to bring their own data, and so that it can discover these packages for you without manual intervention. Sorry for the long delay between replies! If you want to pick this thread up again, let us know and we can discuss. Otherwise, let me know if maybe it's OK to close this issue. Thanks!

deitch commented 1 year ago

Hi @tgerla. Yeah, I remember this one. The solution by @spiffcs worked reasonably well.

The example I gave above pretty much covered it. It is one of two cases:

let me know if maybe it's OK to close this issue

I have two questions:

  1. Is including an SBoM file inside the image/dir/tar to be scanned documented anywhere? I asked this above, but didn't get an answer. It works, but I had to open an issue to find out about it.
  2. Is there a CLI or similar option for adding it without including it in the tree?

For the latter, here is an example. I want to create an SBoM for an OCI image by scanning it with syft. I also want to include some additional data via an SBoM I already have. I cannot quite "include" my SBoM in the image, as it is composed of a bunch of layers of tgz. Theoretically, I could expand it all out and then add it and scan the dir, but who wants that awful user experience?

Something like:

syft packages oci:myimage/foo:bar --append path/to/sbom.json --append path/to/other/sbom.json

Or even this might work, if slightly less conveniently:

syft packages oci:myimage/foo:bar path/to/sbom.json path/to/other/sbom.json

In the first, I am saying, "scan that image, but also append the following sboms". In the second, I am saying, "use your usual scanning goodness, but scan that image, and this directory/file, and that directory/file."
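Until something like the proposed --append exists, the "append" semantics of the first form could be approximated as a post-processing step outside syft. The Python sketch below assumes all inputs are SPDX JSON documents with a top-level "packages" list; `append_sboms` is a hypothetical helper, not a syft API. It merges only package entries, skipping duplicates by SPDXID, and does not reconcile relationships, files, or document metadata.

```python
import copy


def append_sboms(primary: dict, extras: list[dict]) -> dict:
    """Return a copy of `primary` with packages from `extras` appended.

    Packages whose SPDXID is already present are skipped. The inputs are
    left unmodified.
    """
    merged = copy.deepcopy(primary)
    packages = merged.setdefault("packages", [])
    seen = {pkg.get("SPDXID") for pkg in packages}
    for extra in extras:
        for pkg in extra.get("packages", []):
            if pkg.get("SPDXID") in seen:
                continue
            packages.append(copy.deepcopy(pkg))
            seen.add(pkg.get("SPDXID"))
    return merged


if __name__ == "__main__":
    scanned = {"spdxVersion": "SPDX-2.3",
               "packages": [{"SPDXID": "SPDXRef-a", "name": "a"}]}
    extra = {"packages": [{"SPDXID": "SPDXRef-a", "name": "a"},
                          {"SPDXID": "SPDXRef-b", "name": "b"}]}
    print([p["name"] for p in append_sboms(scanned, [extra])["packages"]])
```

In practice the primary document would be loaded with json.load from the syft output and each extra document from the paths that would have been passed to --append.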

tgerla commented 1 year ago

Hey @deitch, thanks again for the details. I think your "--append" idea is a good one, and we understand how it could be more useful than having to embed separate SBOMs in the images to be scanned. I'll go ahead and move this into the feature backlog.

wagoodman commented 9 months ago

This is somewhat related to: