anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0
6.18k stars 571 forks

Support generating multiple BOM files in different formats within a run #325

Closed ScottChapman closed 2 years ago

ScottChapman commented 3 years ago

We like the idea of having a superset data format like JSON, but we also want to generate CycloneDX format for compatibility with other tooling we're invested in. I think you could either support multiple -o options to generate the formats all at once, or perhaps support conversion from JSON to the other formats you support.

luhring commented 3 years ago

@ScottChapman This is interesting!

I could see a few syntaxes to consider, e.g.:

  1. -o json,cyclonedx
  2. -o json -o cyclonedx
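
A minimal sketch, in Python rather than Syft's own code, of how a CLI could accept both syntaxes above and treat them the same way. The tool name `sbom-tool` and the function `parse_formats` are hypothetical and only illustrate the idea:

```python
import argparse


def parse_formats(argv):
    """Collect output formats from repeated and/or comma-separated -o flags."""
    parser = argparse.ArgumentParser(prog="sbom-tool")  # hypothetical tool name
    # action="append" gathers every occurrence of -o into a list.
    parser.add_argument("-o", "--output", action="append", default=[])
    args = parser.parse_args(argv)

    formats = []
    for value in args.output:
        # Split "json,cyclonedx" into its parts; a plain "json" yields one part.
        for name in value.split(","):
            name = name.strip()
            if name and name not in formats:
                formats.append(name)
    return formats


# Both syntaxes resolve to the same list of formats.
print(parse_formats(["-o", "json,cyclonedx"]))
print(parse_formats(["-o", "json", "-o", "cyclonedx"]))
```

Supporting both forms at once costs little, but neither form says where each format should be written, which is the question raised next.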

Something to consider: Right now, having just a single output format makes saving or redirecting the output very easy and straightforward. For example:

Saving the output:

syft ... -o json > myimage.json

Piping the output to another tool:

syft ... -o json | jq '<some expression>' | ...

I'm curious how we'd explain the process of integrating syft into shell commands and scripts when using multiple BOM formats...

ScottChapman commented 3 years ago

That's a good point. Typically the -o flag is used to specify an output file. It would be a bigger change, but I would use a different flag for the format, like -f. Without -o, output would go to stdout. So then it would be something like: syft ... -f json -o output.json -f cyclonedx -o output.xml.

...or you could support a new feature which would be to convert the JSON -> another format, like: syft -c output.json -o cyclonedx

luhring commented 3 years ago

Typically the -o flag is used to specify an output file

That's a good point. I've seen both, but most tools with -o that I can think of are asking you to specify a place to store something (e.g. build tools, Docker, curl, etc.).

Since users are already using -o, especially in scripts, making this change would break a lot of workflows.

or you could support a new feature which would be to convert the JSON -> another format

I like this! We've talked about this internally a little bit. I think there's potential here...

luhring commented 3 years ago

This idea is now described briefly in https://github.com/anchore/syft/issues/400.


EDIT: There's now an even more explicit issue for supporting SBOM format conversion in Syft: https://github.com/anchore/syft/issues/563

luhring commented 2 years ago

Jotting down an idea for CLI syntax from @wagoodman —

To specify multiple output formats, you'd specify the flag -o/--output multiple times, and you can designate the output file name for the given format using the form -o <format>=<filename>, where <format> is one of Syft's enumerated output formats (as are already used today with the -o flag), and <filename> is an arbitrary file path to which Syft should write the SBOM for that format.

For example:

syft <your-image> -o spdx=my-sbom.spdx.json -o json=my-sbom.syft.json
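
A rough Python sketch of how a single `<format>=<filename>` value might be split. It is not Syft's implementation. The function name `parse_output_spec` is hypothetical, and treating a bare `<format>` as "write to stdout" is an assumption based on how -o behaves in the earlier examples:

```python
def parse_output_spec(spec):
    """Split '<format>=<filename>' into (format, filename).

    A bare '<format>' returns (format, None), assumed here to mean stdout.
    """
    # Split on the first '=' only, so filenames may themselves contain '='.
    fmt, sep, filename = spec.partition("=")
    if not fmt:
        raise ValueError(f"missing format in output spec: {spec!r}")
    if sep and not filename:
        raise ValueError(f"missing filename in output spec: {spec!r}")
    return fmt, (filename if sep else None)


specs = ["spdx=my-sbom.spdx.json", "json=my-sbom.syft.json"]
for fmt, filename in map(parse_output_spec, specs):
    print(f"{fmt} -> {filename}")
```

Splitting on the first `=` keeps the format name unambiguous while leaving the file path free-form.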

This could optionally tie into an idea I had recently, too:

As a convenience: if multiple formats are specified (such as via using -o multiple times), the existing --file argument could be interpreted as a filename pattern to which format-specific file extensions would be appended. This lets the user "factor out" a common string for the filename.

For example:

syft <your-image> -o spdx -o json --file ./my-sbom

Would produce the files:

./my-sbom.spdx.json
./my-sbom.syft.json

...given the assumption that we've defined .spdx.json as the default file extension for the spdxjson format, and .syft.json for the json format. These values are certainly up for discussion; what matters is that each format is associated with a single, default file extension, which can be overridden as needed using the above-mentioned <format>=<filename> syntax.
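
The --file convenience described above could be sketched as follows, in Python and purely as an illustration. The names `DEFAULT_EXTENSIONS` and `resolve_output_paths` are hypothetical, the extension values are the ones assumed in the example (and still up for discussion), and an explicit `<format>=<filename>` is taken to override the pattern:

```python
# Assumed defaults taken from the example above; not settled values.
DEFAULT_EXTENSIONS = {
    "spdx": ".spdx.json",
    "json": ".syft.json",
}


def resolve_output_paths(specs, file_pattern=None):
    """Map each -o value to a file path, or None when no file applies."""
    paths = {}
    for spec in specs:
        fmt, sep, filename = spec.partition("=")
        if sep:
            # An explicit <format>=<filename> overrides the --file pattern.
            paths[fmt] = filename
        elif file_pattern is not None:
            # Append the format's default extension to the shared pattern.
            # Unknown formats raise KeyError in this sketch.
            paths[fmt] = file_pattern + DEFAULT_EXTENSIONS[fmt]
        else:
            paths[fmt] = None
    return paths


print(resolve_output_paths(["spdx", "json"], file_pattern="./my-sbom"))
```

Keeping exactly one default extension per format is what makes the generated names predictable.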

kzantow commented 2 years ago

There are a number of suggestions here. I'd like to propose using the --output/-o flag and staying consistent with the <scheme>:<source>. We would treat it as follows:

wagoodman commented 2 years ago

The input parsing approach attempts to follow the URI spec for resolving a resource (how to get it and what to interpret it as). I don't think applying the URI spec makes semantic sense from the perspective of specifying the format and location to write the SBOM out to.

I agree that using : instead of = looks consistent with how we specify inputs, but I think that consistency doesn't need to apply in cases where the semantics are different (since the URI semantics don't/shouldn't carry over here).

The format=filename suggestion was pulling from other CLIs that accept name-value pairs as arguments (such as kubectl).

kzantow commented 2 years ago

Oddly, kubectl apparently does both! In one case it uses : to define the column spec. But yeah, most of the examples are definitely key=value, so substituting = for : above, does this make sense?: