anchore / syft

CLI tool and library for generating a Software Bill of Materials from container images and filesystems
Apache License 2.0

Allow output of Syft JSON format in multiple schema version #846

Status: Open (opened by dmikusa 2 years ago)

dmikusa commented 2 years ago

What would you like to be added:

Right now the Syft JSON format schema version is hard-coded (it appears to be the latest version). When you upgrade to a newer version of Syft, it starts outputting the new format. It would be helpful if you could control the Syft JSON schema version used for output, like syft package --output json-v2. I think it would be sufficient to control it at the major version level.
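Controlling output "at the major version level" implies that a request such as json-v2 has to resolve to one concrete schema. The following is a minimal sketch, assuming the tool keeps a list of available schema versions as MAJOR.MINOR.PATCH strings; the names schemaVersion and latestInMajor are hypothetical and are not part of Syft's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// schemaVersion is a hypothetical MAJOR.MINOR.PATCH schema version.
type schemaVersion struct {
	Major, Minor, Patch int
}

func (v schemaVersion) String() string {
	return fmt.Sprintf("%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// parseVersion parses a "MAJOR.MINOR.PATCH" string.
func parseVersion(s string) (schemaVersion, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return schemaVersion{}, fmt.Errorf("invalid schema version %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return schemaVersion{}, fmt.Errorf("invalid schema version %q: %w", s, err)
		}
		n[i] = v
	}
	return schemaVersion{Major: n[0], Minor: n[1], Patch: n[2]}, nil
}

// newer reports whether a is a later version than b (numeric comparison).
func newer(a, b schemaVersion) bool {
	if a.Major != b.Major {
		return a.Major > b.Major
	}
	if a.Minor != b.Minor {
		return a.Minor > b.Minor
	}
	return a.Patch > b.Patch
}

// latestInMajor picks the newest available schema whose major version
// matches the requested one (e.g. "json-v2" would ask for major 2).
func latestInMajor(available []string, major int) (string, error) {
	var best schemaVersion
	found := false
	for _, s := range available {
		v, err := parseVersion(s)
		if err != nil {
			return "", err
		}
		if v.Major == major && (!found || newer(v, best)) {
			best, found = v, true
		}
	}
	if !found {
		return "", fmt.Errorf("no schema available for major version %d", major)
	}
	return best.String(), nil
}
```

Comparing the parsed integers rather than the raw strings matters here: a string comparison would rank 2.0.9 above 2.0.10.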

Why is this needed:

Well, it's nice to stay on the latest version of the Syft tool for bug fixes and scanner improvements, but when the output format changes, it can take time to adjust the tools that consume the output to read the new format.

I can understand if it's not possible to support all major versions for all time, but supporting the most recent two or three (depending on how quickly they increment) would give consumers time to plan and update the tools that read the output.
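The "most recent two or three" policy amounts to a sliding support window over major versions. Below is a small sketch of one way to express it, assuming the set of known majors is available as integers; supportedMajors and isSupported are hypothetical helpers, not an existing Syft policy.

```go
package main

import "sort"

// supportedMajors returns the n most recent distinct major schema versions,
// newest first. This illustrates one possible retention policy only.
func supportedMajors(majors []int, n int) []int {
	seen := make(map[int]bool)
	var uniq []int
	for _, m := range majors {
		if !seen[m] {
			seen[m] = true
			uniq = append(uniq, m)
		}
	}
	// Sort descending so the newest majors come first.
	sort.Sort(sort.Reverse(sort.IntSlice(uniq)))
	if n < 0 {
		n = 0
	}
	if n < len(uniq) {
		uniq = uniq[:n]
	}
	return uniq
}

// isSupported reports whether requested falls inside the support window.
func isSupported(requested int, majors []int, n int) bool {
	for _, m := range supportedMajors(majors, n) {
		if m == requested {
			return true
		}
	}
	return false
}
```

With majors 1 through 4 and a window of three, a request for major 1 would be rejected while majors 2, 3, and 4 would still be served.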

Additional context:

I'm not sure this would be necessary once Syft hits 1.0, as I'd assume that means schema changes will be non-breaking, but in the meantime, a feature like this would help ease migrations between schemas.

I wouldn't be opposed to alternative solutions either, so this could perhaps be a question rather than an enhancement. For example, maybe I could somehow recompile Syft to use a different JSON schema version while still getting other updates and fixes.

Thanks

ryanmoran commented 2 years ago

We'd like to hear the project's stance on this idea. We've experienced some pretty painful ramifications of our own usage of Syft due to the fact that the output schema formats for any of the SBOM types (Syft, CycloneDX, SPDX, etc.) will just change, sometimes in patch releases. This has left us in a state where we either stop keeping up-to-date with the latest versions of Syft, or we do some pretty crazy gymnastics to make the format output stable.

Since the addition of https://github.com/anchore/syft/pull/864, there is an interface that would allow external formats to be defined. It now appears that it would be possible for third-parties to define their own formats, including legacy formats for SBOM types already defined in Syft.

It would be great to understand a couple of things before anyone embarked on this kind of endeavor:

  1. Is the project opposed to the idea of supporting anything more than the latest schema version of each SBOM format?
  2. If we want this, should we be building our own library of formats to support it?

luhring commented 2 years ago

Hi @dmikusa-pivotal and @ryanmoran — this request makes sense. We've talked about supporting multiple major versions (latest of each) for several of the built-in formats, including Syft JSON. You've probably noticed that format versions have started appearing in format package names in the Syft library code.

Answering the two questions directly:

  1. Is the project opposed to the idea of supporting anything more than the latest schema version of each SBOM format?

In general, no, not opposed. But we'd want to really think about the right constraints to apply to the solution.

  2. If we want this, should we be building our own library of formats to support it?

You could. Part of the intent with exposing the format interface is to allow formats to be supplied by library consumers. Doing it yourself would obviously let you control the timeline of when the implementation was ready to use. And at the same time, I think it makes sense for the Syft library itself to consider implementing this support.

cc: @wagoodman a.k.a. "The Syft Formats Guru"

ryanmoran commented 2 years ago

We'd be happy to contribute this support. Before we spent the time to do that, it'd be good to have a deeper discussion of the constraints around what the project would and would not want to support.

luhring commented 2 years ago

That sounds great! One option for discussion is that we have biweekly community meetings, where we chat through specific issues that need discussion. Are you available to join one of these? (info here: https://github.com/anchore/syft#join-our-community-meetings)

fg-j commented 2 years ago

After the conversation in the Working Group meeting on 3/31/22, @kzantow and @spiffcs mentioned that it'd be useful to know what syft internal packages I copied to build working implementations of the sbom.Format interface for some legacy SBOM schema.

Note: My implementations only encode and can't decode, since that was expedient for the buildpacks use case. More internal packages may be needed for decoding.

I used the following internal files/packages:

Some of these seem more stable than others as the Syft JSON schema itself changes. It would be useful if some of these internals were exposed as stable(ish) APIs so that users like Paketo can support SBOM formats beyond what y'all currently offer.
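To illustrate the pattern discussed above (a pluggable format interface, implemented by a library consumer, that only encodes), here is a self-contained sketch. The Format interface, Document type, and registry below are hypothetical simplifications written for this example; they are not Syft's actual sbom.Format definition or SBOM model.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Document is a hypothetical, heavily simplified SBOM model.
type Document struct {
	Packages []string
}

// Format is a hypothetical pluggable output format (NOT Syft's sbom.Format).
type Format interface {
	ID() string
	Version() string
	Encode(w io.Writer, doc Document) error
	Decode(r io.Reader) (Document, error)
}

var errDecodeUnsupported = errors.New("decoding not supported by this format")

// legacyJSON is an encode-only format that writes a pinned schema version.
type legacyJSON struct{ version string }

func (f legacyJSON) ID() string      { return "json" }
func (f legacyJSON) Version() string { return f.version }

func (f legacyJSON) Encode(w io.Writer, doc Document) error {
	out := struct {
		Schema    string   `json:"schemaVersion"`
		Artifacts []string `json:"artifacts"`
	}{f.version, doc.Packages}
	return json.NewEncoder(w).Encode(out)
}

// Decode is intentionally unsupported, mirroring an encode-only implementation.
func (f legacyJSON) Decode(io.Reader) (Document, error) {
	return Document{}, errDecodeUnsupported
}

// registry maps "id@version" keys to formats.
type registry map[string]Format

func (r registry) Register(f Format) {
	r[f.ID()+"@"+f.Version()] = f
}

func (r registry) Lookup(id, version string) (Format, error) {
	f, ok := r[id+"@"+version]
	if !ok {
		return nil, fmt.Errorf("no format registered as %s@%s", id, version)
	}
	return f, nil
}

// render encodes doc with f and returns the output as a string.
func render(f Format, doc Document) (string, error) {
	var sb strings.Builder
	if err := f.Encode(&sb, doc); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```

Keying the registry on both the format ID and its version lets several versions of the same format coexist, which is the property the thread is asking for.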

sophiewigmore commented 1 year ago

Hey y'all, it's been a while! 👋 Recently we updated from Syft v0.60.3 to v0.66.3. First of all, the move of the common internal packages to the syft/formats directory has been very helpful for us in reducing some duplication!

We still ran into some issues upgrading, due to the bump of SPDX 2.2 to 2.3, resulting in the need to copy over a few files in order to pin to the old upstream SPDX version.

At this point, we now support all of the "legacy" versions we need, so the issue of upgrading Syft in our code in the future should be much more straightforward. Nevertheless, I wanted to follow up and see if you've given any more thought to the idea of supporting multiple schema versions?

luhring commented 1 year ago

cc @kzantow

kzantow commented 1 year ago

Hi @sophiewigmore. When you say:

due to the bump of SPDX 2.2 to 2.3, resulting in the need to copy over a few files in order to pin to the old upstream SPDX version

Are you only supporting SPDX 2.2 and you only want to output that version?

We recently contributed some changes to the spdx/tools-golang library, which supports multiple versions of SPDX and converting between them (we have yet to incorporate this into Syft but will soon!).

When you refer to multiple schema versions, are you saying that you would want to specify the SPDX version to output? Or to be able to read any version? Or something else?

sophiewigmore commented 1 year ago

@kzantow Hey, we only support 2.2 at the moment, but ideally we will support 2.3 soon as well. I hadn't seen the contributions you mentioned, so I'll check them out; they seem like a good option for us! Thanks.

Ideally, we want to be able to specify the output version. Essentially, we're after what was laid out in the original issue here for Syft JSON, SPDX, and CycloneDX:

It would be helpful if you could control the Syft JSON schema version used for output, like syft package --output json-v2. I think it would be sufficient to control it at the major version level.

We mostly wanted to check in and see whether there's been any thought one way or another on this idea since our preliminary discussions on this topic.

kzantow commented 1 year ago

@sophiewigmore yes, we definitely have thoughts on the topic! We're currently working towards Syft 1.0, at which point we plan to make the Syft format more stable, including support for input/output of major versions (presumably for some limited timeframe). Additionally, as noted, for SPDX we plan to support output of specific versions once we are able to incorporate the aforementioned changes from the SPDX library we're using.

wagoodman commented 8 months ago

We've since added support for specifying the output version for SPDX and CycloneDX:

syft --output spdx-json@2.3

I think if we were to do this (and I think we probably should), we should lean into that same approach (e.g. syft --output json@v15.2.1).
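The name@version convention can be handled by a small parser that splits the format name from an optional version and falls back to "latest" when no version is given. This is a sketch only: outputSpec and parseOutputSpec are hypothetical names, and stripping a leading "v" (so that json@v15.2.1 and json@15.2.1 match) is an assumption, not confirmed Syft behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// outputSpec is a hypothetical parsed --output value.
type outputSpec struct {
	Name    string // format name, e.g. "spdx-json"
	Version string // requested version; empty means "use the latest"
}

// parseOutputSpec splits a "name@version" value on the first '@'.
// Trimming a leading "v" from the version is an assumption for this sketch.
func parseOutputSpec(s string) (outputSpec, error) {
	name, version, hasVersion := strings.Cut(s, "@")
	if name == "" {
		return outputSpec{}, fmt.Errorf("missing format name in %q", s)
	}
	version = strings.TrimPrefix(version, "v")
	if hasVersion && version == "" {
		return outputSpec{}, fmt.Errorf("empty version in %q", s)
	}
	return outputSpec{Name: name, Version: version}, nil
}
```

For example, "spdx-json@2.3" yields name spdx-json and version 2.3, while a bare "cyclonedx-json" yields an empty version, which a caller could interpret as the latest supported schema.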