containerbuildsystem / cachi2

Cachi2 is a CLI tool that pre-fetches your project's dependencies to aid in making your build process network-isolated.
GNU General Public License v3.0
8 stars 26 forks source link

Add SPDX format support for SBOM #608

Open midnightercz opened 2 months ago

midnightercz commented 2 months ago

Support for SPDX format was added to fetch-depds command and also to merge_syft_sboms. No changes were made in particular package manager generating components which are then converted to cyclonedx format. SPDX sbom can be obtained by calling Sbom.to_spdx(). New switch sbom-type was added to merge_syfy_sboms, so user can choose which output format should be generated - default is cyclonedx. Once all tooling is ready to consume spdx sboms, cutoff changes in this repository can be started.

Maintainers will complete the following section

Note: if the contribution is external (not from an organization member), the CI pipeline will not run automatically. After verifying that the CI is safe to run:

brunoapimentel commented 2 months ago

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

midnightercz commented 2 months ago

Just a quick heads-up: the utils/merge_syft_sbom.py script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.

I thought I will eventually integrate the changes into new subcommand implemented here: https://github.com/containerbuildsystem/cachi2/pull/593 but it wasn't merged yet when I started working on this one

MartinBasti commented 2 months ago

typo in commit message and PR: s/SDPX/SPDX/

midnightercz commented 2 months ago

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

a-ovchinnikov commented 1 month ago

Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?

merge-sboms is a thin wrapper around an internal function that is tested elsewhere. It does not do much except some validation of input and then invoking a method for saving merge results (which is also tested elsewhere). There is not much value in doing this sort of test.

MartinBasti commented 1 month ago

and also please squash commits

midnightercz commented 4 weeks ago

I added 3 more commits, two do address fresh comments a-ovchinnikov and last one is to add middle layer between SPDX-Document and packages to produce result similar to syft. There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR

brunoapimentel commented 4 weeks ago

There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR

I agree we can push this to a follow up PR. But our current behavior seems to be inconsistent: if we provide files from a different format from the output, only an empty manifest is generated. To illustrate:

$ cachi2 merge-sboms gomod-spdx.json pip-spdx.json --sbom-type cyclonedx

{
  "bomFormat": "CycloneDX",
  "components": [],
  "metadata": {
    "tools": [
      {
        "vendor": "red hat",
        "name": "cachi2"
      }
    ]
  },
  "specVersion": "1.4",
  "version": 1
}

I think we could simply throw an error saying that Cachi2 does not support merging SPDX SBOMs yet, and work on that as a follow up.

brunoapimentel commented 4 weeks ago

There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR

I missed this discussion, but I'll leave my 2 cents here: I'd only implement support for merging SBOMs of the same type, and fail on everything else. The merge-sboms feature was included to cover a very specific use case, and I don't believe we benefit much from making it very complex. Converting formats is something I really don't think we need right now.

But I'm saying this from the "keep it simple" perspective, I have nothing against these proposed features. They are nice to have, but probably not important right now.

midnightercz commented 3 weeks ago

There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR

I agree we can push this to a follow up PR. But our current behavior seems to be inconsistent: if we provide files from a different format from the output, only an empty manifest is generated. To illustrate:

cachi2 merge-sboms gomod-spdx.json pip-spdx.json --sbom-type cyclonedx

I think we could simply throw an error saying that Cachi2 does not support merging SPDX SBOMs yet, and work on that as a follow up.

That's because how you created your Sbom model. CycloneDX (class Sbom) can be basically empty and because bom_format field has default value set, it doesn't even fail there if the field isn't present in the input and extra fields are ignored. I think we need to add model_config = ConfigDict(extra="forbid") to the sbom models

I forgot to send this comment yesterday. I included the needed changes for better parsing + also support for spdx->cyclonedx (as I already had those changes ready)

midnightercz commented 3 weeks ago

I believe we can improve the way CycloneDX's properties are being translated to SPDX. For example:

{
  "name":  "cachi2:found_by", 
  "value": "cachi2"
}

Could be turned into:

{
  "annotator": "cachi2",
  "annotationDate": "2024-10-01T11:03:40.799221",
  "annotationType": "OTHER",
  "comment": "found_by:cachi2"
}

My reasoning is that the annotator field already states that Cachi2 is the author of that annotation, so we can drop the prefix cachi2:, which serves the same purpose. Then, we could just append the value in the end of the string, instead of having the inline JSON, which is arguably harder to read.

If folks like the idea, this can be done as a follow up, no need to block this PR because of this.

I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}. I think approach to dump whole property in json is more universal

brunoapimentel commented 3 weeks ago

I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}. I think approach to dump whole property in json is more universal

To clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.

{
    "name": "prefix:property"
    "value":  "content"
}
{
    "annotator": "prefix"
    "comment":  "property:content"
}

This should only apply to Cachi2-generated propertis, though. It is all we deal with for now.

midnightercz commented 3 weeks ago

I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}. I think approach to dump whole property in json is more universal

To clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.

{
    "name": "prefix:property"
    "value":  "content"
}
{
    "annotator": "prefix"
    "comment":  "property:content"
}

This should only apply to Cachi2-generated propertis, though. It is all we deal with for now.

There's property Property(name="cachi2:missing_hash:in_file", value=str(lockfile_path)) in rpm module. How should that then be translated to annotation?

brunoapimentel commented 3 weeks ago

There's property Property(name="cachi2:missing_hash:in_file", value=str(lockfile_path)) in rpm module. How should that then be translated to annotation?

My initial thought was:

{
    "annotator": "cachi2"
    "comment":  "missing_hash:in_file:./path/to/the/file"
}

But seeing the other thread you're keeping with Lui made me realize that Cachi2 adds standard CycloneDx properties (such as cdx:npm:package:bundled). So, cachi2: was actually a way to namespace our own custom properties, and I believe it should be kept.

I still don't quite like the inline JSON, and also that it has name and value that are part of the CycloneDX spec (it could be simplified to {"cachi2:missing_hash:in_file": "/path/to/the/file"}). I'll open a discussion about this later, as I said, I don't want this to block this PR.

I also saw that SPDX 3.0 adds a ContentType field to annotations, which we could leverage to state that it contains JSON when we move to the new version.

midnightercz commented 2 weeks ago

I still don't quite like the inline JSON

I agree, but on the other hand, json dumps will handle any escaping, so as property name and value you can have anything which is compatible with cyclonedx property

midnightercz commented 2 weeks ago

It looks like the scope of this PR keeps growing. It might make sense splitting it into several PRs.

This is the last change. It won't grow more