Open midnightercz opened 2 months ago
Just a quick heads-up: the
utils/merge_syft_sbom.py
script was removed from this repo and now lives here. So you'll need to propose the related changes there instead.
I thought I will eventually integrate the changes into new subcommand implemented here: https://github.com/containerbuildsystem/cachi2/pull/593 but it wasn't merged yet when I started working on this one
typo in commit message and PR: s/SDPX/SPDX/
Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?
Thanks for all the reviews and comments. I also wanted to add spdx support into merge-sboms subcommand but I noticed that in the tests you don't really test merged output. Or am I missing something?
merge-sboms
is a thin wrapper around an internal function that is tested elsewhere. It does not do much except some validation of input and then invoking a method for saving merge results (which is also tested elsewhere). There is not much value in doing this sort of test.
and also please squash commits
I added 3 more commits, two do address fresh comments a-ovchinnikov and last one is to add middle layer between SPDX-Document and packages to produce result similar to syft. There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR
There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR
I agree we can push this to a follow up PR. But our current behavior seems to be inconsistent: if we provide files from a different format from the output, only an empty manifest is generated. To illustrate:
$ cachi2 merge-sboms gomod-spdx.json pip-spdx.json --sbom-type cyclonedx
{
"bomFormat": "CycloneDX",
"components": [],
"metadata": {
"tools": [
{
"vendor": "red hat",
"name": "cachi2"
}
]
},
"specVersion": "1.4",
"version": 1
}
I think we could simply throw an error saying that Cachi2 does not support merging SPDX SBOMs yet, and work on that as a follow up.
There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR
I missed this discussion, but I'll leave my 2 cents here: I'd only implement support for merging SBOMs of the same type, and fail on everything else. The merge-sboms
feature was included to cover a very specific use case, and I don't believe we benefit much from making it very complex. Converting formats is something I really don't think we need right now.
But I'm saying this from the "keep it simple" perspective, I have nothing against these proposed features. They are nice to have, but probably not important right now.
There is also need to implement conversion from spdx -> cyclonedx and support for merging files of different formats (as we agreed on yesterday), but I would like to do that in new PR
I agree we can push this to a follow up PR. But our current behavior seems to be inconsistent: if we provide files from a different format from the output, only an empty manifest is generated. To illustrate:
cachi2 merge-sboms gomod-spdx.json pip-spdx.json --sbom-type cyclonedx
I think we could simply throw an error saying that Cachi2 does not support merging SPDX SBOMs yet, and work on that as a follow up.
That's because how you created your Sbom model. CycloneDX (class Sbom) can be basically empty and because bom_format field has default value set, it doesn't even fail there if the field isn't present in the input and extra fields are ignored.
I think we need to add
model_config = ConfigDict(extra="forbid")
to the sbom models
I forgot to send this comment yesterday. I included the needed changes for better parsing + also support for spdx->cyclonedx (as I already had those changes ready)
I believe we can improve the way CycloneDX's properties are being translated to SPDX. For example:
{ "name": "cachi2:found_by", "value": "cachi2" }
Could be turned into:
{ "annotator": "cachi2", "annotationDate": "2024-10-01T11:03:40.799221", "annotationType": "OTHER", "comment": "found_by:cachi2" }
My reasoning is that the
annotator
field already states that Cachi2 is the author of that annotation, so we can drop the prefixcachi2:
, which serves the same purpose. Then, we could just append the value in the end of the string, instead of having the inline JSON, which is arguably harder to read.If folks like the idea, this can be done as a follow up, no need to block this PR because of this.
I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property {"name": "foo", "value": "bar"}
, with your proposal it would turn into {"annotator": "bar", "comment": "foo"}
. I think approach to dump whole property in json is more universal
I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property
{"name": "foo", "value": "bar"}
, with your proposal it would turn into{"annotator": "bar", "comment": "foo"}
. I think approach to dump whole property in json is more universal
To clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.
{
"name": "prefix:property"
"value": "content"
}
{
"annotator": "prefix"
"comment": "property:content"
}
This should only apply to Cachi2-generated propertis, though. It is all we deal with for now.
I get your idea, but what if you decide to use new properties? (Our) Tooling reading SPDX annotations could depend on rule if annotator is cachi2 then comment is string in json format. Then if you add property
{"name": "foo", "value": "bar"}
, with your proposal it would turn into{"annotator": "bar", "comment": "foo"}
. I think approach to dump whole property in json is more universalTo clarify, my proposal is to omit the prefix we add to every Cachi2-generated property, since the annotator is already Cachi2. So it's not the value that becomes the annotator, as you mentioned above. The second part of the proposal is to remove the json formatting, since it does not add anything to the info.
{ "name": "prefix:property" "value": "content" }
{ "annotator": "prefix" "comment": "property:content" }
This should only apply to Cachi2-generated propertis, though. It is all we deal with for now.
There's property Property(name="cachi2:missing_hash:in_file", value=str(lockfile_path)) in rpm module. How should that then be translated to annotation?
There's property Property(name="cachi2:missing_hash:in_file", value=str(lockfile_path)) in rpm module. How should that then be translated to annotation?
My initial thought was:
{
"annotator": "cachi2"
"comment": "missing_hash:in_file:./path/to/the/file"
}
But seeing the other thread you're keeping with Lui made me realize that Cachi2 adds standard CycloneDx properties (such as cdx:npm:package:bundled
). So, cachi2:
was actually a way to namespace our own custom properties, and I believe it should be kept.
I still don't quite like the inline JSON, and also that it has name
and value
that are part of the CycloneDX spec (it could be simplified to {"cachi2:missing_hash:in_file": "/path/to/the/file"}
). I'll open a discussion about this later, as I said, I don't want this to block this PR.
I also saw that SPDX 3.0 adds a ContentType
field to annotations, which we could leverage to state that it contains JSON when we move to the new version.
I still don't quite like the inline JSON
I agree, but on the other hand, json dumps will handle any escaping, so as property name and value you can have anything which is compatible with cyclonedx property
It looks like the scope of this PR keeps growing. It might make sense splitting it into several PRs.
This is the last change. It won't grow more
Support for SPDX format was added to fetch-depds command and also to merge_syft_sboms. No changes were made in particular package manager generating components which are then converted to cyclonedx format. SPDX sbom can be obtained by calling Sbom.to_spdx(). New switch sbom-type was added to merge_syfy_sboms, so user can choose which output format should be generated - default is cyclonedx. Once all tooling is ready to consume spdx sboms, cutoff changes in this repository can be started.
Maintainers will complete the following section
Note: if the contribution is external (not from an organization member), the CI pipeline will not run automatically. After verifying that the CI is safe to run:
/ok-to-test
(as is the standard for Pipelines as Code)