CycloneDX / cyclonedx-cli

CycloneDX CLI tool for SBOM analysis, merging, diffs and format conversions.
https://cyclonedx.org/
Apache License 2.0

`merge`-d polyglot SBOM loses dependency graph information #179

Status: Open. nil4 opened this issue 2 years ago.

nil4 commented 2 years ago

Problem overview

CycloneDX tools vary in their support for dependency graph information. For example, cyclonedx-dotnet@0.19.0 supports it, while cyclonedx-node-module does not due to https://github.com/CycloneDX/cyclonedx-node-module/issues/61.

When merging SBOMs in a polyglot project where one or more of the input SBOMs include dependency graph information, the output should ideally preserve it, but currently it does not.

To demonstrate this, two projects will be created: one .NET (dependency graph supported) and one NPM (no dependency graph). Their SBOMs, when uploaded individually to Dependency Track, correctly reflect the dependency graph information, if present.

The two SBOMs are then merge-d and the output uploaded. The expectation is that the merged SBOM preserves the input files' dependency graph information, whereas it currently seems to be lost.

Steps to reproduce

Create a simple .NET project and collect its SBOM:

> dotnet --version
5.0.401

> dotnet cyclonedx --version
2.1.2.0

> mkdir src\csharp && pushd src\csharp

src\csharp> dotnet new console
The template "Console Application" was created successfully.

src\csharp> dotnet add package Serilog.Sinks.File 
info : PackageReference for package 'Serilog.Sinks.File' version '5.0.0' added to file 'src\csharp\csharp.csproj'.
info : Writing assets file to disk. Path: src\csharp\obj\project.assets.json
log  : Restored src\csharp\csharp.csproj (in 67 ms).

src\csharp> dotnet cyclonedx -ns -dgl -o .. csharp.csproj 
» Analyzing: src\csharp\csharp.csproj
  Attempting to restore packages
Retrieving Serilog 2.10.0
Retrieving Serilog.Sinks.File 5.0.0

Creating CycloneDX BOM
Writing to: src\bom.xml

Upload src\bom.xml to Dependency Track v4.3.6, confirming that the dependency graph information of the .NET project is present, as expected:

[screenshot]

Next, create a simple NPM project and collect its SBOM:

> npm --version 
8.0.0

> cyclonedx-bom --version
3.1.1

> mkdir src\js && pushd src\js

src\js> npm init -y
Wrote to src\js\package.json

src\js> npm install jquery --save-dev
added 1 package, and audited 2 packages in 1s
found 0 vulnerabilities

src\js> cyclonedx-bom -ns -d -o ..\bom-js.xml .

Upload src\bom-js.xml to Dependency Track; no dependency graph information is present, and this is expected per https://github.com/CycloneDX/cyclonedx-node-module/issues/61:

[screenshot]

Finally, merge the individual files into a polyglot SBOM:

> cyclonedx-win-x64 --version
0.19.0

> cyclonedx-win-x64 merge --input-files src\bom.xml src\bom-js.xml --output-file src\bom-polyglot.xml
Processing input file src\bom.xml
    Contains 2 components
Processing input file src\bom-js.xml
    Contains 1 components
Writing output file...
    Total 3 components

Upload src\bom-polyglot.xml to Dependency Track.

Expected results

The dependency graph information present in the input files (e.g. src\bom.xml) is preserved in a merge-d polyglot SBOM.
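To make the expectation concrete, here is a minimal sketch of a flat merge that keeps every input's `dependencies` entries instead of dropping them. It assumes CycloneDX JSON input loaded into dicts and deduplicates components by `bom-ref`; the function name is hypothetical and this is not how cyclonedx-cli is implemented.

```python
from typing import Any


def merge_preserving_dependencies(boms: list[dict[str, Any]]) -> dict[str, Any]:
    """Flat-merge CycloneDX JSON BOMs while keeping every input's dependency entries."""
    components: dict[str, dict[str, Any]] = {}
    depends_on: dict[str, list[str]] = {}
    for bom in boms:
        for comp in bom.get("components", []):
            # Deduplicate by bom-ref (falling back to name); the first occurrence wins.
            key = comp.get("bom-ref", comp.get("name"))
            components.setdefault(key, comp)
        for dep in bom.get("dependencies", []):
            # Union the dependsOn targets of entries that share the same ref.
            targets = depends_on.setdefault(dep["ref"], [])
            for target in dep.get("dependsOn", []):
                if target not in targets:
                    targets.append(target)
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "components": list(components.values()),
        "dependencies": [
            {"ref": ref, "dependsOn": targets} for ref, targets in depends_on.items()
        ],
    }
```

An input without a `dependencies` list (like the NPM SBOM above) simply contributes its components; the .NET graph passes through untouched.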

Observed results

Three components are present (as expected), however the dependency graph information is lost:

[screenshot]

nil4 commented 2 years ago

Upon further testing, the issue is also apparent when merging SBOMs where each includes dependency graph information.

For example, merge two .NET project SBOMs:

mkdir src\csharp1 && pushd src\csharp1
dotnet new console
dotnet add package Serilog.Sinks.File
dotnet cyclonedx -ns -dgl -o . csharp1.csproj
popd

mkdir src\csharp2 && pushd src\csharp2
dotnet new console
dotnet add package jQuery
dotnet cyclonedx -ns -dgl -o . csharp2.csproj
popd

cyclonedx-win-x64 merge --input-files src\csharp1\bom.xml src\csharp2\bom.xml --output-file src\bom-merged.xml

When bom-merged.xml is uploaded to Dependency Track, the dependency graph is lost:

[screenshot]

jimklimov commented 1 year ago

Looking at various SBOM JSONs, I think the problem is that the "merged SBOM" has a minimal "metadata/component" object, without a "bom-ref" or "purl" (compared to what I see in "original SBOMs" generated by cyclonedx-maven-plugin in my case; string values seem identical).

Further, the "dependencies" list has hordes of entries (with lots of duplicates when my SBOMs were merged from builds of many components based on the same ecosystem, although Dependency-Track eventually deduplicates them after an hour of thinking, at least), but...

The problem is that in the "original SBOMs" this list starts with an entry whose "ref" matches the "metadata/component/bom-ref" and which "dependsOn" some immediate components, and from there the tree grows. In the "merged SBOMs", however, there is no such entry that I would expect to link the "merged bom-ref" to the previously top-level "bom-refs" of the files that were merged into one.
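A quick way to check for the root entry described above is to look for a `dependencies` item whose `ref` equals `metadata.component.bom-ref`. This is a minimal sketch assuming CycloneDX JSON loaded with `json.load`; the helper name is hypothetical.

```python
from typing import Any, Optional


def root_dependency(bom: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the dependency entry whose ref matches metadata.component's bom-ref."""
    root_ref = bom.get("metadata", {}).get("component", {}).get("bom-ref")
    if root_ref is None:
        # No bom-ref on metadata/component: nothing can anchor the graph.
        return None
    for dep in bom.get("dependencies", []):
        if dep.get("ref") == root_ref:
            return dep
    return None
```

On an original per-project SBOM this would be expected to return the top-level entry; on a flat-merged SBOM whose `metadata/component` has no `bom-ref`, it returns `None`.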

Actually, in my case I was trying to list the complete BOM of a product release that consists of a dozen Java services (each in its own JVM, optionally in containers), so that those dozen references would be the direct dependencies of a "release". From there it would spread out to the further components those services use, at which versions, leading to hundreds of known unique entries over a few hops of recursion. Short of writing that "actual" top level manually, I am not sure how the tool could derive that information, but the next-best thing could be to at least depend on the hundred or so "our" components (from "original SBOMs" analyzed from our sources) glued into this one file. Ideally, it would find tree nodes that nobody depends on and assume those are the top-level products delivered by our release.
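The "find tree nodes that nobody depends on" idea could look roughly like this. It is a sketch, assuming CycloneDX JSON and a caller-chosen root `bom-ref`; the function name and the fallback `metadata.component` fields are hypothetical, not cyclonedx-cli behaviour.

```python
from typing import Any


def link_orphans_to_root(bom: dict[str, Any], root_ref: str) -> list[str]:
    """Give metadata.component a bom-ref and make it depend on every orphan component."""
    meta = bom.setdefault("metadata", {})
    comp = meta.setdefault("component", {"type": "application", "name": root_ref})
    comp["bom-ref"] = root_ref

    deps = bom.setdefault("dependencies", [])
    # Drop any previous root entry so its targets are re-evaluated as orphans.
    deps[:] = [d for d in deps if d.get("ref") != root_ref]
    depended_on = {t for d in deps for t in d.get("dependsOn", [])}

    # Orphans: components that no dependency entry points at.
    orphans = [
        c["bom-ref"]
        for c in bom.get("components", [])
        if "bom-ref" in c and c["bom-ref"] not in depended_on and c["bom-ref"] != root_ref
    ]
    deps.insert(0, {"ref": root_ref, "dependsOn": orphans})
    return orphans
```

With a dozen services that each depend on shared libraries, only the services would end up as direct children of the release; whether they really are the "top-level products" still depends on the input SBOMs being complete.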

roadSurfer commented 1 year ago

Note: Merges done using cyclonedx-cli image 0.24.2

If I run a hierarchical merge of Spring Boot JAR (cyclonedx-maven-plugin) and Docker image (Syft, via docker sbom) SBoMs, explicitly setting --group FOO --name BAR --version 9.9.9999, in the output metadata I see this:

    <component type="application" bom-ref="FOO.BAR@9.9.9999">
      <group>FOO</group>
      <name>BAR</name>
      <version>9.9.9999</version>
    </component>

And within dependencies I see:

    <dependency ref="FOO.BAR@9.9.9999">
      <dependency ref="registry.domain.local/some-name/my-application:0.0.1-SNAPSHOT@sha256:a5..." />
      <dependency ref="tld.domain.some-name.my-application@0.0.1-SNAPSHOT:pkg:maven/tld.domain.some-name/my-application@0.0.1-SNAPSHOT?type=jar" />
    </dependency>

That seems correct to me, although DepTrack still fails to show the full dependency tree; it is limited to those top two dependencies, despite the full dependency information being available within the merged SBoM. (If I upload just the JAR SBoM, I get the full dependency tree, as one would expect.)

If, however, I just do a plain merge of the same files (specifying the Docker image SBoM first), then within metadata I see:

    <component type="container" bom-ref="7fdd0439fe93cdcc">
      <name>registry.domain.local/some-name/my-application:0.0.1-SNAPSHOT</name>
      <version>sha256:a5...</version>
    </component>

Whilst the information from the JAR SBoM is present, there is no corresponding entry in dependencies linking it back to metadata/component. The top-level component from the JAR SBoM is also missing, all of which sort of makes sense, I guess; this was just a merge of metadata/tools, components and dependencies. I could add the JAR as a component, plus a dependency linking it back to metadata/component, but when I tried that, DepTrack dropped all the components from the Docker image SBoM (actually, it got a lot weirder than that).

Like you, I did see some duplicate components after the merges. This seems to be down to subtle differences in how the component was specified in the source SBoMs: it looks like Syft omits type=... from the PURL but does include a package-id= in the bom-ref. Not quite sure how this could be resolved.
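One way to at least spot such near-duplicates is to compare PURLs with their qualifiers (`?type=jar`) and subpath (`#...`) stripped. This is a hedged sketch, not a fix: the helper names are hypothetical, and ignoring qualifiers can wrongly merge artifacts that genuinely differ (e.g. `type=jar` vs `type=pom`).

```python
from collections import defaultdict
from typing import Any


def purl_identity(purl: str) -> str:
    """Return a PURL without its qualifiers (?...) and subpath (#...)."""
    for sep in ("?", "#"):
        purl = purl.split(sep, 1)[0]
    return purl


def find_purl_duplicates(components: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group component bom-refs whose PURLs match once qualifiers are ignored."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for comp in components:
        if "purl" in comp:
            groups[purl_identity(comp["purl"])].append(comp.get("bom-ref"))
    # Only report identities claimed by more than one component.
    return {key: refs for key, refs in groups.items() if len(refs) > 1}
```

For example, `pkg:maven/tld.domain.some-name/my-application@0.0.1-SNAPSHOT?type=jar` from the Maven plugin and the same PURL without `?type=jar` would land in one group.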

So the hierarchical merge seems to almost work end-to-end (maybe there is a bug in DepTrack?), but I do find not being able to set publisher etc. in metadata/component a bit of a drawback.

My current situation is not as complex as yours, although I can see it going that way, as the images obviously get deployed as part of a larger whole that provides the overall service.

jimklimov commented 1 year ago

FWIW, I followed up on my idea posted a bit above, and manually crafted an SBOM file that:

Then I used cyclonedx merge ... (non-hierarchical; hierarchical still won't work for me) with the input files listing this crafted SBOM first and a wildcard for the others after it. Not sure if that order is required (is the first file argument special?), but it was for peace of mind. The generated file lacked a "metadata/component/bom-ref" but did list the dependencies I crafted in that first file.

Finally, I copied that one line ("metadata/component/bom-ref") from the crafted file into the generated one, uploaded it to DT, et voilà: I have the dependency graph displayed!
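The manual "copy one line" step could be scripted along these lines. This is a sketch assuming both files are CycloneDX JSON; `copy_root_bom_ref` and `patch_file` are hypothetical names, and the check simply confirms that the crafted root dependency entry survived the flat merge before patching.

```python
import json
from pathlib import Path
from typing import Any


def copy_root_bom_ref(crafted: dict[str, Any], merged: dict[str, Any]) -> str:
    """Copy metadata.component.bom-ref from the crafted BOM into the merged one."""
    root_ref = crafted["metadata"]["component"]["bom-ref"]
    # The crafted dependency entry for the root should have survived the flat merge.
    if not any(d.get("ref") == root_ref for d in merged.get("dependencies", [])):
        raise ValueError(f"merged BOM has no dependency entry for {root_ref!r}")
    merged.setdefault("metadata", {}).setdefault("component", {})["bom-ref"] = root_ref
    return root_ref


def patch_file(crafted_path: str, merged_path: str) -> str:
    """File wrapper: read both JSON BOMs and rewrite the merged one in place."""
    crafted = json.loads(Path(crafted_path).read_text(encoding="utf-8"))
    merged = json.loads(Path(merged_path).read_text(encoding="utf-8"))
    root_ref = copy_root_bom_ref(crafted, merged)
    Path(merged_path).write_text(json.dumps(merged, indent=2), encoding="utf-8")
    return root_ref
```

Running it right after the `merge` step and before the upload reproduces the manual edit described above.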

roadSurfer commented 1 year ago

You got the full, deep dependency graph and not just the initial level?

jimklimov commented 1 year ago

Did not check about "full", but randomly clicked open at least half a dozen levels.

"So all for the loss of one horse-shoe nail..."

roadSurfer commented 1 year ago

OK, seems like you got much further. I'll try and follow your steps to see if I can make it work.

roadSurfer commented 1 year ago

Found the problem and I suspect it is a DepTrack limitation. In the merged BOM any sub-components are ignored:

    <components>
        <component type="application" bom-ref="tld.domain....">
            <components>
                <!-- These are all ignored in the Dependency Graph -->
                <component type="library" bom-ref="tld.domain...." />
            </components>
        </component>
    </components>

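To see how many components sit in such nested `<components>` blocks (and would therefore be affected if Dependency-Track ignores them), a small scan of the merged XML can help. A sketch using the standard library `xml.etree.ElementTree`; it ignores XML namespaces by comparing local tag names, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def nested_component_refs(xml_text: str) -> list:
    """Return bom-refs of components nested inside another component."""
    root = ET.fromstring(xml_text)
    top = next((el for el in root if _local(el.tag) == "components"), None)
    refs: list = []
    if top is None:
        return refs

    def walk(components_el: ET.Element, depth: int) -> None:
        for comp in components_el:
            if _local(comp.tag) != "component":
                continue
            if depth > 0:
                refs.append(comp.get("bom-ref"))
            # Recurse into this component's own <components> block, if any.
            for child in comp:
                if _local(child.tag) == "components":
                    walk(child, depth + 1)

    walk(top, 0)
    return refs
```

Only the top-level `<components>` under `<bom>` is scanned, so the `metadata/component` entry is not counted.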
roadSurfer commented 1 year ago

Thank you, Jim. Manually crafting a "top level" BOM with just the components and a basic dependency entry, and then doing a "flat" merge, seems to solve the Dependency Graph problem. My guess is that this is more a problem of DependencyTrack not correctly following component/component relationships, rather than something to do with cyclonedx-cli.

jimklimov commented 1 year ago

FWIW, one concern I had about this manually crafted approach was how to handle tests and tools (generally, entities other than containerized services that may be part of the delivery), and I figured that this crafted SBOM can in fact untangle that.

So now it declares the same "metadata/component/bom-ref" for the release as in the earlier iteration ("Project/standard@cloud2022.11.0"), but in the "components" list I declared some nested fake layers like "Project/standard/services@cloud2022.11.0", "Project/standard/tools@cloud2022.11.0" and "Project/standard/tests@cloud2022.11.0", and rearranged the crafted "dependencies" list to contain not one entry listing all top-level stuff but, in this example, four: one tying the release to these layers, plus one for the neatly arranged contents of each of the three layers.
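The layered crafted SBOM could be generated rather than hand-written. This sketch assumes CycloneDX JSON 1.4; the release and layer `bom-ref` strings follow the comment above, while the function name, the component fields and the layer contents passed in are hypothetical placeholders that would have to match the `bom-ref`s in the per-service SBOMs merged after this file.

```python
from typing import Any

RELEASE = "Project/standard@cloud2022.11.0"


def crafted_release_bom(layers: dict[str, list[str]]) -> dict[str, Any]:
    """Build a top-level SBOM: release -> fake layers -> each layer's contents."""
    layer_refs = {name: f"Project/standard/{name}@cloud2022.11.0" for name in layers}
    components = [
        {"type": "application", "name": f"standard/{name}", "version": "cloud2022.11.0", "bom-ref": ref}
        for name, ref in layer_refs.items()
    ]
    # One entry tying the release to its layers...
    dependencies = [{"ref": RELEASE, "dependsOn": list(layer_refs.values())}]
    # ...plus one entry per layer listing what that layer contains.
    dependencies += [{"ref": layer_refs[name], "dependsOn": refs} for name, refs in layers.items()]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.4",
        "metadata": {"component": {"type": "application", "name": "standard",
                                   "version": "cloud2022.11.0", "bom-ref": RELEASE}},
        "components": components,
        "dependencies": dependencies,
    }
```

With three layers this yields the four dependency entries described above; serializing it with `json.dumps` gives the file to list first in the flat merge.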

No idea yet whether that would let me e.g. compartmentalize the vulnerability reports (not much worry if integration tests are buggy), but at least it keeps me from getting lost in the dependency tree, now that I can see it :)

jimklimov commented 1 year ago

Quoting roadSurfer: "My guess is that this is more a problem of DependencyTrack not correctly following component/component relationships, rather than something to do with cyclonedx-cli."

I'd vote for cyclonedx-cli being the problem, since it does not deliver a "bom-ref" identifying the top-level component described by the document (nor any ties from such an identifier to further contents). So as far as the dependency tree goes, it has no root and does not exist.

(Maybe it just exposes the problem which is in some library or another component... well, let's shoot the messenger or let it delegate :) )

roadSurfer commented 1 year ago

It won't show a bom-ref in a flat merge; I am not even sure how it could, as all a flat merge seems to do is mash components and dependencies together. It does create a bom-ref for a hierarchical merge, but I know that fails for you, which would appear to be a separate bug. Not following sub-components/transitives for the Dependency Graph would seem to be Dependency-Track issue 1513.

Of course, I could be totally wrong (I often am 😄).

jimklimov commented 1 year ago

As I wrote above, I too do not have a good idea how it could link the "metadata/component" of the currently described document to its further contents. This component is an artificial construct, to the extent that it is not even identified (which IMHO smells fishy).

For the purposes of import/export via DT (e.g. to deduplicate smartly), and to avoid losing components that are not referenced and hence "do not exist", linking from this merged bom-ref to each component that nobody else depends on could be a way. I don't know; maybe that is in fact what "hierarchical" does, but I can't check with my BOMs :)

roadSurfer commented 1 year ago

I attached some example BOMs to 1513. For the problems with your BOMs, I guess your only real option is to try and see which one(s) are a problem and maybe log an issue(s); or find a way to script creating your top-level BOM. Good luck!