CISA-SBOM-Community / SBOM-Generation

Reference GitHub Workflows for SBOM generation from the CISA SBOM Generation Reference Implementation Tiger Team
Apache License 2.0

SBOM Generation Output Comparison for Java Projects #7

Closed idunbarh closed 3 months ago

idunbarh commented 4 months ago

This issue is to capture the "bakeoff" results from scanning keycloak v25.0.2.

The tools used:

2nd Set - Jul 30th 2024

After reviewing the SBOMs during the Tuesday, Jul 30th Tiger Team call, the group decided a second set of SBOMs was needed.

This set's changes include:

Here are the specific commands run ...

trivy fs keycloak-source/pom.xml   --skip-update --offline-scan --format cyclonedx --output sboms/trivy.java.cdx.json
trivy fs keycloak-source/pom.xml   --skip-update --offline-scan --format spdx-json --output sboms/trivy.java.spdx.json

trivy rootfs keycloak-build   --skip-update --offline-scan --format cyclonedx --output sboms/trivy.jar.cdx.json
trivy rootfs keycloak-build   --skip-update --offline-scan --format spdx-json --output sboms/trivy.jar.spdx.json

syft keycloak-build                               -o cyclonedx-json=sboms/syft.jar.cdx.json -o spdx-json=sboms/syft.jar.spdx.json
syft keycloak-source --exclude "./pnpm-lock.yaml" -o cyclonedx-json=sboms/syft.java.cdx.json -o spdx-json=sboms/syft.java.spdx.json

cdxgen keycloak-source -t jar  -o sboms/cdxgen.jar.cdx.json
cdxgen keycloak-source -t java -o sboms/cdxgen.java.cdx.json

$HOME/go/bin/bom generate keycloak-source -o pretty_sboms/bom.spdx.txt

Here are the generated SBOMs for each of the tools:

1st Set - Jul 25th 2024

This is the initial set of SBOMs generated.

Here are the specific commands run ...

syft keycloak -o cyclonedx-json=sboms/syft.cdx.json -o spdx-json=sboms/syft.spdx.json

trivy fs keycloak --skip-update --offline-scan --format cyclonedx --output sboms/trivy.cdx.json
trivy fs keycloak --skip-update --offline-scan --format spdx-json --output sboms/trivy.spdx.json

cdxgen keycloak -t java -o sboms/cdxgen.cdx.json

diggity -d keycloak -o cdx-json > diggity.cdx.json
diggity -d keycloak -o spdx-json > diggity.spdx.json

spdx-sbom-generator -p keycloak

Here are the generated SBOMs for each of the tools:

idunbarh commented 3 months ago

Adding the output of another SBOM generation tool, cyclonedx-maven-plugin: cyclonedx-maven-plugin.java.cdx.json

douglasdennis commented 3 months ago

Apologies that I'm pretty behind on this analysis. Below is what I found when reviewing the generated BOMs.

I started with looking at CycloneDX BOMs. As a quick initial summary, here are the counts of components for each CycloneDX BOM:

{'cdxgen.jar.cdx.json': 14, 'cdxgen.java.cdx.json': 14, 'cyclonedx-maven-plugin.java.cdx.json': 22, 'syft.java.cdx.json': 931, 'trivy.jar.cdx.json': 403, 'trivy.java.cdx.json': 656}
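Counts like these can be reproduced with a few lines of Python. The following is a minimal sketch, not the script actually used for this comment; the function names and the `sboms` directory are assumptions for illustration, and it counts only the top-level `components` array (nested components, if any, are not counted).

```python
import json
from pathlib import Path


def count_components(bom: dict) -> int:
    """Return the number of entries in a CycloneDX BOM's top-level components array."""
    return len(bom.get("components", []))


def count_components_in_dir(sbom_dir: str) -> dict[str, int]:
    """Map each *.cdx.json file name under sbom_dir to its component count."""
    counts = {}
    for path in sorted(Path(sbom_dir).glob("*.cdx.json")):
        with path.open(encoding="utf-8") as fh:
            counts[path.name] = count_components(json.load(fh))
    return counts
```

Calling `count_components_in_dir("sboms")` on a directory of CycloneDX JSON files would return a dict shaped like the one above.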

Note: The syft.jar.cdx.json and syft.jar.spdx.json links did not work for me.

I conducted some brief explorations of the various generated BOMs in isolation:

cdxgen & cyclonedx-maven-plugin.java.cdx.json

cdxgen and the CycloneDX Maven plugin detected very few of the dependencies. I suspect that they did not traverse the directory tree to find the subpackages and their respective pom.xml files. Compared with syft and trivy their output adds little, so I moved on quickly.

syft

Moving to syft, I noticed a great many duplicate purls in the "java" SBOMs. Here are the 10 most duplicated purls, with the number of times each appears:

pkg:maven/org.keycloak/keycloak-core 39
pkg:maven/junit/junit 38
pkg:maven/org.jboss.logging/jboss-logging 31
pkg:maven/org.keycloak/keycloak-common 23
pkg:maven/org.hamcrest/hamcrest 18
pkg:maven/org.keycloak/keycloak-server-spi-private 17
pkg:maven/org.keycloak/keycloak-server-spi 17
pkg:maven/org.keycloak/keycloak-adapter-spi 17
pkg:maven/org.keycloak/keycloak-saml-core 15
pkg:maven/org.apache.httpcomponents/httpclient 15
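A tally like the one above can be sketched with `collections.Counter`. This is an illustrative sketch under the assumption that each component carries a `purl` field; `top_duplicate_purls` is a hypothetical helper name, not part of syft or any other tool.

```python
from collections import Counter


def top_duplicate_purls(bom: dict, n: int = 10) -> list[tuple[str, int]]:
    """Return up to n (purl, count) pairs for purls that occur more than once."""
    purls = [c["purl"] for c in bom.get("components", []) if c.get("purl")]
    counts = Counter(purls)
    # most_common() sorts by count, highest first; keep only real duplicates.
    return [(purl, k) for purl, k in counts.most_common() if k > 1][:n]
```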

I found that syft does traverse the entire directory tree, but it doesn't deduplicate the dependencies that it detects. Instead, it records a key-value pair in each component's properties field in the CycloneDX BOMs, like this:

{
  "name": "syft:location:0:path",
  "value": "/authz/client/pom.xml"
}

These syft-specific properties record the file in which each component was detected. Sadly, almost criminally really, syft does not create any dependencies entries for these components; it generates an entirely flat component structure.
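Both observations can be checked programmatically. The sketch below assumes the property naming shown above (`syft:location:<index>:path`) and the standard CycloneDX top-level `dependencies` array of `ref`/`dependsOn` entries; the helper names are made up for illustration.

```python
import re

# Matches property names such as "syft:location:0:path".
LOCATION_PATH = re.compile(r"^syft:location:\d+:path$")


def component_locations(component: dict) -> list[str]:
    """Return the file paths syft recorded for one CycloneDX component."""
    return [
        prop["value"]
        for prop in component.get("properties", [])
        if LOCATION_PATH.match(prop.get("name", ""))
    ]


def has_dependency_graph(bom: dict) -> bool:
    """True if at least one dependencies entry lists something it depends on."""
    return any(dep.get("dependsOn") for dep in bom.get("dependencies", []))
```

Grouping components by purl and collecting their `component_locations` would be one way to deduplicate a flat syft BOM while keeping track of where each component was seen.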

syft has a hard time dealing with common Java build idioms. For one, it only captures version information that is explicitly stated in the pom, which leads to a great many missing versions in the SBOM. It appears unable to resolve symbolic/variable values. Occasionally, the build variables even leak into the syft BOM, like this:

"name": "${jdbc.mvn.artifactId}"
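A quick way to quantify both problems is to scan the components for `${...}` placeholders and for empty versions. This is a rough sketch; `find_unresolved` is a hypothetical helper, and it only inspects the `group`, `name`, and `version` fields.

```python
import re

# Matches Maven-style property placeholders such as ${jdbc.mvn.artifactId}.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def find_unresolved(bom: dict) -> dict[str, list[str]]:
    """List component names with leaked placeholders or with no version."""
    report = {"placeholder": [], "missing_version": []}
    for comp in bom.get("components", []):
        name = comp.get("name", "")
        fields = (str(comp.get(key, "")) for key in ("group", "name", "version"))
        if any(PLACEHOLDER.search(value) for value in fields):
            report["placeholder"].append(name)
        if not comp.get("version"):
            report["missing_version"].append(name)
    return report
```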

One last thing about syft is that it also captures the build actions in the source code:

"name": "./.github/actions/archive-surefire-reports",

I'm uncertain if this is especially helpful or detrimental.

trivy

Trivy appears not to traverse the entire source tree. I suspect it only targets the first pom it detects. The subpackages, such as keycloak-core, are listed as dependencies, but they are not analyzed further. Trivy does, however, appear to resolve dependencies better than syft: it commonly has versions for the detected components, even when those versions are hidden behind build variables.

Comparison Between the Tools

I attempted some comparisons between the outputs. syft does appear to generate a great many more components, but the gap isn't as large as you might initially think. If you consider only the cardinality of the set of distinct purls, you see that duplicates account for a large majority of the lead. Here are the sizes of the purl sets for each CycloneDX BOM:

purl set counts:  {'cdxgen.jar.cdx.json': 14, 'cdxgen.java.cdx.json': 14, 'cyclonedx-maven-plugin.java.cdx.json': 22, 'syft.java.cdx.json': 306, 'trivy.jar.cdx.json': 388, 'trivy.java.cdx.json': 130}

As you can see, the lead has greatly decreased. Looking at set overlap, here are the sizes of the set differences:

|syft.java.cdx.json - trivy.jar.cdx.json| = 306
|syft.java.cdx.json - trivy.java.cdx.json| = 305
|trivy.jar.cdx.json - syft.java.cdx.json| = 388
|trivy.jar.cdx.json - trivy.java.cdx.json| = 338
|trivy.java.cdx.json - syft.java.cdx.json| = 129
|trivy.java.cdx.json - trivy.jar.cdx.json| = 80

This would make it seem like the sets are largely disjoint. However, the issue is with the purls: one tool appends a ?type= qualifier, and syft doesn't generate versions. If we normalize the purls by stripping the type qualifier and the version string, we get:

|syft.java.cdx.json - trivy.jar.cdx.json| = 177
|syft.java.cdx.json - trivy.java.cdx.json| = 207
|trivy.jar.cdx.json - syft.java.cdx.json| = 264
|trivy.jar.cdx.json - trivy.java.cdx.json| = 338
|trivy.java.cdx.json - syft.java.cdx.json| = 32
|trivy.java.cdx.json - trivy.jar.cdx.json| = 76

This brings things a little closer. Given the assumption that trivy does not traverse the source tree, it makes sense that syft has a greater number of components. Looking at the 32 components in trivy.java.cdx.json that are not in syft.java.cdx.json:

pkg:maven/org.keycloak/keycloak-crypto-elytron
pkg:maven/org.keycloak/keycloak-guides
pkg:maven/org.keycloak/keycloak-saml-wildfly-integration-pom
pkg:maven/org.keycloak/keycloak-docs-parent
pkg:maven/org.keycloak/keycloak-client-cli-parent
pkg:maven/org.keycloak.bom/keycloak-bom-parent
pkg:maven/org.keycloak/keycloak-util-parent
pkg:maven/org.keycloak/keycloak-client-adapter-spi-pom
pkg:maven/org.keycloak/keycloak-quarkus-parent
pkg:maven/org.keycloak.bom/keycloak-spi-bom
pkg:maven/org.keycloak.bom/keycloak-misc-bom
pkg:maven/org.keycloak/keycloak-model-pom
pkg:maven/org.keycloak/keycloak-authz-provider-parent
pkg:maven/org.keycloak/keycloak-saml-client-adapter-pom
pkg:maven/org.keycloak.bom/keycloak-adapter-bom
pkg:maven/org.keycloak/keycloak-test-helper
pkg:maven/org.keycloak/keycloak-rest-parent
pkg:maven/org.keycloak/keycloak-js-admin-client
pkg:maven/org.keycloak/keycloak-federation-parent
pkg:maven/org.keycloak/keycloak-js-parent
pkg:maven/org.keycloak/keycloak-ui-shared
pkg:maven/org.keycloak/keycloak-crypto-parent
pkg:maven/org.keycloak/keycloak-oidc-client-adapter-pom
pkg:maven/org.keycloak/keycloak-quarkus-test-parent
pkg:maven/org.keycloak/keycloak-authz-parent
pkg:maven/org.keycloak/keycloak-dependencies-parent
pkg:maven/org.keycloak/keycloak-js-adapter
pkg:maven/org.keycloak/keycloak-integration-parent
pkg:maven/org.keycloak/keycloak-admin-client-jee
pkg:maven/org.keycloak/keycloak-misc-parent
pkg:maven/org.keycloak/keycloak-parent
pkg:maven/org.keycloak/keycloak-adapters-pom

They are all keycloak-specific packages. Specifically, these appear to be the top-level artifact IDs of the various pom.xml files in the source tree. I suspect this is an intentional decision by syft to not include these names as components, while trivy includes them because they are called for in the main pom.xml.
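The comparison above (distinct purl sets, normalization, pairwise set differences) can be sketched as follows. This is not the original analysis script; the function names are illustrative, and the normalization is a simple string-based one that drops the subpath, the qualifiers, and the version rather than using a full purl parser.

```python
from itertools import permutations


def normalize_purl(purl: str) -> str:
    """Strip the #subpath, ?qualifiers (e.g. ?type=jar), and @version from a purl."""
    base = purl.split("#", 1)[0].split("?", 1)[0]
    last_segment = base.rsplit("/", 1)[-1]
    # Only treat '@' as a version separator when it sits in the final segment.
    if "@" in last_segment:
        base = base[: base.rindex("@")]
    return base


def purl_set(bom: dict, normalize: bool = False) -> set[str]:
    """Return the set of distinct purls in a CycloneDX BOM."""
    purls = (c["purl"] for c in bom.get("components", []) if c.get("purl"))
    return {normalize_purl(p) if normalize else p for p in purls}


def difference_sizes(boms: dict[str, dict], normalize: bool = True) -> dict[tuple[str, str], int]:
    """Map each ordered pair (a, b) of BOM names to |purls(a) - purls(b)|."""
    sets = {name: purl_set(bom, normalize) for name, bom in boms.items()}
    return {(a, b): len(sets[a] - sets[b]) for a, b in permutations(sets, 2)}
```

Passing `normalize=False` corresponds to the raw comparison and `normalize=True` to the normalized one; `sets[a] - sets[b]` holds the purls found in `a` but not in `b`.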

SPDX BOMs

The SPDX BOMs, in general, had exactly the same information as the CycloneDX BOMs. The only difference in the equivalent packages list was a reference to the BOM itself.
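One way to check this kind of equivalence is to compare the purls in an SPDX JSON document against those in the matching CycloneDX BOM. The sketch below assumes SPDX 2.x JSON, where each package lists its purl under `externalRefs` with `referenceType` set to `purl`. The helper names are hypothetical, and treating packages without a purl as candidates for the self-reference is an assumption, not something confirmed by the tools.

```python
def spdx_purls(doc: dict) -> set[str]:
    """Collect purls from the externalRefs of every package in an SPDX JSON document."""
    purls = set()
    for pkg in doc.get("packages", []):
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl" and ref.get("referenceLocator"):
                purls.add(ref["referenceLocator"])
    return purls


def cyclonedx_purls(bom: dict) -> set[str]:
    """Collect purls from the components of a CycloneDX BOM."""
    return {c["purl"] for c in bom.get("components", []) if c.get("purl")}


def spdx_packages_without_purl(doc: dict) -> list[str]:
    """Names of SPDX packages that carry no purl reference (assumed: e.g. the scanned root)."""
    return [
        pkg.get("name", "")
        for pkg in doc.get("packages", [])
        if not any(r.get("referenceType") == "purl" for r in pkg.get("externalRefs", []))
    ]
```

If the two formats carry the same information, `spdx_purls(doc) == cyclonedx_purls(bom)` should hold, with any extra SPDX entry showing up in `spdx_packages_without_purl(doc)`.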

dasarpjonam commented 3 months ago

@idunbarh - Review Trivy AND Syft. Diggity seems to have captured both java and js/npm. I am trying to get hopctl to validate for NTIA minimum elements. Do you have anyone on your team who can help me with hopctl validate? I want to run the SBOMs through the validation and see what it throws. I have not been successful in installing the hoppr Python package on Ubuntu. Will give it a shot on Windows this week.

dasarpjonam commented 3 months ago

Validation logs with Hoppr/hopctl (v1.13.0 / Windows / Python3)

hoppr.diggity.cdx.log - Invalid schema. Component type invalid.
hoppr.trivy.cdx.java.log - Supplier and license information are missing in most of the components.
hoppr.trivy.cdx.jar.log - Supplier and license information are missing in most of the components.
hoppr.syft.cdx.java.log - Missing supplier, version, and license for most components.
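For a rough cross-check of these findings without hopctl, the three fields named in the logs can be tallied directly from a CycloneDX BOM. This sketch is not hopctl's validation logic and covers only supplier, version, and license, not the full NTIA minimum elements; the function names are made up for illustration.

```python
from collections import Counter


def missing_fields(component: dict) -> list[str]:
    """Return which of supplier, version, and license a CycloneDX component lacks."""
    missing = []
    if not (component.get("supplier") or {}).get("name"):
        missing.append("supplier")
    if not component.get("version"):
        missing.append("version")
    if not component.get("licenses"):
        missing.append("license")
    return missing


def summarize_missing(bom: dict) -> dict[str, int]:
    """Count, per field, how many components in the BOM are missing it."""
    summary = Counter()
    for component in bom.get("components", []):
        summary.update(missing_fields(component))
    return dict(summary)
```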

aloubyansky commented 3 months ago

There is a fundamental difference between the cyclonedx-maven-plugin and the rest here. The Maven plugin works in the context of a source project and is able to record all the metadata and proper dependency graphs, unlike the rest of the tools here, which look at deliverables from "the outside". Tools that analyze the outcomes of builds don't have access to dependency graphs (relationships between components) and other info. The reason the cyclonedx-maven-plugin is missing many Quarkus components and their dependencies is that it assumes the project adheres to a standard Maven component and dependency model. Quarkus projects are a bit special in that regard and require Quarkus-specific tooling to properly manifest Quarkus applications.

idunbarh commented 3 months ago

@aloubyansky Good insight around Quarkus! Thank you for adding the context here.

idunbarh commented 3 months ago

Closing this issue; future comments should go on #15.