Closed idunbarh closed 3 months ago
Adding another SBOM generation tool output, for cyclonedx-maven-plugin: cyclonedx-maven-plugin.java.cdx.json
Apologies that I'm pretty behind on this analysis. Below is what I found when reviewing the generated BOMs.
I started by looking at the CycloneDX BOMs. As a quick initial summary, here are the component counts for each CycloneDX BOM:
{'cdxgen.jar.cdx.json': 14, 'cdxgen.java.cdx.json': 14, 'cyclonedx-maven-plugin.java.cdx.json': 22, 'syft.java.cdx.json': 931, 'trivy.jar.cdx.json': 403, 'trivy.java.cdx.json': 656}
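Counts like these can be reproduced with a few lines of Python. This is only a minimal sketch, not the script used for the analysis: the function names are hypothetical, and it assumes each file is a CycloneDX JSON BOM and counts only the top-level `components` array, ignoring nested components.

```python
import json
from pathlib import Path


def count_components(bom: dict) -> int:
    """Number of entries in the top-level `components` array of a CycloneDX BOM."""
    return len(bom.get("components", []))


def count_components_in_files(paths):
    """Map each BOM file name to its top-level component count."""
    counts = {}
    for path in map(Path, paths):
        with path.open(encoding="utf-8") as fh:
            counts[path.name] = count_components(json.load(fh))
    return counts
```

Calling `count_components_in_files(Path(".").glob("*.cdx.json"))` would produce a dict of the same shape as the one above.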
Note: The syft.jar.cdx.json and syft.jar.spdx.json links did not work for me.
I conducted some brief explorations of the various generated BOMs in isolation:
cdxgen and the maven plugin from cyclonedx did not detect many of the dependencies at all. I suspect that they did not traverse the directory tree to detect the subpackages and their respective pom.xml files. I found that they are largely ignorable in the face of syft and trivy and so I moved on quickly.
Moving to syft, I noticed a great many duplicate purls in the "java" SBOMs. Here are the top 10 duplicated purls:
pkg:maven/org.keycloak/keycloak-core 39
pkg:maven/junit/junit 38
pkg:maven/org.jboss.logging/jboss-logging 31
pkg:maven/org.keycloak/keycloak-common 23
pkg:maven/org.hamcrest/hamcrest 18
pkg:maven/org.keycloak/keycloak-server-spi-private 17
pkg:maven/org.keycloak/keycloak-server-spi 17
pkg:maven/org.keycloak/keycloak-adapter-spi 17
pkg:maven/org.keycloak/keycloak-saml-core 15
pkg:maven/org.apache.httpcomponents/httpclient 15
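A duplicate tally of this kind can be sketched with `collections.Counter`. The function name is hypothetical, and the sketch assumes duplicates are counted on the raw `purl` string of each top-level component, which may differ from how the list above was actually produced.

```python
from collections import Counter


def duplicated_purls(bom: dict, top: int = 10):
    """Return up to `top` (purl, count) pairs for purls that occur more than once."""
    counts = Counter(
        comp["purl"] for comp in bom.get("components", []) if comp.get("purl")
    )
    return [(purl, n) for purl, n in counts.most_common(top) if n > 1]
```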
I found that syft does traverse the entire directory tree, but it doesn't deduplicate the dependencies that it detects. Instead, it records a key-value pair in each component's properties field in the CycloneDX BOM, like this:
{
"name": "syft:location:0:path",
"value": "/authz/client/pom.xml"
}
These syft-specific objects document from which file the component was detected. Sadly, almost criminally really, it does not create any dependencies entries for these components. syft generates an entirely flat component structure.
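To see both effects in a BOM, one could group the location properties by component and check whether the `dependencies` array carries any edges. This is a rough sketch under assumptions: the helper names are hypothetical, and the pattern assumes the numeric index in `syft:location:0:path` varies per location, which the single example above does not confirm.

```python
import re
from collections import defaultdict

# Assumed pattern: the index between "location" and "path" may be any integer.
LOCATION_PATH = re.compile(r"^syft:location:\d+:path$")


def locations_by_component(bom: dict) -> dict:
    """Map each component (by purl, falling back to name) to the files it was found in."""
    found = defaultdict(set)
    for comp in bom.get("components", []):
        key = comp.get("purl") or comp.get("name", "")
        for prop in comp.get("properties", []):
            if LOCATION_PATH.match(prop.get("name", "")):
                found[key].add(prop.get("value", ""))
    return {key: sorted(paths) for key, paths in found.items()}


def has_dependency_graph(bom: dict) -> bool:
    """True if at least one `dependencies` entry lists something under `dependsOn`."""
    return any(dep.get("dependsOn") for dep in bom.get("dependencies", []))
```

Merging duplicate components on the key used here, while keeping the union of their locations, would be one way to deduplicate the syft output.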
syft has a hard time dealing with common Java build idioms. This includes syft only capturing version information if it is explicitly stated in the pom. This leads to a great deal of missing versions in the SBOM. It appears incapable of dealing with symbolic/variable values. Occasionally, the build variables will even leak into the syft BOM like this:
"name": "${jdbc.mvn.artifactId}"
One last thing about syft is that it also captures the build actions in the source code:
"name": "./.github/actions/archive-surefire-reports",
I'm uncertain if this is especially helpful or detrimental.
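The missing versions and leaked build variables described above are easy to flag mechanically. The following is a sketch only: the function names are hypothetical, and it assumes a leaked Maven property appears literally as `${...}` in a component's `name`, `group`, or `version` field (in a purl it may be percent-encoded, which this does not handle).

```python
import re

# Matches an unresolved Maven-style property reference such as ${jdbc.mvn.artifactId}.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def unresolved_placeholders(bom: dict, fields=("name", "group", "version")):
    """Return (field, value) pairs whose value still contains a ${...} placeholder."""
    hits = []
    for comp in bom.get("components", []):
        for field in fields:
            value = comp.get(field)
            if isinstance(value, str) and PLACEHOLDER.search(value):
                hits.append((field, value))
    return hits


def components_without_version(bom: dict):
    """Names of components whose `version` is absent or empty."""
    return [c.get("name", "") for c in bom.get("components", []) if not c.get("version")]
```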
Trivy appears not to traverse the entire source tree. I suspect it only targets the first detected pom. The subpackages are listed as dependencies, such as keycloak-core, but they are not reviewed further. Trivy does, however, appear to resolve dependencies better than syft. It commonly has versions for the detected components, even when those components are hidden behind build variables.
I attempted some comparisons between the outputs. syft does appear to generate a great many more components, but the gap isn't as large as you might initially think. If you consider only the cardinality of the set of distinct components, you see that duplicates account for most of syft's lead. Here are the sizes of the sets of components for each CycloneDX BOM:
purl set counts: {'cdxgen.jar.cdx.json': 14, 'cdxgen.java.cdx.json': 14, 'cyclonedx-maven-plugin.java.cdx.json': 22, 'syft.java.cdx.json': 306, 'trivy.jar.cdx.json': 388, 'trivy.java.cdx.json': 130}
As you can see, the lead has greatly decreased. Looking at set overlap, here are the sizes of the set differences:
|syft.java.cdx.json - trivy.jar.cdx.json| = 306
|syft.java.cdx.json - trivy.java.cdx.json| = 305
|trivy.jar.cdx.json - syft.java.cdx.json| = 388
|trivy.jar.cdx.json - trivy.java.cdx.json| = 338
|trivy.java.cdx.json - syft.java.cdx.json| = 129
|trivy.java.cdx.json - trivy.jar.cdx.json| = 80
This would make it seem like the sets are largely disjoint. However, the issue is with the purls. One tool adds a ?type= qualifier to the end of the purl, and syft doesn't generate versions. If we normalize the purls to strip the type information and the version string, we get:
|syft.java.cdx.json - trivy.jar.cdx.json| = 177
|syft.java.cdx.json - trivy.java.cdx.json| = 207
|trivy.jar.cdx.json - syft.java.cdx.json| = 264
|trivy.jar.cdx.json - trivy.java.cdx.json| = 338
|trivy.java.cdx.json - syft.java.cdx.json| = 32
|trivy.java.cdx.json - trivy.jar.cdx.json| = 76
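The comparison can be sketched as follows. This is an illustration under assumptions, not the script behind the numbers above: the function names are hypothetical, and normalization here simply drops everything from the first `?` or `#` (qualifiers and subpath) and then everything from the first `@` (the version).

```python
from itertools import permutations


def normalize_purl(purl: str) -> str:
    """Strip qualifiers such as ?type=jar, any #subpath, and the @version suffix."""
    base = purl.split("#", 1)[0].split("?", 1)[0]
    return base.split("@", 1)[0]


def purl_set(bom: dict, normalize: bool = False) -> set:
    """Distinct purls of the top-level components, optionally normalized."""
    purls = (c["purl"] for c in bom.get("components", []) if c.get("purl"))
    return {normalize_purl(p) if normalize else p for p in purls}


def pairwise_differences(boms: dict, normalize: bool = False) -> dict:
    """Map (a, b) to |a - b| for every ordered pair of named BOMs."""
    sets = {name: purl_set(bom, normalize) for name, bom in boms.items()}
    return {
        (a, b): len(sets[a] - sets[b]) for a, b in permutations(sorted(sets), 2)
    }
```

With `normalize=False` two tools that describe the same artifact with and without a version count as different components, which is the effect seen in the first table.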
Normalization brings the sets a little closer. Given the assumption that trivy does not traverse the source tree, it makes sense that syft has a greater number of components. Looking at the 32 components in trivy.java.cdx.json that are not in syft.java.cdx.json:
pkg:maven/org.keycloak/keycloak-crypto-elytron
pkg:maven/org.keycloak/keycloak-guides
pkg:maven/org.keycloak/keycloak-saml-wildfly-integration-pom
pkg:maven/org.keycloak/keycloak-docs-parent
pkg:maven/org.keycloak/keycloak-client-cli-parent
pkg:maven/org.keycloak.bom/keycloak-bom-parent
pkg:maven/org.keycloak/keycloak-util-parent
pkg:maven/org.keycloak/keycloak-client-adapter-spi-pom
pkg:maven/org.keycloak/keycloak-quarkus-parent
pkg:maven/org.keycloak.bom/keycloak-spi-bom
pkg:maven/org.keycloak.bom/keycloak-misc-bom
pkg:maven/org.keycloak/keycloak-model-pom
pkg:maven/org.keycloak/keycloak-authz-provider-parent
pkg:maven/org.keycloak/keycloak-saml-client-adapter-pom
pkg:maven/org.keycloak.bom/keycloak-adapter-bom
pkg:maven/org.keycloak/keycloak-test-helper
pkg:maven/org.keycloak/keycloak-rest-parent
pkg:maven/org.keycloak/keycloak-js-admin-client
pkg:maven/org.keycloak/keycloak-federation-parent
pkg:maven/org.keycloak/keycloak-js-parent
pkg:maven/org.keycloak/keycloak-ui-shared
pkg:maven/org.keycloak/keycloak-crypto-parent
pkg:maven/org.keycloak/keycloak-oidc-client-adapter-pom
pkg:maven/org.keycloak/keycloak-quarkus-test-parent
pkg:maven/org.keycloak/keycloak-authz-parent
pkg:maven/org.keycloak/keycloak-dependencies-parent
pkg:maven/org.keycloak/keycloak-js-adapter
pkg:maven/org.keycloak/keycloak-integration-parent
pkg:maven/org.keycloak/keycloak-admin-client-jee
pkg:maven/org.keycloak/keycloak-misc-parent
pkg:maven/org.keycloak/keycloak-parent
pkg:maven/org.keycloak/keycloak-adapters-pom
They are all Keycloak-specific packages. Specifically, these appear to be the top-level artifact IDs of the various pom.xml files in the source tree. I suspect this is an intentional decision by syft to not include these names as components, while trivy includes them because they are called for in the main pom.xml.
The SPDX BOMs, in general, had exactly the same information as the CycloneDX BOMs. The only difference in the equivalent packages list was a reference to the BOM itself.
@idunbarh - Review Trivy and Syft. Diggity seems to have captured both java and js/npm. I am trying to get hopctl to validate for NTIA minimum elements. Do you have anyone on your team who can help me with hopctl validate? I want to run the SBOMs through the validation and see what it throws. I have not been successful in installing the hoppr Python package on Ubuntu. Will give it a shot on Windows this week.
Validation logs with Hoppr/Hopctl (v1.13.0 / Windows / Python3)
hoppr.diggity.cdx.log - Invalid schema. Component type invalid.
hoppr.trivy.cdx.java.log - Supplier and license information are missing in most of the components.
hoppr.trivy.cdx.jar.log - Supplier and license information are missing in most of the components.
hoppr.syft.cdx.java.log - Missing supplier, version and license for most components.
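For a quick cross-check outside of Hoppr, the same gaps can be tallied directly from a CycloneDX JSON BOM. This is a sketch only and does not reproduce Hoppr's validation logic: the function name is hypothetical, and it assumes the relevant data lives in each component's `supplier`, `version`, and `licenses` fields.

```python
def missing_field_counts(bom: dict, fields=("supplier", "version", "licenses")):
    """Return (total components, {field: number of components missing that field})."""
    components = bom.get("components", [])
    counts = {field: 0 for field in fields}
    for comp in components:
        for field in fields:
            # Absent keys, empty strings, and empty lists/objects all count as missing.
            if not comp.get(field):
                counts[field] += 1
    return len(components), counts
```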
There is a fundamental difference between the cyclonedx-maven-plugin and the rest here. The Maven plugin works in the context of a source project and is able to record all the metadata and proper dependency graphs, unlike the rest of the tools here, which look at deliverables from "the outside". Tools that analyze the outcomes of builds don't have access to dependency graphs (relationships between components) and other info. The reason the cyclonedx-maven-plugin is missing many Quarkus components and their dependencies is that it assumes the project adheres to a standard Maven component and dependency model. Quarkus projects are a bit special in that regard and require Quarkus-specific tooling to properly manifest Quarkus applications.
@aloubyansky Good insight around Quarkus! Thank you for adding the context here.
Closing this issue; future comments should go on #15.
This issue is to capture the "bakeoff" results from scanning keycloak v25.0.2.
The tools used:
2nd Set - Jul 30th 2024
After reviewing SBOMs during the Tuesday Jul 30th Tiger Team call, the group decided a second set of SBOMs was needed.
This set's changes include:
pom.xml and the released jar files
Here are the specific commands run ...
Here are the generated SBOMs for each of the tools:
1st Set - Jul 25th 2024
This is the initial set of SBOMs generated.
Here are the specific commands run ...
Here are the generated SBOMs for each of the tools: