CycloneDX / cyclonedx-maven-plugin

Creates CycloneDX Software Bill of Materials (SBOM) from Maven projects
https://cyclonedx.org/
Apache License 2.0

Import transitive dependencies from SBOMs if available #497

Open ppkarwasz opened 2 months ago

ppkarwasz commented 2 months ago

Some Maven libraries publish shaded artifacts that contain many if not all their dependencies.

Since it is impossible to guess which artifacts were shaded from the POM file alone, the CycloneDX plugin should try to use the CycloneDX SBOMs published by a project's dependencies, if available.
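
A minimal sketch of where such an SBOM might be looked up. It assumes the dependency attaches its SBOM with the classifier `cyclonedx` next to the jar in a standard Maven repository layout; the function name, the default repository URL and the coordinates are illustrative only and not part of any plugin API.

```python
def sbom_url(group_id: str, artifact_id: str, version: str,
             repo: str = "https://repo1.maven.org/maven2",
             extension: str = "json") -> str:
    """Build the URL where a dependency's CycloneDX SBOM would be expected.

    Assumption: the SBOM is attached with classifier "cyclonedx", so the
    file is named <artifactId>-<version>-cyclonedx.<extension>.
    """
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}-cyclonedx.{extension}"
    return f"{repo.rstrip('/')}/{group_path}/{artifact_id}/{version}/{file_name}"


# Hypothetical coordinates, used only to show the resulting layout.
print(sbom_url("com.example", "shaded-lib", "1.0.0"))
```

A generator would treat a missing file as "no SBOM available" and fall back to the POM-based resolution it uses today.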

This feature request is related to #472 .

VinodAnandan commented 2 months ago

@ppkarwasz Thanks for creating this issue. I had a similar idea that I discussed with @hboutemy in CycloneDX Slack, but I failed to provide him with concrete examples. Is it possible for you or anyone else (Cc: @raboof , @prabhu, @lfrancke, @stevespringett ) interested in this feature to provide some concrete examples?

lfrancke commented 2 months ago

Examples of where this would be useful?

ppkarwasz commented 2 months ago

In many cases shaded artifacts are the final product and are not consumed by other Java artifacts. They end up in the binary tar.gz distribution of an application, so they are not a problem for the CycloneDX Maven plugin.

There are, however, valid (or at least justified) cases where a library shades another and often repackages it (in the sense that it changes the names of Java packages):

I consider SBOMs a build-tool- and language-independent way to expose a project's dependencies. It would be useful to use them to complement Maven's simplified dependency system, e.g. regarding conflicts.

raboof commented 2 months ago

As a scenario:

Imagine:

When running the vulnerability scanner, it should identify that p is potentially affected by the advisory for a. There are two approaches the vulnerability scanner could learn about the fact that a is part of p:

So the choice is between implementing this in all vulnerability scanners (first approach) or implementing it in all SBOM generators (including cyclonedx-maven-plugin; second approach). AFAICS there is no obvious 'architectural' reason to choose one or the other. For 'regular' dependencies, you definitely want the second approach (because the pom of p may influence which transitive dependencies of d get picked, so looking at d's SBOM would not be accurate for these). For 'shaded' dependencies, either approach would work. The fact that you want the second approach for 'regular' dependencies might be a motivation to go for the second approach and implement this in SBOM generators such as cyclonedx-maven-plugin.
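
A rough sketch of what the second approach could look like on CycloneDX-style JSON. It reads the scenario as "p depends on d, and d shades a", and assumes d's own SBOM lists a under `components`. The purls, the helper names and the exact-match "scanner" are all hypothetical; this is not how cyclonedx-maven-plugin or any real scanner works.

```python
import copy

P = "pkg:maven/com.example/p@1.0"   # hypothetical project
D = "pkg:maven/com.example/d@2.0"   # hypothetical dependency of p
A = "pkg:maven/com.example/a@3.0"   # hypothetical library shaded into d


def import_components(p_sbom: dict, d_sbom: dict) -> dict:
    """Return a copy of p's SBOM extended with the components listed in d's SBOM."""
    merged = copy.deepcopy(p_sbom)
    d_ref = d_sbom["metadata"]["component"]["bom-ref"]
    components = merged.setdefault("components", [])
    known = {c["bom-ref"] for c in components}
    imported = [copy.deepcopy(c) for c in d_sbom.get("components", [])
                if c["bom-ref"] not in known]
    components.extend(imported)
    # Record that d is the component that brings the imported ones in.
    deps = merged.setdefault("dependencies", [])
    entry = next((e for e in deps if e["ref"] == d_ref), None)
    if entry is None:
        entry = {"ref": d_ref, "dependsOn": []}
        deps.append(entry)
    entry.setdefault("dependsOn", []).extend(c["bom-ref"] for c in imported)
    return merged


def is_affected(sbom: dict, advisory_purl: str) -> bool:
    """Naive scanner check: exact purl match against the listed components."""
    return any(c.get("purl") == advisory_purl for c in sbom.get("components", []))


p_sbom = {
    "metadata": {"component": {"bom-ref": P, "purl": P}},
    "components": [{"bom-ref": D, "purl": D}],
    "dependencies": [{"ref": P, "dependsOn": [D]}],
}
d_sbom = {
    "metadata": {"component": {"bom-ref": D, "purl": D}},
    "components": [{"bom-ref": A, "purl": A}],
}

merged = import_components(p_sbom, d_sbom)
print(is_affected(p_sbom, A))   # False: a is invisible in the POM-only SBOM
print(is_affected(merged, A))   # True: a was imported from d's SBOM
```

With the first approach the merge step would instead live inside each scanner, which would have to fetch d's SBOM itself.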

prabhu commented 2 months ago

For cases like these, we need to go beyond the package names to a vulnerability database that offers affected modules, imports, symbols, etc., which doesn't exist in the open-source world. When running cdxgen with the --deep argument, the namespaces belonging to each package also get collected and stored as an internal property, so some work on the SBOM side is possible.
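
To illustrate the namespace idea only (this is not cdxgen's implementation), here is a small sketch that lists the Java packages found in a jar and checks them against a hypothetical map from root namespace to library name. The map, the function names and the in-memory demo jar are assumptions made for the example.

```python
import io
import zipfile


def jar_namespaces(jar) -> set[str]:
    """Collect the Java package names of all .class entries in a jar."""
    with zipfile.ZipFile(jar) as zf:
        return {
            name.rsplit("/", 1)[0].replace("/", ".")
            for name in zf.namelist()
            if name.endswith(".class") and "/" in name
        }


def embedded_libraries(jar, known_namespaces: dict[str, str]) -> set[str]:
    """Return libraries whose root namespace occurs inside any package name.

    Matching on dotted segments also catches relocations that keep the
    original name as a suffix, but not ones that rename it entirely.
    """
    found = jar_namespaces(jar)
    return {
        library
        for prefix, library in known_namespaces.items()
        if any(f".{prefix}." in f".{ns}." for ns in found)
    }


# Hypothetical namespace map and a tiny in-memory jar for demonstration.
KNOWN = {"org.apache.commons.compress": "commons-compress"}
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("com/example/p/Main.class", b"")
    zf.writestr("com/example/p/shaded/org/apache/commons/compress/Foo.class", b"")

print(embedded_libraries(buf, KNOWN))
```

A version cannot be recovered this way, which is one reason a published SBOM for the shaded artifact would still be the more reliable source.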

hboutemy commented 2 months ago

to the examples of shaded content shared previously, I'd add one typical case: in the same gav, there are both the initial .jar and one shaded one, like https://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-http/3.5.3/

In this case, what should THE sbom contain to describe the 2 different jars? How would a project consuming one of these jars as a dependency know what to use? Additional question: as the wagon project is a multi-module build, what about the aggregate SBOM vs the gav-only ones? And this question about aggregation is valid both from a producer perspective (wagon) and a consumer perspective (a project consuming one artifact of wagon).

Does cyclonedx-maven-plugin really have a chance to magically detect the different cases without deep user configuration? How many additional files will the plugin have to download to do the advanced analysis?

Note: is this specific to the Java world, or do other ecosystems have such cases?

Serious deep-dive discussions are needed to get the whole picture.

raboof commented 2 months ago

in the same gav, there are both the initial .jar and one shaded one, like https://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-http/3.5.3/

In this case, what should THE sbom contain to describe the 2 different jars?

Even though those are in the same gav, shouldn't we treat those jars as different artifacts and thus create different SBOMs for them?

Serious deep-dive discussions are needed to get the whole picture.

Indeed!

ppkarwasz commented 2 months ago

In this case, what should THE sbom contain to describe the 2 different jars? How would a project consuming one of these jars as a dependency know what to use?

I think that the SBOM should describe all the artifacts sharing the same GAV (at least the binary ones). Some will be described as components, while others as assemblies. The classifier and type qualifiers of a purl should be enough to tell them apart.

A complex example, jakartaee-migration has 3 assemblies:

BTW: I think that if VEXes become compulsory, developers will think twice before publishing this kind of assembly. jakartaee-migration contains commons-compress and is vulnerable to all its CVEs.