Open ppkarwasz opened 2 months ago
@ppkarwasz Thanks for creating this issue. I had a similar idea that I discussed with @hboutemy in the CycloneDX Slack, but I failed to provide him with concrete examples. Is it possible for you or anyone else (Cc: @raboof, @prabhu, @lfrancke, @stevespringett) interested in this feature to provide some concrete examples?
> Examples of where this would be useful?
In many cases shaded artifacts are the final product and are not consumed by other Java artifacts. They end up in the binary `tar.gz` distribution of an application, so they are not a problem for the CycloneDX Maven plugin.
There are, however, valid (or at least justified) cases where a library shades another and often repackages it (in the sense that it changes the names of Java packages):

- `pax-logging-log4j2` shades `log4j-core` and makes minor modifications to improve its OSGi support. This case is very unfortunate, since versions of `pax-logging-log4j2` prior to 2.0.13 are affected by at least one of the Log4Shell CVEs.
- `tomcat-dbcp` is a repackaged version of `commons-dbcp2` with the logging API replaced with `tomcat-juli` (which itself is a repackaged version of an old `commons-logging`).
- `bouncy-castle` artifacts might be considered a "shaded" version of another BC artifact.

I consider SBOMs a build-tool- and language-independent way to expose a project's dependencies. It would be useful to use them to complement Maven's simplified dependency system, e.g. regarding conflicts.
As a scenario, imagine:

- there is a project `p` and its parts.
- `p` depends on a library (`d`) that shades artifact `a`: the `d` jar contains all classes from `a`, but moved to a different package
- `d` publishes an SBOM that correctly reports the fact that `d` contains `a` (e.g. through #472 or otherwise)
- an advisory is published for `a`
- an SBOM for `p` is created with `cyclonedx-maven-plugin`
When running the vulnerability scanner, it should identify that `p` is potentially affected by the advisory for `a`. There are two ways the vulnerability scanner could learn that `a` is part of `p`:

- the vulnerability scanner sees the dependency of `p` on `d`, fetches the SBOM for `d`, and finds out about the shaded `a` from there
- `cyclonedx-maven-plugin` sees the dependency of `p` on `d`, fetches the SBOM for `d`, and uses this information to include `a` in the SBOM for `p` (i.e., the feature described in this issue). The vulnerability scanner then takes this information from the SBOM of `p`.

So the choice is between implementing this in all vulnerability scanners (first approach), or implementing this in all SBOM generators (including `cyclonedx-maven-plugin`, second approach). AFAICS there is no obvious 'architectural' reason to choose one or the other. For 'regular' dependencies, you definitely want the second approach (because the pom of `p` may influence which transitive dependencies of `d` would get picked, so looking at `d`'s SBOM would not be accurate for these). For 'shaded' dependencies, either approach would work. The fact that you want the second approach for 'regular' dependencies might be a motivation to go for the second approach and implement this in SBOM generators such as `cyclonedx-maven-plugin`.
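To make the second approach concrete, here is a minimal Python sketch of what an SBOM generator would do. It works on simplified CycloneDX-like dictionaries and assumes that `d` reports its shaded content as nested `components` under its main component; how #472 would actually represent this is not settled here. The `com.example` coordinates and all function names are hypothetical.

```python
from copy import deepcopy


def add_dependency(p_sbom, d_sbom):
    """Copy d's main component, including anything nested in it, into p's SBOM."""
    d_component = deepcopy(d_sbom["metadata"]["component"])
    p_sbom.setdefault("components", []).append(d_component)
    return p_sbom


def flatten(components):
    """Yield every component, descending into nested (assumed shaded) ones."""
    for component in components:
        yield component
        yield from flatten(component.get("components", []))


# Hypothetical SBOM published by d: it declares that it contains a.
d_sbom = {
    "metadata": {
        "component": {
            "purl": "pkg:maven/com.example/d@1.0",
            "components": [{"purl": "pkg:maven/com.example/a@2.3"}],
        }
    }
}

# Hypothetical SBOM being generated for p.
p_sbom = {"metadata": {"component": {"purl": "pkg:maven/com.example/p@0.1"}}}

add_dependency(p_sbom, d_sbom)
purls = [c["purl"] for c in flatten(p_sbom["components"])]
```

With this shape, a scanner that only reads the SBOM of `p` still sees `a`, because it is carried along inside `d`.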
For cases like these, we need to go beyond package names to a vulnerability database that offers affected modules, imports, symbols, etc., which doesn't exist in the open-source world. When running cdxgen with the `--deep` argument, the `Namespaces` belonging to each package also get collected and stored as an internal property, so some work on the SBOM side is possible.
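As a rough illustration of what namespace-level matching could look like, the Python sketch below compares the namespaces collected for a component with the namespaces an advisory lists as affected. Both inputs are hypothetical: the advisory format is invented for the example (no such open-source database is assumed to exist), and the namespace list is passed in directly rather than read from any real cdxgen property.

```python
def matching_namespaces(component_namespaces, affected_prefixes):
    """Return the component's namespaces that fall under an affected prefix."""
    return sorted(
        ns
        for ns in component_namespaces
        if any(ns == p or ns.startswith(p + ".") for p in affected_prefixes)
    )


# Hypothetical namespaces collected for a jar that shades log4j-core
# without relocating its packages.
collected = [
    "com.example.d",
    "org.apache.logging.log4j.core",
    "org.apache.logging.log4j.core.lookup",
]

# Hypothetical advisory data listing affected namespaces.
advisory = ["org.apache.logging.log4j.core"]

hits = matching_namespaces(collected, advisory)
```

This only works when the shaded classes keep their original package names. If they are relocated, a prefix comparison like this one finds nothing, and the SBOM of the shading artifact has to state what it contains.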
To the examples of shaded content shared previously, I'd add one typical case: in the same GAV, there are both the initial .jar and a shaded one, like https://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-http/3.5.3/
In this case, what should THE SBOM contain to describe the 2 different jars? How would a project consuming one of these jars as a dependency know what to use? Additional question: as the wagon project is a multi-module build, what about the aggregate SBOM vs the GAV-only ones? And this question about aggregation is valid both from a producer perspective (wagon) and a consumer perspective (a project consuming one artifact of wagon).
Does cyclonedx-maven-plugin really have a chance to magically detect the different cases without deep user configuration? How many additional files will the plugin have to download to do the advanced analysis?
Note: is this specific to the Java world, or do other ecosystems have such cases?
There are serious deep-dive discussions to have to get the whole picture.
> in the same GAV, there are both the initial .jar and a shaded one, like https://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-http/3.5.3/
> In this case, what should THE SBOM contain to describe the 2 different jars?

Even though those are in the same GAV, shouldn't we treat those jars as different artifacts and thus create different SBOMs for them?
> There are serious deep-dive discussions to have to get the whole picture.

Indeed!
> In this case, what should THE SBOM contain to describe the 2 different jars? How would a project consuming one of these jars as a dependency know what to use?

I think that the SBOM should describe all the artifacts sharing the same GAV (at least the binary ones). Some will be described as components, while others as assemblies. The `classifier` and `type` qualifiers of a purl should be enough to tell them apart.
A complex example: `jakartaee-migration` has 3 assemblies:

- `shaded.jar`,
- `bin.zip`,
- `bin.tar.gz`.

BTW: I think that if VEXes become compulsory, developers will think twice before publishing this kind of assembly. `jakartaee-migration` contains `commons-compress` and is vulnerable to all its CVEs.
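The idea that `classifier` and `type` are enough to distinguish artifacts sharing a GAV can be sketched in Python as follows. The helper `maven_purl` is hypothetical and skips percent-encoding, so it only suits simple values. The coordinates come from the wagon-http URL mentioned earlier in the thread; the classifier name `shaded` is an assumption about how that jar is published.

```python
def maven_purl(group_id, artifact_id, version, classifier=None, type_="jar"):
    """Build a Maven purl; qualifiers are emitted sorted by key."""
    qualifiers = {"type": type_}
    if classifier:
        qualifiers["classifier"] = classifier
    query = "&".join(f"{key}={value}" for key, value in sorted(qualifiers.items()))
    return f"pkg:maven/{group_id}/{artifact_id}@{version}?{query}"


# Same GAV, two different binary artifacts.
main_jar = maven_purl("org.apache.maven.wagon", "wagon-http", "3.5.3")
shaded_jar = maven_purl(
    "org.apache.maven.wagon", "wagon-http", "3.5.3", classifier="shaded"
)
```

The two identifiers differ only in their qualifiers, so one SBOM could list both artifacts without ambiguity.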
Some Maven libraries publish shaded artifacts that contain many if not all their dependencies.
Since it is impossible to guess which artifacts were shaded from the POM file alone, the CycloneDX plugin should try to use the CycloneDX SBOMs of their dependencies, if available.
This feature request is related to #472.
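As a sketch of how a dependency's SBOM might be located, the Python snippet below derives the repository URL where it would sit next to the jar. It assumes the SBOM is attached with a `cyclonedx` classifier and a `json` extension. That naming is an assumption here and should be checked against the plugin's documentation. The function name and the `com.example:d:1.0` coordinates are hypothetical, and no network access is performed.

```python
def sbom_url(repo, group_id, artifact_id, version, classifier="cyclonedx", ext="json"):
    """Build the URL of an SBOM attached to a Maven artifact (assumed naming)."""
    group_path = group_id.replace(".", "/")
    filename = f"{artifact_id}-{version}-{classifier}.{ext}"
    return f"{repo.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Hypothetical dependency d, looked up in Maven Central.
url = sbom_url("https://repo1.maven.org/maven2/", "com.example", "d", "1.0")
```

A plugin following this route would try to resolve one such file per dependency and fall back to POM-only information when none is published.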