What if a Maven module produces multiple variants?

raboof commented 1 week ago

How should we deal with the scenario where a Maven module produces several types of output, for example both a 'regular' jar (with regular dependencies) and a 'fat' jar (with some of the dependencies embedded into the jar)?

This question came up in https://github.com/CycloneDX/cyclonedx-maven-plugin/issues/472#issuecomment-2439884740. This situation can be found for example in https://repo1.maven.org/maven2/org/apache/pekko/pekko-grpc-codegen_3/1.1.0/ : it has both the -assembly.jar and a regular .jar.

I'd say the main choice is between describing both the 'regular' and the 'fat' artifact in one SBOM, or creating a separate SBOM for each artifact.

From the perspective of "how are SBOMs used?", I think there are typically two classes of consumers of SBOMs for a project: the maintainers of the project themselves, and users who have the project somewhere in the dependency tree of their systems. The latter category will probably start from a particular artifact that they encounter on their systems, and be interested in specifically the SBOM for that artifact. So I think they would prefer it if there were a separate SBOM for each artifact. For the maintainers of the project I could see use cases where they'd like to analyze the combined metadata for all their artifacts, but for that I don't see a particularly strong preference for 'one big SBOM' over 'SBOM per artifact'.

So overall I think it'd make most sense to have a separate SBOM per artifact - i.e. for https://repo1.maven.org/maven2/org/apache/pekko/pekko-grpc-codegen_3/1.1.0/ generate both a pekko-grpc-codegen_3-1.1.0-cyclonedx.json and a pekko-grpc-codegen_3-1.1.0-assembly-cyclonedx.json.
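The naming scheme proposed above can be sketched as a small helper. This is only an illustration: `sbom_filename` is a hypothetical function, not part of cyclonedx-maven-plugin, and it assumes the SBOM name simply follows the Maven `artifactId-version[-classifier]` pattern with a `-cyclonedx` suffix.

```python
def sbom_filename(artifact_id, version, classifier=None, fmt="json"):
    """Build '<artifactId>-<version>[-<classifier>]-cyclonedx.<fmt>'.

    Hypothetical helper: it mirrors the file names proposed in this
    thread, not any existing plugin behaviour.
    """
    parts = [artifact_id, version]
    if classifier:
        # The classifier (e.g. 'assembly') distinguishes the variant.
        parts.append(classifier)
    parts.append("cyclonedx")
    return "-".join(parts) + "." + fmt


if __name__ == "__main__":
    for classifier in (None, "assembly"):
        print(sbom_filename("pekko-grpc-codegen_3", "1.1.0", classifier))
```

With this scheme, each attached artifact maps to exactly one SBOM file, so a consumer holding the `-assembly.jar` can look up the matching SBOM by name.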

hboutemy commented 1 week ago

yes, this is a variant of shading #472

ppkarwasz commented 1 week ago

> So overall I think it'd make most sense to have a separate SBOM per artifact

I am wondering how these SBOMs for alternative artifact versions can be used in practice. The artifact with the assembly classifier should not be used in a Maven project, because:

it includes all its dependencies, but
it shares the POM file with the non-shaded version of the artifact. Maven will still download and add those dependencies to the Maven project.

hboutemy commented 1 week ago

It's a reality of the output, so it makes sense to describe this reality.

Question: how is it created? When I look at the Pekko source code (https://github.com/apache/pekko), it's not obvious, as it's Scala + sbt, which is outside my knowledge.

Another question: what is the expected content? Can you draft how this SBOM would be different from the other one?

And a final question: all that is from a provider point of view, describing in more detail the output of his release, but how is it used by consumers? ("it" being the SBOM but also the artifact, as Piotr said)

raboof commented 1 week ago

> I am wondering, how these SBOMs for alternative artifact version can be used in practice. The artifact with the assembly classifier should not be used in a Maven project, because (...)

Indeed, this particular artifact should not be used as a dependency in a Maven project, but as a protoc plugin in, for example, protobuf-gradle-plugin or protobuf-maven-plugin. It looks like Spring Boot allows something similar (https://docs.spring.io/spring-boot/maven-plugin/packaging.html#packaging.repackage-goal.parameter-details.classifier).

> what is the expected content: can you draft how this SBOM would be different from the other one?

This is what we're discussing in #472: the SBOM for the 'regular' artifact describes its 'regular' dependencies, while the SBOM for the 'assembly' artifact should somehow encode the fact that it 'embeds' those dependencies rather than just referring to them.
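As a rough draft of what the difference could look like, here is one possible encoding, sketched as Python dicts that serialize to CycloneDX JSON. It is an assumption, not what #472 has settled on: the 'regular' SBOM lists the dependency as a top-level component with a `dependsOn` edge, while the 'assembly' SBOM nests it under the main component's own `components` list to express "included in". The `org.example:some-lib` dependency is made up for illustration, and the `classifier=assembly` purl qualifier is my guess at how the variant would be identified.

```python
import json

MAIN = "pkg:maven/org.apache.pekko/pekko-grpc-codegen_3@1.1.0"
LIB = "pkg:maven/org.example/some-lib@1.0.0"  # hypothetical dependency


def component(name, purl, nested=None):
    """Minimal CycloneDX-style component; 'nested' holds embedded parts."""
    c = {"type": "library", "name": name, "bom-ref": purl, "purl": purl}
    if nested:
        c["components"] = nested
    return c


def regular_sbom():
    # The dependency is referenced, not contained: a dependency-graph edge.
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "metadata": {"component": component("pekko-grpc-codegen_3", MAIN)},
        "components": [component("some-lib", LIB)],
        "dependencies": [{"ref": MAIN, "dependsOn": [LIB]}],
    }


def assembly_sbom():
    # Assumed encoding: the dependency is nested inside the main component.
    main = MAIN + "?classifier=assembly"
    embedded = [component("some-lib", LIB)]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "metadata": {
            "component": component("pekko-grpc-codegen_3", main, embedded)
        },
        "dependencies": [{"ref": main, "dependsOn": []}],
    }


if __name__ == "__main__":
    print(json.dumps(assembly_sbom(), indent=2))
```

Whether nesting, a dedicated property, or something else is the right way to express embedding is exactly the open question in #472; the sketch only shows that the two documents can differ structurally while describing the same library.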

> all that is from a provider point of view, describing in more detail the output of his release but how is it used by consumers? ("it" being the SBOM but also the artifact as Piotr said)

One use case for SBOMs is more accurate 'security scanning', where some security scanner tool uses the SBOM information to put together information on what components are on a system, and match that to open security advisories. Suppose the codegen project depends on a 'vulnerable' version of some library. If the 'regular' codegen artifact is part of the users' system, the scanner should not necessarily flag this 'vulnerable' dependency, as it may have been overridden by whatever project depended on the 'regular' codegen artifact. However, when the 'assembly' codegen artifact is part of the users' system, the scanner should use the information (directly or indirectly) from the SBOM of that artifact to learn that the vulnerable code is actually inside the jar, regardless of further context.
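To make the scanner scenario concrete, here is a minimal sketch of the decision logic. Everything in it is hypothetical: `classify_findings` is not a real scanner API, the `org.example:some-lib` coordinates are made up, and it assumes embedded dependencies are nested under the main component while ordinary dependencies are listed at the top level.

```python
def classify_findings(sbom, vulnerable_purls):
    """Map each vulnerable purl found in the SBOM to 'embedded' or 'declared'.

    'embedded': the code is inside the artifact itself, so flag it
                regardless of further context.
    'declared': only referenced as a dependency; the version actually
                resolved on the user's system may have been overridden.
    """
    findings = {}
    root = sbom.get("metadata", {}).get("component", {})
    for c in root.get("components", []):  # assumed encoding of embedding
        if c["purl"] in vulnerable_purls:
            findings[c["purl"]] = "embedded"
    for c in sbom.get("components", []):
        if c["purl"] in vulnerable_purls:
            # Keep 'embedded' if the purl was already found nested.
            findings.setdefault(c["purl"], "declared")
    return findings


if __name__ == "__main__":
    lib = "pkg:maven/org.example/some-lib@1.0.0"  # hypothetical, 'vulnerable'
    regular = {
        "metadata": {"component": {"purl": "pkg:maven/g/a@1"}},
        "components": [{"purl": lib}],
    }
    assembly = {
        "metadata": {
            "component": {
                "purl": "pkg:maven/g/a@1?classifier=assembly",
                "components": [{"purl": lib}],
            }
        },
    }
    print(classify_findings(regular, {lib}))
    print(classify_findings(assembly, {lib}))
```

A scanner could then report 'embedded' findings unconditionally and treat 'declared' findings as candidates to verify against the versions actually resolved on the system.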

CycloneDX / cyclonedx-maven-plugin

What if a Maven module produces multiple variants? #574