apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[PIP 141] Pulsar BOM #14168

Closed cbornet closed 2 years ago

cbornet commented 2 years ago

Motivation

When designing NAR (NiFi Archive) modules that are loaded in the broker, such as protocol handlers, proxy extensions, and Pulsar IO connectors, it's important that the dependencies common to the module and the broker are as close as possible in version. Otherwise, incompatible-library errors (NoClassDefFoundError, NoSuchMethodError, IncompatibleClassChangeError, etc.) can occur at runtime. If a class is present in both the NAR and the broker, the broker's copy will be loaded.

Goal

This proposal defines a BOM (Bill of Materials) for Pulsar. A BOM is a special kind of POM that contains all the dependency versions used by the project and can be imported into another project. Pulsar's parent POM currently has a dependencyManagement section, but it's not always possible to inherit from this parent POM: it brings in much more than the dependency versions, and external projects usually prefer to have their own parent POM. External projects could instead import this BOM and use the same library versions as Pulsar at compile/test time.
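
As a rough sketch of what this goal would mean for an external project, the POM below imports the proposed BOM instead of inheriting from Pulsar's parent. The com.example coordinates, the pulsar.version value, and the assumption that the BOM manages Guava are illustrative only; pulsar-bom is the artifact proposed here, not necessarily one that is published.

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <!-- Hypothetical external project -->
      <groupId>com.example</groupId>
      <artifactId>my-protocol-handler</artifactId>
      <version>1.0.0-SNAPSHOT</version>

      <properties>
        <!-- Assumed value: the Pulsar version the module targets -->
        <pulsar.version>2.10.0-SNAPSHOT</pulsar.version>
      </properties>

      <dependencyManagement>
        <dependencies>
          <!-- Import the proposed BOM: only its managed versions are pulled in -->
          <dependency>
            <groupId>org.apache.pulsar</groupId>
            <artifactId>pulsar-bom</artifactId>
            <version>${pulsar.version}</version>
            <type>pom</type>
            <scope>import</scope>
          </dependency>
        </dependencies>
      </dependencyManagement>

      <dependencies>
        <!-- No version here: it would come from the BOM, assuming the BOM manages Guava -->
        <dependency>
          <groupId>com.google.guava</groupId>
          <artifactId>guava</artifactId>
        </dependency>
      </dependencies>
    </project>

Because the import only affects dependencyManagement, the project keeps its own parent POM, plugins, and build configuration.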

API Changes

No API changes

Implementation

The dependencyManagement section of Pulsar's parent POM and its related properties will be extracted into a separate POM in a pulsar-bom directory. The pulsar-bom artifact will be built and released independently of the rest of the Pulsar project (it is not a Maven module of it). The dependencyManagement section of Pulsar's parent POM will then be replaced by:

  <dependencyManagement>
    <dependencies>

      <dependency>
        <groupId>org.apache.pulsar</groupId>
        <artifactId>pulsar-bom</artifactId>
        <version>2.10.0-SNAPSHOT</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

The CI will have to build pulsar-bom before building Pulsar.
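
For illustration, the extracted pulsar-bom/pom.xml might look roughly like the sketch below. The single Guava entry and its version are placeholders standing in for the full list that would be moved out of the parent POM; nothing here is taken from an actual pulsar-bom file.

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>org.apache.pulsar</groupId>
      <artifactId>pulsar-bom</artifactId>
      <version>2.10.0-SNAPSHOT</version>
      <!-- A BOM is published as a POM-only artifact -->
      <packaging>pom</packaging>

      <properties>
        <!-- Placeholder version property moved from the parent POM -->
        <guava.version>31.0.1-jre</guava.version>
      </properties>

      <dependencyManagement>
        <dependencies>
          <!-- One managed entry shown; the real list would be much longer -->
          <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>${guava.version}</version>
          </dependency>
        </dependencies>
      </dependencyManagement>
    </project>

Since this POM has no parent and is not a module of the main build, it would need to be installed or deployed on its own before the main build can resolve the import.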

Rejected Alternatives

The BOM could be part of a distinct Git project. This would be harder to handle for contributions that modify both the BOM and Pulsar.

cbornet commented 2 years ago

It is possible to use the Pulsar parent POM as a BOM. See for instance how it's done in Pulsar adapters: https://github.com/apache/pulsar-adapters/pull/35 . The parent POM's managed dependencies are imported with:

    <dependency>
      <groupId>org.apache.pulsar</groupId>
      <artifactId>pulsar</artifactId>
      <version>${pulsar.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>

It probably doesn't bring much value to separate it into a distinct POM, so closing for now. Don't hesitate to reopen if needed.

hpvd commented 2 years ago

Strong vote for reopening, since an SBOM (Software Bill of Materials) in the standardized Software Package Data Exchange (SPDX) format (an ISO standard for communicating SBOM information) would be valuable.

Other major projects provide one too, e.g. Kubernetes: see https://sbom.k8s.io/v1.21.3/source

Here is a great blog post about how a BOM can be mapped to vulnerability databases, e.g. directly to the Open Source Vulnerabilities (OSV) database, which has the advantage that it aggregates information across multiple ecosystems (e.g. Python, Golang, Rust) and databases (e.g. GitHub Advisory Database (GHSA), Global Security Database (GSD)):

https://security.googleblog.com/2022/06/sbom-in-action-finding-vulnerabilities.html

There is also a tool from Kubernetes that automates the generation of SBOMs in SPDX format: https://github.com/kubernetes-sigs/bom

philwebb commented 1 year ago

+1 to reopening. Using the parent POM isn't really an option since it declares a number of dependencies that are not part of Apache Pulsar. A true BOM should only publish the modules that are part of the project and not any third-party dependencies.
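
To make the distinction concrete, a BOM in this narrower sense might look like the sketch below: it manages only Pulsar's own modules and pins them to the BOM's version. The choice of pulsar-client and pulsar-client-admin as entries and the version shown are assumptions for illustration, not an agreed design.

    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <groupId>org.apache.pulsar</groupId>
      <artifactId>pulsar-bom</artifactId>
      <version>2.10.0-SNAPSHOT</version>
      <packaging>pom</packaging>

      <dependencyManagement>
        <dependencies>
          <!-- Only project modules are listed; no third-party libraries -->
          <dependency>
            <groupId>org.apache.pulsar</groupId>
            <artifactId>pulsar-client</artifactId>
            <version>${project.version}</version>
          </dependency>
          <dependency>
            <groupId>org.apache.pulsar</groupId>
            <artifactId>pulsar-client-admin</artifactId>
            <version>${project.version}</version>
          </dependency>
        </dependencies>
      </dependencyManagement>
    </project>

A consumer importing such a BOM would get consistent versions across Pulsar artifacts without having its own third-party dependency versions overridden.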

jdimeo commented 10 months ago

I just had a nasty nasty stack overflow error in my IDE that was masking a Maven version conflict error once I added client-original and local-runner to my POM. After much digging, I was able to resolve it using gRPC's BOM. Can we please have a Pulsar BOM that ensures the entire ecosystem is consistent?

We can't use a parent POM, we have our own. You need a BOM that you can import "sideways" to manage the transitive versions. Unless I'm misunderstanding and Pulsar's parent would only control version numbers and not bring in a bunch of other stuff?
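
For readers hitting a similar conflict, the workaround described above presumably amounts to something like the following fragment in the project's own POM. It is a sketch only: io.grpc:grpc-bom is gRPC's published BOM, but the version shown is an example value and should be whichever one matches the Pulsar release in use.

    <dependencyManagement>
      <dependencies>
        <!-- Imported "sideways": the project keeps its own parent POM -->
        <dependency>
          <groupId>io.grpc</groupId>
          <artifactId>grpc-bom</artifactId>
          <!-- Example value only -->
          <version>1.45.1</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>

This aligns the gRPC artifacts with one another, but it does not guarantee they match what Pulsar itself was built against, which is the gap a Pulsar BOM would close.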