apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.42k stars 1.27k forks source link

Proposal to Harmonize Pinot Dependency Management #12676

Open gviedma opened 6 months ago

gviedma commented 6 months ago

Problem Statement

Pinot’s approach to dependency management can be ad-hoc and inconsistent, with dependency versions being resolved in multiple different ways:

  1. Explicit versions defined in dependencyManagement section in the root pom a. Sometimes these are defined using a property, sometimes the version is hard-coded
  2. Submodules may define their own version of a dependency a. Again these may be hard-coded or defined in a property
  3. No explicit version is set, in which case Maven resolves the version by analyzing transitive dependencies

The lack of consistency results in the following problems:

  1. Unintended or unpredictable dependency version changes. Bumping one library can have an unforeseen ripple effect on the version of other transitive dependencies.
  2. Wasted effort, as contributors are each trying to solve dependency management in their own way.
  3. Lack of centralized dependency management means it is not possible to override a particular dependency version without being subject to regressions caused by subsequent OSS changes. As a result, platforms that need to build Pinot with well-known dependency versions for compliance reasons have to frequently fix their builds to stay aligned with OSS dependency management changes.

Proposal

Ideally, Pinot should leverage Maven’s best practices for Dependency Management. At a high level, this consists of standardizing along the following areas:

  1. All explicit dependency versions should be defined in the parent pom via Maven’s standard dependencyManagement mechanism, and use an overridable Maven property
  2. Where possible, define dependencies using the Bill of Materials (BOM) such that all related artifacts are guaranteed to be at the same version.
  3. Do not set any explicit dependency versions at a submodule level. Instead, these should always be set as described in 1).

We should also consider enforcing the above recommendation via a linting mechanism to ensure consistency.

Example of How to Specify a Dependency Version

For example, do not do this in a submodule:

    <dependency>
      <groupId>io.grpc</groupId>
      <artifactId>grpc-context</artifactId>
      <version>1.60.1</version>
    </dependency>

Instead, follow this pattern in the root pom:

<!-- pinot/pom.xml -->
  <properties>
    <grpc.version>1.60.1</grpc.version>
  </properties>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>io.grpc</groupId>
        <artifactId>grpc-bom</artifactId>
        <version>${grpc.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
siddharthteotia commented 6 months ago

@Jackie-Jiang / @xiangfu0 / @snleee

Lately we have seen several dependency related issues that fail the build of our internal applications when the build job attempts to link them with OSS jars.

We have mostly ended up making fixes to exclude and / or pin versions etc in the build pipeline that brings in the OSS jars into our artifactory.

Through this issue and the proposal, we wanted to discuss and see if we can try to reduce such problems in the future

Not sure if others have seen any recent similar issues cc @ankitsultana

Jackie-Jiang commented 6 months ago

Agree on this approach. @gviedma Can you share a link for the Maven's best practice for Dependency Management? It will help everyone onboard to this standard. We are kind of following this approach recently when addressing dependency vulnerabilities, and strong +1 on enforcing some rules for future dependency management

gviedma commented 6 months ago

@Jackie-Jiang You can find the best practices under the Maven dependency management guide. The following sections are of particular relevance:

Perhaps this is something that can be enforced during the PR review process or even automated via a linting mechanism.

ankitsultana commented 6 months ago

This looks good to me. It would be great if we can put these guidelines in the Apache Pinot Wiki as well.

@gviedma : are you also planning to restructure/fix all of the existing dependencies? I am wondering if there's a way to ensure that after the restructuring there won't be any breaking changes. (e.g. if we switch to a bom dependency for grpc, it could potentially change some existing dependency which can cause a regression)

xiangfu0 commented 6 months ago

Here is my +100 to this approach. Curious on the best practice for managing the plugins/connectors to handle multiple versions of same library, e.g. Spark2 and Spark3 support, etc.?

gortiz commented 6 months ago

I'm fine with this approach. Maybe I would suggest to:

  1. Do not use properties if we expect to only use it once. Otherwise we need to add one extra line per dependency family and it is less direct to know the dependency version (once the dependency declaration is found we need to go to the property).
  2. Create a subproject that can be used as the Pinot BOM, where all the dependencies included in Pinot are defined. Simplify the dependencyManagement section to import that bom. This should be effectively the same as we have right now, but:
    1. The parent pom should be significantly sorter
    2. Plugins could import that BOM instead the whole Pinot parent pom (although the could still inherit).
  3. On plugins:
    1. Import the bom or declare pinot pom as the parent.
    2. For each library used by the plugin:
    3. If the library is included in the BOM and the plugin use that version:
      • Do not shade that dependency
      • Consume it with scope dependency
    4. If the library is not included in the bom or (is included in the BOM and the plugin use other version):
      • That dependency must be shaded
      • Objects of that type must not be sent to Pinot or other plugins

3.2.1 should reduce the total package size, the build time and possible errors related to shading. 3.2.2 should reduce errors related to shading.

timveil commented 6 months ago

i'm very interested in helping here. this has been something i've been working on in various forms on the side. i just submitted a PR to move versions to properties in the main pom as a first step in getting better organized there. see #12736

siddharthteotia commented 6 months ago

@timveil - thanks for expressing interest. Just to confirm - are you open to working on action items coming out of this ?

@gviedma - Looks like there is overall consensus. I think it will be great if we can may be create concrete issues or at least a bullet list of things to work on to get to the desired state.

It will help @timveil or others to pick up the work. I can also help distribute / find people.

It would be great if we can put these guidelines in the Apache Pinot Wiki as well.

@ankitsultana - Sure we can do that sometime soon.

I am wondering if there's a way to ensure that after the restructuring there won't be any breaking changes. (e.g. if we switch to a bom dependency for grpc, it could potentially change some existing dependency which can cause a regression)

@ankitsultana - I would suggest doing this incrementally and hopefully the result is not worse than what already is until we get to the final state.

timveil commented 6 months ago

@timveil - thanks for expressing interest. Just to confirm - are you open to working on action items coming out of this ?

yes. i have a strong interest in improving dependency management for pinot

gviedma commented 6 months ago

Here's what I would like to suggest for the next steps:

  1. Create a document to capture the new guidelines for dependency management and link to it from Pinot's Contribution Guidelines.
  2. Consolidate the existing parent pom dependencies in a Pinot BOM as suggested by @gortiz a. Update the parent pom to import the BOM.
  3. Update plugins dependencies as needed depending on whether they use the same version as the BOM or a different one.

I can volunteer to take on item #1 to formalize the current proposal and incorporate @gortiz's suggestions. The remaining work items 2 and 3 can be distributed among various folks. I can also help with 2 and 3 as needed and happy to review as well cc @siddharthteotia

siddharthteotia commented 6 months ago

Aligned on the next steps.

QQ - At the end, don't we need to change anything in the PR review steps / github actions to ensure the guidelines are followed ?

siddharthteotia commented 5 months ago

@gviedma - Let's try to publish (1) soon please. I see multiple dependency upgrade PRs opened and would like them to follow the process.

gviedma commented 5 months ago

I have published a set of guidelines along with examples in the following doc Please review and leave comments when you get a chance. Once we have enough consensus we can reference this doc from the Contributing Guidelines.

FYI @gortiz I attempted to incorporate your BOM subproject idea and import it from the Pinot POM, but that is explicitly disallowed by Maven's dependency management:

Do not attempt to import a POM that is defined in a submodule of the current POM. Attempting to do that will result in the build failing since it won't be able to locate the POM.

I will follow up with you separately on this and to ensure I captured your plugin dependency suggestions correctly in my doc. Thanks!

cc @siddharthteotia

gortiz commented 5 months ago

FYI @gortiz I attempted to incorporate your BOM subproject idea and import it from the Pinot POM, but that is explicitly disallowed by Maven's dependency management:

I see. I didn't know that limitation, but it makes sense. We can find a way to create that POM. That will be specially useful for people that want to create their own plugins outside the Pinot repo, but I guess that pattern is not very common and they can still just declare the Pinot POM as parent, so I would vote +1 to the solution proposed by @gviedma.

BTW I've added some comments on the linked document, but I think it is also fine and it would be great to reference that doc from the contribution guidelines.

gviedma commented 5 months ago

I've created a PR to incorporate the dependency management guidelines into the pinot-docs repository. Once that is merged, I will cut a second PR against the main pinot repo to update https://github.com/apache/pinot/blob/master/CONTRIBUTING.md and point at the new dependency management guidelines cc @siddharthteotia