elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
68.73k stars 24.42k forks source link

Remove requirement to manually manage all third-party dependencies #54749

Open mark-vieira opened 4 years ago

mark-vieira commented 4 years ago

We currently explicitly disable transitive dependency resolution for all third-party dependencies. This means for all third-party dependencies, we have to explicitly state their transitive dependencies, and the transitive dependencies of those, and so on. While this is perhaps not a huge task when initially setting up a project, it becomes a problematic maintenance cost because something as simple as upgrading a dependency requires a developer to manually reconstruct this graph again, and without good surrounding tooling.

There are two primary reasons why we have taken this approach:

  1. To ensure that we do not unknowingly bring in dependencies transitively that might be problematic.
  2. To keep our third-party dependency footprint as small as possible we want to make adding a new dependency require some amount of ceremony to force developers to consider whether this is really needed.

We already have some mechanisms in place to solve these problems. Our dependency checks at build time would already catch a scenario in which dependencies are added or changed as that requires updated license files and artifact checksums be added to the project. There are even additional things we could do such as enable dependency locking and provide a "diff" of the resolved graph when bumping dependencies.

The issue is right now that upgrading dependencies is tedious and undoubtably not producing correct results just due to how unreasonable it is to expect a developer to manually reconstruct these dependency graphs. Additionally, because when we "flatten" our dependency graphs this way, we effectively lose distinction between what our direct dependencies are, and which are transitive. Meaning that it's now no longer clear if we can remove unneeded dependencies that were previously only being pulled in transitively.

Simply put, we have at our disposal a complex and customizable graph resolution engine that we have instead unplugged and said "nah, we got this". The reality is... we don't. We should reevaluate our approach here to see if there are alternatives to achieving the goals listed above w/o going with the nuclear option of simply turning dependency management off.

elasticmachine commented 4 years ago

Pinging @elastic/es-core-infra (:Core/Infra/Build)

mark-vieira commented 4 years ago

cc @rjernst @jasontedor

mark-vieira commented 4 years ago

It's worth also noting here that while this issue is labeled :Core/Infra/Build, whatever solutions we come up here are going to require a team-wide effort to implement. Whenever we decide to "flip the switch" here (and we could do this incrementally), folks are going to have to go back and audit their module dependencies to decide what stuff belongs and what doesn't. This is best done by code owners (under supervision from build engineering) because a) this is something that is easily parallelized so we should do so to the extent we can and b) code owners are going to be in the best position to make decisions on what needs to stay, what needs to go, what is an API dependency, what are implementation dependencies, etc.

rjernst commented 4 years ago

Thanks for writing this up Mark. The summary of the current situation and reasons for doing so is good. I see some nice advantages to switching back (eg removing no longer relevant deps) to using transitive dependencies, but under the requirement that the protections we want are still in place in other ways.

I would like to add that we also do not want automatic conflict resolution. Instead we want to ensure any dependencies that could conflict (ie those that are shared with the core server) must be compiled against the same version, since plugins do not actually load those dependencies, they get them from the parent classloader which is core.

mark-vieira commented 4 years ago

I would like to add that we also do not want automatic conflict resolution.

Indeed. The nice thing is that transitive dependency management, and explicit conflict resolution are not mutually exclusive. We can (and should) have the dependency resolution engine sort out the dependency graph for us and still have it scream when it runs into conflicting modules requiring manual intervention.