apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0
13.47k stars 3.7k forks source link

Tidy up Injector Construction for Tests, Clients and More #12815

Closed paul-rogers closed 2 years ago

paul-rogers commented 2 years ago

Overview

Proposed is a set of refactoring steps to move Druid’s Guice usage from a set of ad-hoc routines into a set of builders, so that the mechanism can be used for a variety of use cases beyond the service-per-server approach which Druid uses today.

Motivation

Druid is designed to run as multiple services, each housing a single service. Druid uses Guice to manage dependencies. The code to populate Guice is “hard coded” to assume this service architecture.

However, it turns out there are multiple other uses of the Druid code base, none of which are well served by the present implementation.

Goal

The goal, then is to retain the functionality of Guice, but change the packaging to all us to create multiple configurations:

Design

The key challenge with the current code is that we hard-code the collection of modules, and we have private implementations of the mechanisms used to gather dependencies. The gist of this proposal is to refactor this code to be more open.

Injector Builder

A new InjectorBuilder provides an easy way to build up the set of dependencies for a given run. This is the most basic builder: it has no Druid assumptions and provides no default modules.

Startup Injector Builder

Druid uses a two-stage process to build dependencies. The first (or “startup”) stage builds an injector with basic dependencies: JSON, properties, and a few others. Then, the second (or “CLI”) stage defines the set of service-dependent modules, which themselves be injected with dependencies using the startup injector.

The startup injector includes the core Druid configuration mechanisms:

The current startup injector then unfortunately adds two items which are used only in a server context:

Since the above two items are not needed by clients or tests, they are encapsulated in a separate forServer() method called only by servers, leaving the "basic" injector ready for use in clients and tests.

Druid Injector Builder

Druid extends Guice in several ways:

The Druid injector builder handles these extra features. When used in tests and clients, there may be no module exclude list present, nor any node roles. When run on a server, then this filtering will be applied.

This class absorbs the ModuleList functionality currently in Initialization so that it can be used outside of the various CLI classes.

Server-specific Builders

Three builders combine to create the primary injector for a server:

These three builders provide overriding: later builders can override modules added in a previous builder. Combined, they replace the logic previously in Initialization.makeInjectorWithModules().

CoreInjectorBuilder can be used in tests. In this case, it provides only logging and the Druid lifecycle manager. Tests can add other modules as needed for that specific tests. (Tests that don't need either of these classes can use the startup injector builder.)

Server Injector Builder

It turns out that, once we tidy up the above, there is still a specific set of steps needed to assemble the injector for a server. This moves to a ServerInjectorBuilder.

Initialization

After the above refactoring (plus some minor items around extensions and Hadoop class path), the Initialization class is stripped down to a single method, and that one is now marked as deprecated. That single method allows tests to build a server-style injector. But, tests don't want to do that: they want to build an injector with just the test dependencies, but without the server config files or networking features. Cleaning that up is left as an exercise for a later PR.

Tests

Tests currently include rather complex code to either attempt to use Guice to create objects, or to work around Guice by hand-wiring components. A key challenge, as noted above, is that the existing injector-construction code assumes a server environment; tests must then somehow work around the fact that tests are not, in fact, servers.

This is particularly true in the "Calcite tests": there exists an elaborate set of ad-hoc code in CalciteTests to hand-wire a set of mock objects. The planner test PR struggled to refactor the code to allow more flexibility. A key motivation of this clean-up is to provide a framework that uses Guice to construct the Calcite test environment.

Clients

The integration tests (ITs) are essentially clients of the Druid server, but they use Guice to assemble various components needed for client functionality. The code to do this is quite complex and ad-hoc. The "new ITs" found it is also fragile, since the existing logic was designed for server, not client use.

A key goal of this proposal is to allow the ITs (and, in particular, the new version) to use a more solid approach to using Druid code in a client.

paul-rogers commented 2 years ago

Closing as this issue was fixed by PR #12816.