Tidy up Injector Construction for Tests, Clients and More

Overview

This proposal describes a set of refactoring steps that move Druid’s Guice usage from a collection of ad-hoc routines into a set of builders, so that the mechanism can serve a variety of use cases beyond the service-per-server approach which Druid uses today.

Motivation

Druid is designed to run as multiple servers, each housing a single service. Druid uses Guice to manage dependencies. The code to populate Guice is “hard coded” to assume this service-per-server architecture.

However, it turns out there are multiple other uses of the Druid code base, none of which are well served by the present implementation.

Tests either cobble together an injector or, more frequently, build up a set of objects using ad-hoc mechanisms. This has resulted in ever-more-complex bits of setup code which mimic what Guice does. This ad-hoc code is hard to understand, fragile, and acts as a barrier to rapidly creating more tests.
Clients, such as the integration tests (ITs), use Druid code, but do not run services. They twist themselves into a pretzel trying to use the current server-focused code, without actually running a service.
Single-process is the idea that, like many other Apache projects, we want to run Druid’s many services within a single Java process, at least for unit testing. (Other projects call this a “mini-server” or some such name.) In this case, we will configure multiple services using the same injector, but the current code is not designed for this use case.

Goal

The goal, then, is to retain the functionality of Guice, but change the packaging to allow us to create multiple configurations:

Druid service-per-server
Unit tests (without running a service)
ITs (clients)
Unit tests (running multiple Druid services)
Others as we find the need

Design

The key challenge with the current code is that we hard-code the collection of modules, and we have private implementations of the mechanisms used to gather dependencies. The gist of this proposal is to refactor this code to be more open.

Injector Builder

A new InjectorBuilder provides an easy way to build up the set of dependencies for a given run. This is the most basic builder: it has no Druid assumptions and provides no default modules.
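As a rough illustration of what "no Druid assumptions and no default modules" means, here is a minimal sketch in plain Java. It deliberately avoids the Guice dependency: `SketchModule`, `SketchInjector` and `SketchInjectorBuilder` are hypothetical stand-ins invented for this example, not the Guice API and not the classes in the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a Guice module (NOT the Guice API): contributes bindings by type.
interface SketchModule {
    void configure(Map<Class<?>, Object> bindings);
}

// Stand-in for an injector: returns the instance bound to a type.
final class SketchInjector {
    private final Map<Class<?>, Object> bindings;

    SketchInjector(Map<Class<?>, Object> bindings) {
        this.bindings = bindings;
    }

    <T> T getInstance(Class<T> type) {
        Object value = bindings.get(type);
        if (value == null) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        return type.cast(value);
    }
}

// Hypothetical shape of the basic builder: it starts empty and contains
// only the modules the caller adds.
final class SketchInjectorBuilder {
    private final List<SketchModule> modules = new ArrayList<>();

    SketchInjectorBuilder add(SketchModule... toAdd) {
        modules.addAll(Arrays.asList(toAdd));
        return this;
    }

    SketchInjector build() {
        Map<Class<?>, Object> bindings = new LinkedHashMap<>();
        // Apply modules in the order they were added.
        for (SketchModule module : modules) {
            module.configure(bindings);
        }
        return new SketchInjector(bindings);
    }
}
```

With a shape like this, a test would add only the modules it needs, for example `new SketchInjectorBuilder().add(b -> b.put(String.class, "test-value")).build()`, and nothing else would be configured behind its back.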

Startup Injector Builder

Druid uses a two-stage process to build dependencies. The first (or “startup”) stage builds an injector with basic dependencies: JSON, properties, and a few others. Then, the second (or “CLI”) stage defines the set of service-dependent modules, which can themselves be injected with dependencies using the startup injector.

The startup injector includes the core Druid configuration mechanisms:

The Jackson ObjectMapper for both JSON and Smile.
The JSON configuration system which builds config objects from properties
Null handling and expression handling.
A bridge to the primary (CLI) injector.

The current startup injector then unfortunately adds two items which are used only in a server context:

Configuration files from the class path
Runtime information about memory, processors, etc.

Since the above two items are not needed by clients or tests, they are encapsulated in a separate forServer() method called only by servers, leaving the "basic" injector ready for use in clients and tests.
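The split between the basic and server forms of the startup injector can be sketched as follows. This is only an outline under stated assumptions: modules are represented by descriptive strings rather than real Guice modules, the class name `StartupBuilderSketch` is hypothetical, and only the `forServer()` method name comes from the text above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: module names are plain labels, not real Druid classes.
final class StartupBuilderSketch {
    private final List<String> modules = new ArrayList<>();

    StartupBuilderSketch() {
        // The core configuration mechanisms, needed by servers, clients and tests.
        modules.add("jackson-object-mappers");
        modules.add("json-config");
        modules.add("null-and-expression-handling");
        modules.add("primary-injector-bridge");
    }

    // Called only by servers: adds the two server-only items.
    StartupBuilderSketch forServer() {
        modules.add("classpath-config-files");
        modules.add("runtime-info");
        return this;
    }

    List<String> build() {
        return Collections.unmodifiableList(new ArrayList<>(modules));
    }
}
```

A client or test would call `new StartupBuilderSketch().build()`, while a server would call `new StartupBuilderSketch().forServer().build()`.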

Druid Injector Builder

Druid extends Guice in several ways:

The DruidModule class which adds Jackson modules to the ObjectMapper
Filtering of modules based on an exclude list from properties and on LoadScope annotations.
Injection of dependencies from the startup injector into the modules used for the primary injector.

The Druid injector builder handles these extra features. When used in tests and clients, there may be no module exclude list present, nor any node roles. When run on a server, this filtering is applied.

This class absorbs the ModuleList functionality currently in Initialization so that it can be used outside of the various CLI classes.
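The module filtering described in this section might look roughly like the sketch below. All class names are hypothetical, and modules are described by a name plus a set of role strings rather than by real classes and LoadScope annotations. The sketch also makes one assumption the text does not spell out: when no node roles are configured (as in tests and clients), role filtering is skipped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical description of a module: its class name plus the roles named
// by its LoadScope annotation (empty when the module has no LoadScope).
final class ModuleInfoSketch {
    final String className;
    final Set<String> loadScopeRoles;

    ModuleInfoSketch(String className, Set<String> loadScopeRoles) {
        this.className = className;
        this.loadScopeRoles = loadScopeRoles;
    }
}

final class FilteringBuilderSketch {
    private final Set<String> excludeList;
    private final Set<String> nodeRoles;
    private final List<String> accepted = new ArrayList<>();

    FilteringBuilderSketch(Set<String> excludeList, Set<String> nodeRoles) {
        this.excludeList = excludeList;
        this.nodeRoles = nodeRoles;
    }

    boolean accept(ModuleInfoSketch module) {
        // Modules named in the exclude list are never loaded.
        if (excludeList.contains(module.className)) {
            return false;
        }
        // Assumption: modules without a LoadScope, and builds with no node
        // roles (tests, clients), skip role filtering entirely.
        if (module.loadScopeRoles.isEmpty() || nodeRoles.isEmpty()) {
            return true;
        }
        // Otherwise the module loads only if it shares a role with this node.
        return !Collections.disjoint(module.loadScopeRoles, nodeRoles);
    }

    FilteringBuilderSketch add(ModuleInfoSketch module) {
        if (accept(module)) {
            accepted.add(module.className);
        }
        return this;
    }

    List<String> acceptedNames() {
        return Collections.unmodifiableList(accepted);
    }
}
```

In this form, a builder created with an empty exclude list and no node roles accepts every module it is given, which matches the test and client case described above.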

Server-specific Builders

Three builders combine to create the primary injector for a server:

CoreInjectorBuilder holds the list of modules previously listed in the Initialization class.
ServiceInjectorBuilder holds the list of service-specific modules.
ExtensionInjectorBuilder holds the list of extension modules obtained from extensions on the class path.

These three builders provide overriding: later builders can override modules added in a previous builder. Combined, they replace the logic previously in Initialization.makeInjectorWithModules().
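The override ordering can be pictured as layers applied in sequence, where a later layer replaces any binding it shares with an earlier one. The sketch below models bindings as a simple key-to-description map; the class, the keys and the values are all hypothetical. Guice itself offers `Modules.override(...).with(...)` for this kind of layering, though the text does not say whether the builders use it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: each layer maps a binding key to a description of
// the implementation bound to it.
final class LayeredModulesSketch {

    // Combines layers in order; a binding in a later layer replaces the
    // binding with the same key from an earlier layer.
    @SafeVarargs
    static Map<String, String> combine(Map<String, String>... layers) {
        Map<String, String> combined = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            combined.putAll(layer);
        }
        return combined;
    }

    // Example with illustrative (made-up) bindings for the three layers.
    static Map<String, String> example() {
        Map<String, String> core = new LinkedHashMap<>();
        core.put("Cache", "core default cache");
        core.put("Lifecycle", "core lifecycle");

        Map<String, String> service = new LinkedHashMap<>();
        service.put("QueryRunner", "service query runner");

        Map<String, String> extension = new LinkedHashMap<>();
        extension.put("Cache", "extension cache");

        // Order matters: core first, then service, then extensions.
        return combine(core, service, extension);
    }
}
```

In `example()`, the extension layer's `Cache` entry wins over the core one, while `Lifecycle` and `QueryRunner` pass through unchanged.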

CoreInjectorBuilder can be used in tests. In this case, it provides only logging and the Druid lifecycle manager. Tests can add other modules as needed for that specific test. (Tests that don't need either of these classes can use the startup injector builder.)

Server Injector Builder

It turns out that, once we tidy up the above, there is still a specific set of steps needed to assemble the injector for a server. This moves to a ServerInjectorBuilder.

`Initialization`

After the above refactoring (plus some minor items around extensions and Hadoop class path), the Initialization class is stripped down to a single method, and that one is now marked as deprecated. That single method allows tests to build a server-style injector. But, tests don't want to do that: they want to build an injector with just the test dependencies, but without the server config files or networking features. Cleaning that up is left as an exercise for a later PR.

Tests

Tests currently include rather complex code to either attempt to use Guice to create objects, or to work around Guice by hand-wiring components. A key challenge, as noted above, is that the existing injector-construction code assumes a server environment; tests must then somehow work around the fact that tests are not, in fact, servers.

This is particularly true in the "Calcite tests": there exists an elaborate set of ad-hoc code in CalciteTests to hand-wire a set of mock objects. The planner test PR struggled to refactor the code to allow more flexibility. A key motivation of this clean-up is to provide a framework that uses Guice to construct the Calcite test environment.

Clients

The integration tests (ITs) are essentially clients of the Druid server, but they use Guice to assemble various components needed for client functionality. The code to do this is quite complex and ad-hoc. Work on the "new ITs" found that it is also fragile, since the existing logic was designed for server use, not client use.

A key goal of this proposal is to allow the ITs (and, in particular, the new version) to use a more solid approach to using Druid code in a client.
