apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Segment Loading on historical node startup #7374

Open michael-trelinski opened 5 years ago

michael-trelinski commented 5 years ago

Motivation

The first query for a given datasource after historical node startup tends to take a very long time, while subsequent queries are faster. This proposal adds predictability to the system by making the first query after historical node startup as fast as the rest, where possible.

Proposed changes

Modify the CliHistorical startup OR add an extension

The module is essentially the same either way. If it is added as an extension, it only gets loaded when listed in the conf/historical/runtime.properties file; otherwise, CliHistorical binds it as a module in LazySingleton scope. In both cases it will be disabled by default, so as not to upset any currently held assumptions about Druid's behavior.

The module/extension will have an injected SegmentManager (which has a SegmentLoader).

The way I'm envisioning this is that there could be an interface that looks something like this:

interface SegmentCacheWarmupStrategy {
   public Collection<SegmentId> filterSegments(DataSource source, Collection<SegmentId> inputs);
}

Simply put: this method is called by the Segment Warming module/extension when it is loaded, in a single background thread. The SegmentIds are grouped by DataSource, with one filterSegments call per DataSource. All classes that implement this interface receive Properties from the config file.
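A minimal sketch of how the warming module might drive a strategy, assuming simplified stand-ins for Druid's types: the hypothetical `SegmentRef` record replaces `SegmentId` (plus the size and interval metadata a strategy would need), datasources are plain names instead of `DataSource`, and the hypothetical `loader` callback stands in for the injected SegmentManager/SegmentLoader. `WarmupStrategy` and `SegmentWarmer` are illustrative names, not existing Druid classes. The interface returns a `List` rather than a `Collection` so that load order is preserved.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stand-in for Druid's SegmentId plus the metadata a strategy needs.
record SegmentRef(String dataSource, long intervalEndMillis, long sizeBytes) {}

// Hypothetical simplified form of the proposed SegmentCacheWarmupStrategy.
interface WarmupStrategy {
  List<SegmentRef> filterSegments(String dataSource, Collection<SegmentRef> inputs);
}

final class SegmentWarmer {
  // Groups segments by datasource (first-seen order), makes one filterSegments
  // call per group, and passes each kept segment to `loader`, which stands in
  // for the injected SegmentManager/SegmentLoader.
  static void warm(List<SegmentRef> all, WarmupStrategy strategy, Consumer<SegmentRef> loader) {
    Map<String, List<SegmentRef>> byDataSource = new LinkedHashMap<>();
    for (SegmentRef s : all) {
      byDataSource.computeIfAbsent(s.dataSource(), k -> new ArrayList<>()).add(s);
    }
    for (Map.Entry<String, List<SegmentRef>> e : byDataSource.entrySet()) {
      for (SegmentRef kept : strategy.filterSegments(e.getKey(), e.getValue())) {
        loader.accept(kept);
      }
    }
  }

  // Runs warm() on a single background thread so historical startup is not blocked.
  static Future<?> warmInBackground(List<SegmentRef> all, WarmupStrategy strategy, Consumer<SegmentRef> loader) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      return executor.submit(() -> warm(all, strategy, loader));
    } finally {
      executor.shutdown(); // the already-submitted task still runs to completion
    }
  }
}
```

Using a single-thread executor keeps warming serialized, which matches the "single background thread" idea and avoids competing with the node for disk bandwidth more than necessary.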

Various implementations could be added, for instance:

I propose starting with the first of these: blindly loading segments until x% of memory is utilized (or free). I also propose implementing the filter that starts from now and goes back in time.
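The two starting strategies could look roughly like this. This is a sketch under stated assumptions: `NaiveFillStrategy`, `FromNowBackStrategy`, `capacityBytes`, `percent`, and `cutoffMillis` are hypothetical names, and the x% budget here is applied per `filterSegments` call (that is, per DataSource). A real implementation would more likely track one budget shared across DataSources and derive capacity from the node's memory or segment cache size.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for Druid's SegmentId plus the metadata a strategy needs.
record SegmentRef(String dataSource, long intervalEndMillis, long sizeBytes) {}

// Hypothetical simplified form of the proposed SegmentCacheWarmupStrategy.
interface WarmupStrategy {
  List<SegmentRef> filterSegments(String dataSource, Collection<SegmentRef> inputs);
}

// "Naive fill up to x%": keeps segments in input order until the next one
// would push the running total past percent% of capacityBytes.
final class NaiveFillStrategy implements WarmupStrategy {
  private final long budgetBytes;

  NaiveFillStrategy(long capacityBytes, double percent) {
    this.budgetBytes = (long) (capacityBytes * percent / 100.0);
  }

  @Override
  public List<SegmentRef> filterSegments(String dataSource, Collection<SegmentRef> inputs) {
    List<SegmentRef> kept = new ArrayList<>();
    long used = 0;
    for (SegmentRef s : inputs) {
      if (used + s.sizeBytes() > budgetBytes) {
        break; // budget reached: stop loading for this datasource
      }
      used += s.sizeBytes();
      kept.add(s);
    }
    return kept;
  }
}

// "From now until ...": keeps segments whose interval ends at or after
// cutoffMillis, ordered newest first.
final class FromNowBackStrategy implements WarmupStrategy {
  private final long cutoffMillis;

  FromNowBackStrategy(long cutoffMillis) {
    this.cutoffMillis = cutoffMillis;
  }

  @Override
  public List<SegmentRef> filterSegments(String dataSource, Collection<SegmentRef> inputs) {
    List<SegmentRef> kept = new ArrayList<>();
    for (SegmentRef s : inputs) {
      if (s.intervalEndMillis() >= cutoffMillis) {
        kept.add(s);
      }
    }
    kept.sort(Comparator.comparingLong(SegmentRef::intervalEndMillis).reversed());
    return kept;
  }
}
```

Feeding the output of `FromNowBackStrategy` into `NaiveFillStrategy` would load the newest segments first until the budget runs out, which is one natural use of a chained strategy.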

All warming-strategy configurations shall support a default strategy for when none is given in the configuration file. That default may be no strategy at all, in which case no Segment Warming procedure takes place.

Pursuant to these suggestions, I favor the following configuration properties:

Strategy: Naive fill x% or Naive fill up to x%:

Strategy: From Now until ...:

Strategy: Chained:

It should be possible for the Segment Warmer to use different strategies across DataSources, but only one strategy (possibly a chained one) per DataSource. An optional default strategy could be provided for unanticipated DataSources, or left blank. If it is left blank, no cache warming takes place for unanticipated DataSources unless explicitly enabled.
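The per-DataSource rules above can be sketched as a small resolver. `ChainedStrategy` and `StrategyResolver` are hypothetical names, and the `SegmentRef`/`WarmupStrategy` types are the same simplified stand-ins assumed in the other sketches. A chained strategy feeds each stage's output into the next; a `null` default models "left blank", meaning unanticipated DataSources are not warmed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Druid's SegmentId plus the metadata a strategy needs.
record SegmentRef(String dataSource, long intervalEndMillis, long sizeBytes) {}

// Hypothetical simplified form of the proposed SegmentCacheWarmupStrategy.
interface WarmupStrategy {
  List<SegmentRef> filterSegments(String dataSource, Collection<SegmentRef> inputs);
}

// Applies each strategy in turn, passing the output of one to the next.
// An empty chain keeps every input segment.
final class ChainedStrategy implements WarmupStrategy {
  private final List<WarmupStrategy> chain;

  ChainedStrategy(List<WarmupStrategy> chain) {
    this.chain = chain;
  }

  @Override
  public List<SegmentRef> filterSegments(String dataSource, Collection<SegmentRef> inputs) {
    List<SegmentRef> current = new ArrayList<>(inputs);
    for (WarmupStrategy stage : chain) {
      current = stage.filterSegments(dataSource, current);
    }
    return current;
  }
}

// Exactly one strategy (possibly chained) per datasource, plus an optional
// default for unanticipated datasources; a null default means "do not warm".
final class StrategyResolver {
  private final Map<String, WarmupStrategy> perDataSource;
  private final WarmupStrategy defaultStrategy; // may be null

  StrategyResolver(Map<String, WarmupStrategy> perDataSource, WarmupStrategy defaultStrategy) {
    this.perDataSource = perDataSource;
    this.defaultStrategy = defaultStrategy;
  }

  List<SegmentRef> segmentsToWarm(String dataSource, Collection<SegmentRef> inputs) {
    WarmupStrategy strategy = perDataSource.getOrDefault(dataSource, defaultStrategy);
    return strategy == null ? List.of() : strategy.filterSegments(dataSource, inputs);
  }
}
```

Keeping the map keyed by DataSource enforces the one-strategy-per-DataSource rule structurally, since a second entry for the same name would simply replace the first.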

There will be no emitted metrics. Configuration parameters would live in the runtime.properties file for historical nodes only. There are no query spec, ingestion spec, or SQL language concerns.
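Reading such settings from a historical's runtime.properties could be as simple as the sketch below. Every property key here (`druid.segmentWarmup.enabled`, `druid.segmentWarmup.defaultStrategy`, `druid.segmentWarmup.strategy.<dataSource>`) is hypothetical and does not exist in Druid; the keys only illustrate the proposal's shape: disabled by default, an optional default strategy where blank means no warming, and one strategy name per DataSource.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Reads HYPOTHETICAL warm-up settings from a historical's runtime.properties.
// None of these keys exist in Druid; they only illustrate the proposal.
final class WarmupConfig {
  static final String PREFIX = "druid.segmentWarmup.";        // hypothetical
  static final String ENABLED = PREFIX + "enabled";           // hypothetical, defaults to false
  static final String DEFAULT = PREFIX + "defaultStrategy";   // hypothetical, blank = no warming
  static final String PER_DS = PREFIX + "strategy.";          // hypothetical, suffix = datasource

  final boolean enabled;
  final String defaultStrategy; // null when unset or blank
  final Map<String, String> strategyByDataSource = new HashMap<>();

  WarmupConfig(Properties props) {
    enabled = Boolean.parseBoolean(props.getProperty(ENABLED, "false"));
    String d = props.getProperty(DEFAULT, "").trim();
    defaultStrategy = d.isEmpty() ? null : d;
    for (String key : props.stringPropertyNames()) {
      if (key.startsWith(PER_DS)) {
        strategyByDataSource.put(key.substring(PER_DS.length()), props.getProperty(key).trim());
      }
    }
  }

  // Parses the contents of a runtime.properties file.
  static WarmupConfig parse(String runtimeProperties) throws IOException {
    Properties p = new Properties();
    p.load(new StringReader(runtimeProperties));
    return new WarmupConfig(p);
  }
}
```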

Rationale

A warm-up query could be run from some sort of crontab-like environment or on historical node startup, but that gets unwieldy: what we really care about is getting segments loaded, not hammering the cluster with queries that pass data around and perform computation between nodes. Synchronizing this from outside Druid might be difficult, and other users may eventually end up writing their own versions. Let's take this on now, so that implementations can be contributed back to the community and live in the Druid ecosystem rather than in individual companies' business infrastructure.

Operational impact

Test plan (optional)

Future work

Thank you for your time

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gianm commented 5 years ago

#7936 adjusted config to not mark proposals as stale, so removing the label.

teyeheimans commented 3 years ago

We are experiencing the same problem. Restarting a historical is problematic, as it is extremely slow after the restart. We would love to see this proposal implemented. We are running version 0.18.0-iap9.