apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

confusing configuration in RealtimeIndexTask #3186

Closed yaochitc closed 8 years ago

yaochitc commented 8 years ago

We ran into a problem while using RealtimeIndexTask. Since we couldn't find any documentation for configuring this task, we checked the code and found that it uses FireDepartment for its configuration. The same class appears to be used by the Realtime node, so we simply copied the configuration from our realtime node. When we submitted the task, an exception was thrown:

SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
com.google.inject.ProvisionException: Guice provision errors:

1) Error in custom provider, com.metamx.common.ISE: Cannot add a handler after the Lifecycle has started, it doesn't work that way.
  at io.druid.guice.DruidProcessingModule.getProcessingExecutorService(DruidProcessingModule.java:93)
  at io.druid.guice.DruidProcessingModule.getProcessingExecutorService(DruidProcessingModule.java:93)
  while locating java.util.concurrent.ExecutorService annotated with @io.druid.guice.annotations.Processing()
    for parameter 0 at io.druid.query.IntervalChunkingQueryRunnerDecorator.<init>(IntervalChunkingQueryRunnerDecorator.java:37)
  while locating io.druid.query.IntervalChunkingQueryRunnerDecorator
    for parameter 0 at io.druid.query.timeseries.TimeseriesQueryQueryToolChest.<init>(TimeseriesQueryQueryToolChest.java:73)
  at io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:74)
  while locating io.druid.query.timeseries.TimeseriesQueryQueryToolChest
    for parameter 0 at io.druid.query.timeseries.TimeseriesQueryRunnerFactory.<init>(TimeseriesQueryRunnerFactory.java:53)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:82)
  while locating io.druid.query.timeseries.TimeseriesQueryRunnerFactory
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=20, type=MAPBINDER)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:38)
  while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryRunnerFactory>
    for parameter 0 at io.druid.query.DefaultQueryRunnerFactoryConglomerate.<init>(DefaultQueryRunnerFactoryConglomerate.java:36)
  while locating io.druid.query.DefaultQueryRunnerFactoryConglomerate
  at io.druid.guice.StorageNodeModule.configure(StorageNodeModule.java:55)
  while locating io.druid.query.QueryRunnerFactoryConglomerate

We eventually found the cause. RealtimeIndexTask does not read the plumber school from the FireDepartment (it creates a new plumber school in its run method instead), so the "plumber" option under RealtimeConfig should not be set. If it is set, the plumber school is instantiated as soon as the task JSON is submitted. That can never succeed, because the plumber school depends on the QueryRunnerFactoryConglomerate, which cannot be created on nodes such as the overlord, as in our case.
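To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not Druid code: `EagerPlumberSketch`, `Conglomerate`, `PlumberSchool`, `submit`, and `unavailable` are all hypothetical stand-ins, and the spec is reduced to a plain map. The only point it tries to show is that an object built while the spec is being parsed fails on a node that cannot supply its dependency, while a spec that omits the option is accepted.

```java
import java.util.Map;
import java.util.function.Supplier;

public class EagerPlumberSketch {
    /** Hypothetical stand-in for a dependency that only exists on some node types. */
    static final class Conglomerate {}

    /** Hypothetical stand-in for a plumber school that needs the dependency at construction time. */
    static final class PlumberSchool {
        PlumberSchool(Supplier<Conglomerate> provider) {
            // Resolving the dependency inside the constructor mimics eager injection.
            provider.get();
        }
    }

    /** A provider that always fails, modelling a node where the dependency cannot be created. */
    static Supplier<Conglomerate> unavailable() {
        return () -> {
            throw new IllegalStateException("conglomerate unavailable on this node");
        };
    }

    /** Builds spec objects at submission time, the way deserialization would (an assumption of this sketch). */
    static String submit(Map<String, String> spec, Supplier<Conglomerate> provider) {
        if (spec.containsKey("plumber")) {
            try {
                new PlumberSchool(provider);
            } catch (IllegalStateException e) {
                return "rejected: " + e.getMessage();
            }
        }
        return "accepted";
    }

    public static void main(String[] args) {
        // Spec without the "plumber" option: nothing is constructed eagerly.
        System.out.println(submit(Map.of("firehose", "..."), unavailable()));
        // Spec copied from a realtime node, including "plumber": construction fails at submit time.
        System.out.println(submit(Map.of("firehose", "...", "plumber", "realtime"), unavailable()));
    }
}
```

In this sketch the first call prints `accepted` and the second prints `rejected: conglomerate unavailable on this node`, which mirrors why removing the "plumber" option let our task be submitted.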

I think the use of FireDepartment in RealtimeIndexTask is quite confusing. It might be better to use a separate class to hold the configuration for tasks like this, and more detailed documentation for the indexing service would make it easier to use (we are glad to help with this :) ).

gianm commented 8 years ago

Hey @yaochitc, it is generally intended that most users would not use realtime index tasks directly, but would instead go through a higher-level API like Tranquility or the Kafka indexing service.

In particular there are some critical differences between realtime tasks and realtime nodes that mean that firehoses and plumbers designed to work with realtime nodes might not work properly with tasks.

I will close this since even though the API is not the most amazing API in the world, it actually isn't intended to be used directly, so maybe that's okay.

yaochitc commented 8 years ago

Got it!