apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

[QTL][Lookup] LookupExtractor configuration management. #2328

Closed b-slim closed 6 years ago

b-slim commented 8 years ago

In order to bring query time lookups [QTL] to a production-ready state, Druid needs a central configuration and management layer. This management layer is supposed to provide:

1. Static (via property file) and dynamic (via the Coordinator at runtime) registration/unregistration of LookupExtractor implementations.
2. Periodic checkpointing of LookupExtractor instances, so that they can be restored after a failure or a manual restart of the Druid process.

This management layer can be split into the following pieces:

1. LookupRefManager, which exposes listing/adding/deleting of LookupExtractor references (PR 2291).
2. LookupHttpEndPointResource, which exposes listing/adding/deleting via an HTTP endpoint (this resource depends on piece 1, LookupRefManager).
3. LookupConfigurationLoader, which is responsible for loading configuration from the runtime property file, periodically checkpointing the current LookupExtractor objects, and reloading both files after a restart.
4. LookupCoordinator, which runs on the coordinators and performs distributed configuration of LookupExtractor.

All of these pieces will be part of Druid core in order to guarantee a homogeneous and clear way to manage LookupExtractor instances; everyone is then free to implement the LookupExtractor interface itself.
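To make piece 1 concrete, here is a minimal sketch of what a reference manager with list/add/delete could look like. It is not the code from PR 2291: `SimpleLookupExtractor`, `LookupRefManagerSketch`, and all method names are hypothetical, and the real LookupExtractor interface is assumed to be richer than the single method shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for Druid's LookupExtractor; only one lookup method is assumed.
interface SimpleLookupExtractor {
    String apply(String key);
}

// Hypothetical sketch of piece 1: a thread-safe registry of named lookup references.
final class LookupRefManagerSketch {
    private final ConcurrentHashMap<String, SimpleLookupExtractor> refs =
        new ConcurrentHashMap<>();

    // Registers a lookup under a name; returns false if the name is already taken.
    boolean add(String name, SimpleLookupExtractor extractor) {
        return refs.putIfAbsent(name, extractor) == null;
    }

    // Unregisters a lookup; returns false if nothing was registered under that name.
    boolean delete(String name) {
        return refs.remove(name) != null;
    }

    // Returns the registered lookup, or null if the name is unknown.
    SimpleLookupExtractor get(String name) {
        return refs.get(name);
    }

    // Returns an unmodifiable snapshot of the current registrations.
    Map<String, SimpleLookupExtractor> list() {
        return Collections.unmodifiableMap(new HashMap<>(refs));
    }
}
```

An HTTP resource (piece 2) and a coordinator-driven updater (piece 4) could both delegate to one such manager, so that static and dynamic registration go through the same code path.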

drcrallen commented 8 years ago

That's why I proposed https://github.com/druid-io/druid/pull/2286 for the communication / HTTP endpoints and https://github.com/druid-io/druid/pull/1576 for the management and coordinator side. This approach was originally designed for dimension extraction lookups, but it might be possible to adapt or modify it for this case.

b-slim commented 8 years ago

@drcrallen unless I am missing something, I don't see how #2286 or #1576 covers the first item, which is the basic building block. After that, of course, comes the question of how to do the distributed configuration; I thought you had a couple of meetings with @guobingkun to hopefully agree on one way to do it.