Kafka Indexing Service in Druid

To understand the source code of the Apache Druid Kafka Indexing Service, it is helpful to have a general understanding of how Druid works, as well as some experience with Kafka and Java programming.

Start by looking at the main classes that make up the Kafka Indexing Service, such as KafkaIndexTask.java and KafkaIndexTaskClient.java. The first executes the indexing work; the second is how the rest of the service talks to running tasks.
Review the KafkaSupervisor class. It manages the lifecycle of the Kafka index tasks: it creates new tasks, submits them to the Overlord (which assigns them to workers), and monitors them while they run.
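Look at how the KafkaConsumer class is used. KafkaConsumer belongs to the Kafka client library (org.apache.kafka.clients.consumer), not to Druid, and it is responsible for fetching records from Kafka. In recent Druid versions it is wrapped by KafkaRecordSupplier.java, and the fetched records are handed to the index task, which parses them into rows.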
Understand the KafkaIndexTaskIOConfig.java class. It holds the I/O configuration of a single index task, such as the topic, the offsets the task should read, and the consumer properties (which include the Kafka bootstrap servers).
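Look at the KafkaTuningConfig.java class (named KafkaIndexTaskTuningConfig.java in newer versions). It holds the tuning settings for the index tasks, such as the maximum number of rows held in memory and the maximum number of rows per segment. The number of replicas, the task count, and the task duration are not tuning settings; they belong to the supervisor's I/O configuration in KafkaSupervisorIOConfig.java.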
Look at the KafkaIndexTask.java class. It is responsible for reading data from Kafka and writing it to Druid segments.
Look at the KafkaIndexTaskClient.java class. It is responsible for communicating with running index tasks over HTTP, for example to pause, resume, or stop them and to query their current offsets.
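Look at the KafkaDataSourceMetadata.java class. It holds the metadata for the Kafka data source, namely the per-partition Kafka offsets that are committed to the metadata store together with the published segments.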
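
To make the supervisor's role more concrete, here is a minimal sketch of how Kafka partitions might be spread across task groups using the task count and replicas settings mentioned in the list above. It is not Druid source code: the class and method names (TaskGroupAssignment, assign, readingTasks) are hypothetical, and the modulo rule is an assumption based on how the supervisor is generally described, so check KafkaSupervisor.java for the actual logic in your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Druid.
final class TaskGroupAssignment {

    // Assumption: a partition goes to task group (partition % taskCount).
    // Returns a map from task group id to the partitions that group reads.
    static Map<Integer, List<Integer>> assign(int partitionCount, int taskCount) {
        if (partitionCount < 0 || taskCount <= 0) {
            throw new IllegalArgumentException(
                "partitionCount must be >= 0 and taskCount must be > 0");
        }
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int partition = 0; partition < partitionCount; partition++) {
            int groupId = partition % taskCount;
            groups.computeIfAbsent(groupId, id -> new ArrayList<>()).add(partition);
        }
        return groups;
    }

    // Every non-empty task group runs `replicas` identical tasks,
    // so the number of reading tasks is groups * replicas.
    static int readingTasks(Map<Integer, List<Integer>> groups, int replicas) {
        return groups.size() * replicas;
    }

    public static void main(String[] args) {
        // Six partitions, taskCount = 4, replicas = 2.
        Map<Integer, List<Integer>> groups = assign(6, 4);
        System.out.println(groups);
        System.out.println(readingTasks(groups, 2));
    }
}
```

Because only non-empty groups are counted, setting the task count higher than the number of partitions does not produce extra reading tasks in this sketch.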

Finally, you should also familiarize yourself with the Kafka Java API, which is used extensively in the Kafka Indexing Service.

It's also worth noting that Druid is a complex piece of software, so it may take some time to fully understand the Kafka Indexing Service. If in doubt, reading through the documentation and asking the Druid community may help.
