apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

KubernetesTaskRunner: Wait in start() for tasks to be located. #17419

Closed by gianm 3 weeks ago

gianm commented 4 weeks ago

This helps with orderly Overlord failover. After this patch, by the time the task runner returns from start(), all tasks are located (subject to some timeout). This is useful for supervisors, which start next and which need to contact tasks.

The timeout defaults to 1 minute, and is configurable using `druid.indexer.runner.taskJoinTimeout`.
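To illustrate the behavior described above, here is a minimal, self-contained sketch of "wait in start() until every task is located, but give up after a timeout". It is not the code from this patch: the class `TaskJoinSketch`, the method `awaitAllLocated`, and the use of `CompletableFuture<String>` to represent a task's location are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TaskJoinSketch {
    // Hypothetical helper (not part of Druid): blocks until every "task located"
    // future completes, or until one shared timeout elapses.
    // Returns true only if all tasks were located within the timeout.
    static boolean awaitAllLocated(List<CompletableFuture<String>> locations, Duration timeout)
            throws InterruptedException {
        // allOf completes once every input future has completed.
        CompletableFuture<Void> all =
                CompletableFuture.allOf(locations.toArray(new CompletableFuture<?>[0]));
        try {
            all.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Timeout reached: stop waiting so that startup is not blocked forever.
            return false;
        } catch (ExecutionException e) {
            // At least one location lookup failed.
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CompletableFuture<String> located = CompletableFuture.completedFuture("host-a:8100");
        CompletableFuture<String> pending = new CompletableFuture<>(); // never completes here

        // All tasks already located: returns immediately.
        System.out.println(awaitAllLocated(List.of(located), Duration.ofMinutes(1))); // prints true
        // One task is never located: returns once the short timeout elapses.
        System.out.println(awaitAllLocated(List.of(located, pending), Duration.ofMillis(50))); // prints false
    }
}
```

In this sketch a single deadline covers all tasks together, so a slow task cannot extend the total wait beyond the configured timeout. Whether the patch applies the timeout per task or across all tasks is not stated in the description above.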