apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Fix Huge Number of Watches in ZooKeeper #17482

Status: Open. GWphua opened this pull request 1 week ago.


Fixes #6647

Description

This PR builds on #6683 and #9172 and aims to reduce the number of ZooKeeper watches.

Fixed Huge Number of Watches in ZooKeeper

Previously, Announcer.java caused every child node under the parent path to be watched by the ZooKeeper ensemble, producing an unnecessarily large number of ZooKeeper watches.
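To make the scaling concrete, here is a rough back-of-the-envelope model. It assumes (our assumption, not taken from the Druid source) that P processes each announce one node under a shared parent path, and that a children cache registers one child-list watch on the parent plus one watch per child, roughly as Curator's PathChildrenCache does. For contrast it also counts the case where each process watches only its own node. The class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope model (hypothetical, not Druid or Curator code) of
 * watch counts when P processes each announce one node under a shared parent.
 */
public class WatchCountModel {

    /**
     * Children-cache pattern (assumed): each process sets one child-list watch
     * on the shared parent plus one watch on every child, its own included.
     * Total = P * (P + 1).
     */
    static long childrenCacheWatches(int processes) {
        long watches = 0;
        for (int watcher = 0; watcher < processes; watcher++) {
            watches += 1;         // child-list watch on the parent path
            watches += processes; // one watch per child node under the parent
        }
        return watches;
    }

    /** Single-node pattern: each process watches only its own node. Total = P. */
    static long singleNodeWatches(int processes) {
        return processes;
    }

    public static void main(String[] args) {
        for (int p : new int[] {10, 100, 1000}) {
            System.out.printf("P=%d  children-cache=%d  single-node=%d%n",
                    p, childrenCacheWatches(p), singleNodeWatches(p));
        }
    }
}
```

Under these assumptions, P = 100 gives 10,100 watches for the children-cache pattern versus 100 for the single-node pattern: the first grows quadratically with the number of announcing processes, the second linearly.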

The new NodeAnnouncer.java class, heavily modelled on Announcer.java, aims to address this by announcing a single node within the ZooKeeper ensemble. By eliminating the watches on child nodes, this approach significantly reduces the total number of ZooKeeper watches. Tests conducted on a production server also indicate a decrease in watch counts resulting from this change.
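The single-node idea can be sketched with an in-memory stand-in for ZooKeeper. This is a hypothetical illustration, not the actual NodeAnnouncer code: FakeZk, SingleNodeAnnouncer and the re-announce-on-delete behaviour are assumptions chosen to mirror what an announcer is for (keeping its own node present). The announcer watches only its own path, so sibling nodes under the same parent add no watches. As with real ZooKeeper watches, the fake's watches are one-shot and must be re-armed after they fire.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not Druid's NodeAnnouncer or Curator's NodeCache. */
public class SingleNodeAnnouncerSketch {

    /** Minimal stand-in for a ZooKeeper ensemble with one-shot watches. */
    static final class FakeZk {
        final Map<String, byte[]> nodes = new HashMap<>();
        final Map<String, List<Runnable>> watches = new HashMap<>();

        void create(String path, byte[] data) { nodes.put(path, data); }

        boolean exists(String path) { return nodes.containsKey(path); }

        /** Registers a watch on exactly one path. */
        void watch(String path, Runnable onChange) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(onChange);
        }

        /** Deletes a node, then fires and clears the watches set on it. */
        void delete(String path) {
            nodes.remove(path);
            List<Runnable> fired = watches.remove(path);
            if (fired != null) {
                fired.forEach(Runnable::run);
            }
        }

        int watchCount() {
            return watches.values().stream().mapToInt(List::size).sum();
        }
    }

    /** Announces one node and re-creates it if it disappears. */
    static final class SingleNodeAnnouncer {
        private final FakeZk zk;
        private final String path;
        private final byte[] data;

        SingleNodeAnnouncer(FakeZk zk, String path, byte[] data) {
            this.zk = zk;
            this.path = path;
            this.data = data;
        }

        void announce() {
            zk.create(path, data);
            zk.watch(path, this::onNodeChanged); // watch our node only, not the parent
        }

        private void onNodeChanged() {
            if (!zk.exists(path)) {
                announce(); // re-create the node and re-arm the one-shot watch
            }
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        // Nodes announced by other processes under the same parent: not watched here.
        for (int i = 0; i < 100; i++) {
            zk.create("/announcements/other" + i, new byte[0]);
        }
        SingleNodeAnnouncer announcer =
                new SingleNodeAnnouncer(zk, "/announcements/me", new byte[] {1});
        announcer.announce();
        System.out.println("watches: " + zk.watchCount());
        zk.delete("/announcements/me"); // simulate the announced node vanishing
        System.out.println("re-announced: " + zk.exists("/announcements/me"));
        System.out.println("watches: " + zk.watchCount());
    }
}
```

Running main prints a watch count of 1 both before and after the simulated deletion, however many sibling nodes exist under /announcements.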

[Figure: ZK Watch Count]

The original Announcer.java still performs better for segment announcements, so it is retained in the codebase.

Documentation

Refactoring

Release note

Improved: ZooKeeper no longer spins up an unnecessarily large number of watches when running realtime tasks.


Key changed/added classes in this PR

Further Action Items

I plan to create issues for the following follow-up actions after this PR:

Deprecated Code

The PathChildrenCache and NodeCache classes have been deprecated since Curator 5.0.0.

We can look into replacing these deprecated classes with CuratorCache. CuratorCache requires ZooKeeper 3.6+, and since we currently use ZooKeeper 3.8.4, this requirement is already met.

Concurrency Flow

Add documentation describing the concurrency control flow of NodeAnnouncer.

This PR has: