Open lhotari opened 3 years ago
There's also #14106 about RackAwareTest.testPlacement
I encountered the same problem while expanding a cluster in our production environment.
We first changed the rack information in ZooKeeper under /bookies, and then added a broker and a bookie to our cluster. The same timeout exception then occurred. My guess is that the data was read successfully from /bookies, but the call then blocked because of a deadlock in the executors.
Describe the bug
The flaky test RackAwareTest.testPlacement has been moved to the quarantine test group so that it doesn't block CI; this change was made in #11370.
The root cause seems to be an issue in production code. The stack trace in the logs of a failing test hints at a deadlock. One common cause is blocking ZooKeeper threads when ZooKeeper operations are initiated from the ZooKeeper thread that delivers a change notification (#10418 is an example of such an issue).
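To illustrate the suspected pattern, here is a minimal sketch that uses plain `java.util.concurrent` executors rather than the actual Pulsar or ZooKeeper code. The single-threaded `eventThread` stands in for a ZooKeeper notification thread, and the class name, method names and the `"rack-info"` value are all hypothetical. The first method blocks inside a callback on work that needs the same thread, so it can only time out; the second hands the follow-up work to a separate executor so the callback returns immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not taken from the Pulsar code base.
class CallbackDeadlockSketch {

    // Simulates a change-notification callback that starts another operation
    // on the same single thread and then blocks waiting for its result.
    // Returns true if the nested wait timed out.
    static boolean blockingCallFromEventThreadTimesOut(long timeoutMillis) throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> callback = eventThread.submit(() -> {
                // The inner task is queued behind the callback that is waiting for it.
                Future<String> inner = eventThread.submit(() -> "rack-info");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            return callback.get();
        } finally {
            eventThread.shutdownNow();
        }
    }

    // Same scenario, but the callback only hands the follow-up work to a
    // separate worker executor and returns, which frees the event thread.
    static String handOffToWorker(long timeoutMillis) throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<String> result = new CompletableFuture<>();
            eventThread.submit(() -> {
                worker.submit(() -> {
                    try {
                        // Blocking here is safe: this is not the event thread.
                        Future<String> inner = eventThread.submit(() -> "rack-info");
                        result.complete(inner.get(timeoutMillis, TimeUnit.MILLISECONDS));
                    } catch (Exception e) {
                        result.completeExceptionally(e);
                    }
                });
            });
            return result.get(5, TimeUnit.SECONDS);
        } finally {
            worker.shutdownNow();
            eventThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("nested blocking call timed out: "
                + blockingCallFromEventThreadTimesOut(200));
        System.out.println("hand-off result: " + handOffToWorker(1000));
    }
}
```

If the real problem matches this pattern, the usual remedy is the second variant: avoid blocking on the notification thread, either by moving the follow-up work to another executor or by using the asynchronous API end to end.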