Closed: diranged closed this issue 10 years ago.
I've also opened an issue with the Kazoo team: https://github.com/python-zk/kazoo/issues/229
I've found the basic fix for this and am working on tests for it now. It looks like a behavior change in KazooClient.add_auth() is ultimately what causes the deadlock.
Kazoo 2.0 appears to have been released without an announcement to the mailing list, and one of our servers inadvertently booted up and installed it instead of our tried-and-true 1.3.1. We discovered a problem with our nd_service_registry code under Kazoo 2.0 that I could use some help with.
The problem shows up as a deadlock when we instantiate an nd_service_registry.KazooServiceRegistry() object with a username/password setting and then call its set_node() method.
The problem seems to be in the block at lines 827-856 (https://github.com/Nextdoor/ndserviceregistry/blob/master/nd_service_registry/__init__.py#L827-L856): we use the KazooClient.handler.lock_object() method to get a run lock, and then, at line 855, use the KazooClient.retry() method to call KazooClient.add_auth(). If we remove either the 'with self._run_lock' line (827) or the self._zk_retry() line (855), the problem goes away and our code works fine.
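As a hedged illustration of one way this kind of hang can happen, here is a stand-alone sketch. It does not use Kazoo and is not taken from nd_service_registry. It assumes, without confirmation from this issue, that the add_auth() call now blocks until a background thread finishes the request, and that this background thread needs the same non-reentrant lock the caller is already holding. All names (run_lock, fake_add_auth, background_worker, set_node_like) are hypothetical. A timeout replaces the indefinite wait so the script terminates.

```python
import threading

# Hypothetical stand-in for self._run_lock: a plain, non-reentrant lock.
run_lock = threading.Lock()


def background_worker(request_done: threading.Event) -> None:
    # Assumption: the background thread must take the same lock before it can
    # finish the request (e.g. a callback that also guards shared state with it).
    with run_lock:
        request_done.set()


def fake_add_auth(timeout: float) -> bool:
    """Hypothetical blocking call: wait for the worker; True if it completed."""
    done = threading.Event()
    threading.Thread(target=background_worker, args=(done,), daemon=True).start()
    return done.wait(timeout)


def set_node_like(hold_lock: bool, timeout: float = 0.5) -> bool:
    if hold_lock:
        with run_lock:
            # Waiting while holding run_lock: the worker cannot acquire it, so
            # this times out here (and would hang forever with no timeout).
            return fake_add_auth(timeout)
    # Without the lock held, the worker acquires run_lock and finishes normally.
    return fake_add_auth(timeout)


if __name__ == "__main__":
    print("without lock:", set_node_like(hold_lock=False))  # True
    print("with lock:", set_node_like(hold_lock=True))  # False
```

In this sketch, the caller releases the lock only after the blocking call returns, so the caller and the background thread end up waiting on each other. Removing either the `with` block or the blocking call breaks the cycle. This is consistent with the two workarounds described above, although it does not prove that this is the actual mechanism inside Kazoo 2.0.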