etcd-io / zetcd

Serve the Apache Zookeeper API but back it with an etcd cluster
Apache License 2.0

Ephemeral node deleted by connection loss prevents parent node deletion #88

Closed tsuraan closed 6 years ago

tsuraan commented 7 years ago

This one's a bit weird, but basically if you create a node and then create an ephemeral node within it, the ephemeral node goes away on client connection loss, but the parent cannot be deleted because zetcd thinks it isn't empty. Here's a sample (using the Python kazoo library):

Basic setup:

Python 3.4.5 (default, Jan 26 2017, 00:57:26) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from kazoo.client import KazooClient
In [2]: c=KazooClient()
In [3]: c.start()
In [4]: c.get_children('/')
Out[4]: ['']

Okay, now we can create /foo, create /foo/bar as ephemeral, fail to delete /foo as expected, delete /foo/bar, and then delete /foo, which also works as expected:

In [5]: c.create('/foo')
Out[5]: '/foo'
In [6]: c.create('/foo/bar', ephemeral=True)
Out[6]: '/foo/bar'
In [7]: c.get_children('/foo')
Out[7]: ['bar']
In [8]: c.delete('/foo')
---------------------------------------------------------------------------
NotEmptyError                             Traceback (most recent call last)
...
NotEmptyError: 

In [9]: c.delete('/foo/bar')
Out[9]: True

In [10]: c.delete('/foo')
Out[10]: True

Now for the weird part: create /foo as before and create /foo/bar as ephemeral as before, but this time disconnect and reconnect the client, then try to delete /foo (which appears to be empty):

In [11]: c.create('/foo')
Out[11]: '/foo'

In [12]: c.create('/foo/bar', ephemeral=True)
Out[12]: '/foo/bar'

In [13]: c.stop()

In [14]: c.close()

In [15]: c.start()

In [16]: c.get_children('/foo')
Out[16]: []

In [17]: c.delete('/foo')
---------------------------------------------------------------------------
NotEmptyError                             Traceback (most recent call last)
...
NotEmptyError: 

Well, that was weird. /foo looks empty, but deleting it says it isn't empty. Can we fix things? Yes, re-creating /foo/bar and then explicitly deleting it lets us delete /foo again:

In [18]: c.create('/foo/bar')
Out[18]: '/foo/bar'

In [19]: c.delete('/foo/bar')
Out[19]: True

In [20]: c.delete('/foo')
Out[20]: True

So, that's weird.
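
For anyone who wants to replay this without typing it by hand, the steps above can be sketched as a small function. This is only a sketch: the function name `parent_deletable_after_reconnect` and its defaults are made up for illustration, and it takes any client object exposing the kazoo methods used in the transcript (`start`, `stop`, `close`, `create`, `get_children`, `delete`). The fallback `NotEmptyError` class is a stand-in so the snippet still imports when kazoo is not installed.

```python
try:
    from kazoo.exceptions import NotEmptyError
except ImportError:
    class NotEmptyError(Exception):
        """Stand-in so this sketch can be imported without kazoo installed."""


def parent_deletable_after_reconnect(client, parent="/foo", child="/foo/bar"):
    """Replay the transcript: returns (children_after_reconnect, parent_deleted).

    The names and defaults here are illustrative, not part of kazoo or zetcd.
    """
    client.start()
    client.create(parent)
    client.create(child, ephemeral=True)

    # Drop the session; the server is expected to remove the ephemeral child.
    client.stop()
    client.close()
    client.start()

    try:
        # What the server reports as the parent's children after reconnecting.
        children = client.get_children(parent)
        try:
            client.delete(parent)
            deleted = True
        except NotEmptyError:
            # The behaviour reported in this issue: the listing is empty,
            # yet the delete is rejected as if a child still existed.
            deleted = False
        return children, deleted
    finally:
        client.stop()
        client.close()
```

Against a running server this would be called as `parent_deletable_after_reconnect(KazooClient())`. A result of `([], False)` matches the behaviour described above, while `([], True)` is what plain ZooKeeper semantics would lead one to expect.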

tsuraan commented 7 years ago

I've been digging a bit, and I think the issue may be that the corresponding etcd node's lifetime is tied to the lease of the entire zetcd server, and not to any particular client's connection. I haven't yet found why the ephemeral node disappears from zetcd at all, but I would guess that there's a disconnect between what zetcd is showing and what etcd has stored. I'll keep poking around.

heyitsanthony commented 7 years ago

Nope, leases are per-connection. I think I found an unrelated ephemeral node bug, though...
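
A rough way to look at session ownership from the client side is to compare a node's stat against the current kazoo session. The helper below is a hypothetical sketch (`describe_node` is not part of kazoo or zetcd). It relies on `KazooClient.client_id`, which returns a `(session_id, password)` tuple while connected, and on the `ephemeralOwner` and `numChildren` fields of the stat returned by `get`. Whether zetcd fills in those stat fields exactly the way ZooKeeper does is an assumption that has not been checked here.

```python
def describe_node(client, path):
    """Summarise what the server reports for `path` (illustrative helper).

    Works with any client exposing kazoo's `get`, `get_children`
    and `client_id`.
    """
    _data, stat = client.get(path)
    children = client.get_children(path)
    session = client.client_id  # (session_id, password), or None if not connected

    return {
        # Children as listed by the server.
        "listed_children": children,
        # Child count recorded in the node's stat; a mismatch with the
        # listing would point at inconsistent bookkeeping on the server.
        "num_children": stat.numChildren,
        # Session that owns the node; 0 for non-ephemeral nodes in ZooKeeper.
        "ephemeral_owner": stat.ephemeralOwner,
        "owned_by_this_session": (
            session is not None and stat.ephemeralOwner == session[0]
        ),
    }
```

Calling `describe_node(c, '/foo/bar')` right after creating the ephemeral node, and `describe_node(c, '/foo')` after the reconnect, would show whether the node is tied to the current session and whether the parent's child count agrees with its listing.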

tsuraan commented 7 years ago

Sorry for the silence; that fix does work for me.

andrewhao888 commented 6 years ago

Hello @heyitsanthony, I have run into the same issue, and your fix commit also works for me. I am just wondering when it will be merged into master. Thanks!

heyitsanthony commented 6 years ago

/cc @gyuho

gyuho commented 6 years ago

Just merged. Thanks!