etcd-io / zetcd

Serve the Apache Zookeeper API but back it with an etcd cluster
Apache License 2.0

correctly account for ephemeral node expiration in parent znode stats #90

Open heyitsanthony opened 6 years ago

heyitsanthony commented 6 years ago

Spun off of #88.

zetcd uses the CVersion key's revision and version to compute the znode's Pzxid and CVersion respectively. When a child changes (e.g., creation, deletion), zetcd touches the parent's CVersion key to bump these values. Ephemeral nodes, however, expire through etcd lease expiration, which deletes the ephemeral key without touching CVersion, so the parent's Pzxid and CVersion are not updated.

One possible solution involves extending etcd to associate a transaction with a lease (cf. https://github.com/coreos/etcd/issues/8842). Ideally, each ephemeral key would have a lease transaction that touches its parent's CVersion key when the lease expires. This is probably expecting too much because it is too invasive on the etcd side: the txn logic would have to permit multiple updates to a key in the same revision, which would likely require deep mvcc changes. Alternatively, the lease txn could create new "deleted ephemeral" keys that act as tombstones for each expired key; the tombstones would then be used to reconcile the fields. Tombstones avoid multi-updates, but would need STM extensions for ranges (a feature requested a few times in the past, but only possible in 3.3+).

An approach with reconciliation but without lease txns: maintain a per-znode list of ephemeral children (elist), a per-ephemeral node key with a matching ephemeral owner (ekey), and a global revision offset key: