zetcd uses the CVersion key's revision and version to compute the znode's Pzxid and CVersion respectively. When a child changes (e.g., creation, deletion), zetcd touches the CVersion key to bump these values. Ephemeral keys, however, expire through etcd lease expiration, so deleting an expired ephemeral key does not touch the parent's CVersion key, and the parent's Pzxid and CVersion are not updated.
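The gap can be illustrated with a toy in-memory model. This is only a sketch: `keyMeta`, `store`, `txn`, and the key names are hypothetical stand-ins, not zetcd or etcd APIs, and it assumes for simplicity that Pzxid and CVersion are read directly from the CVersion key's last-modified revision and version.

```go
package main

import "fmt"

// keyMeta is a hypothetical stand-in for the per-key metadata etcd tracks.
type keyMeta struct {
	ModRevision int64 // store revision of the last write to this key
	Version     int64 // number of writes since the key was created
}

// store is a toy model with one global revision counter; it is not etcd.
type store struct {
	rev  int64
	keys map[string]*keyMeta
}

func newStore() *store { return &store{keys: map[string]*keyMeta{}} }

// txn applies all puts and deletes at a single new revision.
func (s *store) txn(puts, dels []string) {
	s.rev++
	for _, k := range puts {
		m, ok := s.keys[k]
		if !ok {
			m = &keyMeta{}
			s.keys[k] = m
		}
		m.ModRevision = s.rev
		m.Version++
	}
	for _, k := range dels {
		delete(s.keys, k)
	}
}

// staleAfterExpiry creates a parent, adds an ephemeral child (touching the
// parent's CVersion key), then expires the child the way a lease would:
// a delete that touches nothing else.
func staleAfterExpiry() (pzxid, cversion, storeRev int64) {
	s := newStore()
	const cver = "/cver/parent" // hypothetical key name
	s.txn([]string{cver}, nil)                     // create parent
	s.txn([]string{"/key/parent/child", cver}, nil) // create child, touch CVersion
	s.txn(nil, []string{"/key/parent/child"})      // lease expiry: no touch
	m := s.keys[cver]
	return m.ModRevision, m.Version, s.rev
}

func main() {
	p, c, r := staleAfterExpiry()
	fmt.Printf("pzxid=%d cversion=%d store revision=%d\n", p, c, r)
}
```

In this model the store has advanced past the expiry, but the values derived from the CVersion key still reflect the child's creation.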
One possible solution involves extending etcd to associate a transaction with a lease (cf. https://github.com/coreos/etcd/issues/8842). Ideally, each ephemeral key would have a lease transaction that would touch its parent's CVersion key. This is probably expecting too much since it is too invasive on the etcd side; the txn logic would have to permit multiple updates to a key in the same revision and likely require deep mvcc changes. Alternatively, new "deleted ephemeral" keys could be created in the lease txn to mark tombstones for each expired key; the tombstones would then be used for reconciling the fields. Tombstones avoid multi-updates, but would need STM extensions for ranges (a feature request made a few times in the past, but only possible in 3.3+).
An approach that uses reconciliation but no lease txns is to maintain a per-znode list of ephemeral children (elist), a per-ephemeral-node key with a matching ephemeral owner (ekey), and a global revision offset key:
When creating an ephemeral key, add its name to the elist and create the ekey if the key does not exist. If the name is already in the elist, wait on reconciliation.
When computing Stat, fetch the elist and compare it with the child keys to detect expiry, then wait for reconciliation.
A reconciliation goroutine watches for ekey deletion events. For each set of deleted ekeys under the same znode, set CVersion's count to count-1, its zxid to the deletion event zxid and the current revision offset version, remove the keys from the elist, and touch the revision offset key. Then notify waiters.
The revision offset is subtracted from the current zxid to compensate for the extra revisions from reconciliation txns.
Record the current revision offset in the mtime and ctime keys for computing mzxid and czxid. Compute each as etcdrev - offset.
Record a count and the current revision offset in CVersion.
Compute CVersion by adding the stored count value to the key version.
Compute Pzxid by using the stored CVersion zxid if there have been no changes since the last expiry.
Will need some way to handle losing the reconciliation watch due to compaction.
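The read-side arithmetic from the steps above could look roughly like the following. It is a minimal sketch under assumptions: the `cverValue` encoding, its field names, and the convention that a zero `ExpiryZxid` means a regular child change happened after the last reconciled expiry are all hypothetical, and the reconciliation write itself is not modeled.

```go
package main

import "fmt"

// cverValue is a hypothetical encoding of the CVersion key's stored value.
type cverValue struct {
	Count      int64 // adjustment added to the etcd key version
	Offset     int64 // revision offset recorded when the key was written
	ExpiryZxid int64 // zxid of the last reconciled expiry; 0 (assumed
	// convention) if a regular child change overwrote it
}

// zxidOf maps an etcd revision to a zxid by subtracting the revisions
// consumed by reconciliation txns (etcdrev - offset).
func zxidOf(etcdRev, offset int64) int64 { return etcdRev - offset }

// cversionOf adds the stored count to the CVersion key's etcd version.
func cversionOf(v cverValue, keyVersion int64) int64 {
	return v.Count + keyVersion
}

// pzxidOf prefers the stored expiry zxid when nothing has changed since the
// last expiry; otherwise it falls back to the key's offset-adjusted revision.
func pzxidOf(v cverValue, modRevision int64) int64 {
	if v.ExpiryZxid != 0 {
		return v.ExpiryZxid
	}
	return zxidOf(modRevision, v.Offset)
}

func main() {
	// An mtime key written at etcd revision 12 while the offset was 2.
	fmt.Println("mzxid:", zxidOf(12, 2))

	reconciled := cverValue{Count: 3, Offset: 2, ExpiryZxid: 9}
	fmt.Println("cversion:", cversionOf(reconciled, 4))
	fmt.Println("pzxid after expiry:", pzxidOf(reconciled, 15))

	touched := cverValue{Count: 3, Offset: 2}
	fmt.Println("pzxid after child change:", pzxidOf(touched, 15))
}
```

Recording the offset alongside each key keeps old zxids stable: later reconciliation txns raise the global offset without shifting values computed for keys written earlier.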
Spun off of #88.