airbnb / nerve

A service registration daemon that performs health checks; companion to airbnb/synapse
MIT License

add option to customize ZK node type #124

Closed: austin-zhu closed this 4 years ago

austin-zhu commented 4 years ago

Summary

Currently, nerve only registers nodes in ZK with the hard-coded type :ephemeral_sequential. Nerve could offer the flexibility to register other node types as well, for example :persistent.

For backward compatibility, the type defaults to :ephemeral_sequential if not specified.

NOTE: using a persistent-* type can leave stale data in ZK if the graceful shutdown is not invoked or fails to execute.
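For illustration, here is a minimal Ruby sketch of the idea using the zk gem; it is not nerve's actual reporter code, and the node_type key name and ZK paths are assumptions made up for this example:

```ruby
require 'json'
require 'zk'

# Hypothetical reporter settings; the 'node_type' key name is an assumption,
# not necessarily the option introduced by this PR.
config = { 'host' => '10.0.0.1', 'port' => 3000, 'node_type' => 'persistent' }

# Backward-compatible default: :ephemeral_sequential when node_type is absent.
node_type = (config['node_type'] || 'ephemeral_sequential').to_sym

zk = ZK.new('localhost:2181')
zk.mkdir_p('/nerve/services/my_service')

data = { 'host' => config['host'], 'port' => config['port'] }.to_json
# The zk gem's :mode option selects the ZK node type at creation time;
# :persistent (or :persistent_sequential) survives session expiration.
zk.create('/nerve/services/my_service/node_', data, mode: node_type)
```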

Reviewers

@anson627 @panchr

Test

igor47 commented 4 years ago

Nerve could offer the flexibility to register other node types as well,

curious about why you might want different types of ZK nodes.

panchr commented 4 years ago

@igor47

Nerve could offer the flexibility to register other node types as well,

curious about why you might want different types of ZK nodes.

Using a persistent node can reduce the write storm caused by a large number of concurrent session expirations (during ZK degradation, for example). Currently, with ephemeral nodes, as soon as a session expires, the ephemeral nodes will be removed. Nerve "detects" this through ping failures and then automatically restarts itself. That causes all of those nodes to be re-created, which is a large number of concurrent write transactions and can worsen problems on ZK.

Using persistent nodes can reduce that write storm because the nodes do not need to be re-created. Also, we are looking to use this for dual-writing persistent nodes to a secondary cluster.

igor47 commented 4 years ago

with ephemeral nodes, as soon as a session expires, the ephemeral nodes will be removed.

curious, if nerve fails or the instance where nerve is running loses connectivity to the ZK cluster, what happens to the persistent node? it should be cleaned up... are you planning to have a separate cleaning task to remove stale nodes? are you changing the semantics of registration in ZK from "this node is currently online and able to handle traffic" to "this node existed at some point, but there are no claims to its current status"?

panchr commented 4 years ago

with ephemeral nodes, as soon as a session expires, the ephemeral nodes will be removed.

curious, if nerve fails or the instance where nerve is running loses connectivity to the ZK cluster, what happens to the persistent node? it should be cleaned up... are you planning to have a separate cleaning task to remove stale nodes? are you changing the semantics of registration in ZK from "this node is currently online and able to handle traffic" to "this node existed at some point, but there are no claims to its current status"?

Correct, the current idea is to have a cleaning task when using persistent nodes. The semantics do change somewhat; the key idea is that service discovery can tolerate a superset of the healthy nodes for a limited amount of time, and the data plane (HAProxy/Envoy) can filter out unhealthy nodes.
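Such a cleaning task could look roughly like the sketch below (not part of this PR). It assumes registrations are periodically re-written so a node's mtime is a rough liveness signal; the path and staleness threshold are made up for illustration:

```ruby
require 'zk'

# Sketch of a periodic stale-node cleaner for persistent registrations.
STALE_AFTER_SECONDS = 300
SERVICE_PATH = '/nerve/services/my_service'

zk = ZK.new('localhost:2181')
zk.children(SERVICE_PATH).each do |child|
  path = "#{SERVICE_PATH}/#{child}"
  _data, stat = zk.get(path)
  age = Time.now.to_i - stat.mtime / 1000  # ZK mtime is in milliseconds
  # Persistent nodes survive session expiration, so anything that has not
  # been refreshed recently is treated as stale and removed.
  zk.delete(path) if age > STALE_AFTER_SECONDS
end
```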

austin-zhu commented 4 years ago

with ephemeral nodes, as soon as a session expires, the ephemeral nodes will be removed.

curious, if nerve fails or the instance where nerve is running loses connectivity to the ZK cluster, what happens to the persistent node? it should be cleaned up... are you planning to have a separate cleaning task to remove stale nodes? are you changing the semantics of registration in ZK from "this node is currently online and able to handle traffic" to "this node existed at some point, but there are no claims to its current status"?

@igor47 that is a very good callout. TBH, we plan to verify the graceful-shutdown behavior to confirm whether nerve proactively removes the node when terminating properly. It seems it does, but we need to confirm again; check here.

I have to admit that one of the biggest advantages of ephemeral nodes is avoiding stale data even when the client doesn't clean up properly. However, as @panchr mentioned earlier, in our case this will be used for a secondary backup, so the side effect of stale data is acceptable; we are trying to avoid the primary and secondary ZK clusters failing in the same pattern. The client-side proxy should be able to filter out stale data anyway if it is doing active health checks.

Meanwhile, it seems the latest ZK server (3.5.6) supports TTL-based nodes, which could be the eventual solution for this issue. However, the Ruby ZK client doesn't support them yet, and upgrading the ZK server is risky too. Therefore, some stale data introduced by persistent nodes is inevitable for now. Using a cron job to periodically clean it up could mitigate the issue, but it's not ideal.

Overall, I think this flexibility is still useful, and the default behavior should remain the :ephemeral_sequential node type.