dat-ecosystem-archive / DEPs

Dat Enhancement Proposals. Contains all specs for the Dat protocol, including drafts. [ DEPRECATED - see https://github.com/hypercore-protocol/hypercore-proposals for similar functionality. More info on active projects and modules at https://dat-ecosystem.org/ ]
https://dat-ecosystem.github.io/DEPs
166 stars 17 forks source link

Include heartbeat rate in the hypercore-protocol handshake #18

Open pfrazee opened 6 years ago

pfrazee commented 6 years ago

It turns out that peers need to agree on the heartbeat rate, otherwise they'll incorrectly detect a dead connection. For instance, if peer A expects a rate of 1 every 1s and peer B expects a rate of 1 every 5s, then peer A will believe the connection is dead because too many heartbeat "ticks" will pass with nothing from peer B.

This means the peers need to agree on heartbeat rates in order to maintain good connections. We have two options:

@mafintosh suggested option 2. In that case, we need to decide what rate to use if the peers suggest different values. Perhaps an average, or a lower limit, or an upper limit. Also some limit range should be hardcoded, ie 5s < x < 120s.

If we want to move forward with that, we'll need to update the handshake schema

bnewbold commented 6 years ago

Couldn't we make the heartbeak a SYN/ACK thing, where the remote is expected to reply to (acknowledge) any heartbeat? If that were the case, then the effective heartbeat period would be the minimum of the two peers' periods, without any other communication necessary.

Having a configurable (negotiated?) heartbeat rate is adding a piece of "state" to the connection (what is the current rate? can it be updated over time? are there min/max allowed values? option#2 would be immutable), and I think it's nice to avoid extra state when possible. On the other hand, controlling this parameter may end up being important for "supernodes" or "observers" that are connected to many peers for long periods of time. If connected to 2k peers, and the heartbeat is once every 5 seconds, that's 400 hearbeats a second, which is manageable but becoming non-trivial.

mafintosh commented 6 years ago

@bnewbold good idea. I'll cook something up

bnewbold commented 6 years ago

To carry over some conversation from a WG meeting:

Heartbeats and Keep-Alives are already implemented by some reliable transports (eg, TCP), so it could be considered redundant to do this as the application layer. On the other hand, doing it at the application layer makes configuration and behavior consistent across transports, and potentially supports reliable stream transports (or platforms) which do not expose timeout and keep-alive configuration to application code. Other protocols (eg, bittorrent) implement heartbeats/keepalive at the application layer.

Some concern about bandwidth consumption was expressed, as well as concern about battery and CPU consumption for multiple clients. IMHO it should be possible to bundle keep alive behavior in clients (eg, set timers to a coarse global granularity instead of per-connection timers), and to set relatively long (or exponentially growing?) periods to reduce impact. Additionally, "servers" (services and super-nodes) could be configured to skip or have extremely long (5+ minute) hearbeat periods to reduce traffic on "clients" (browsers or mobile devices).