chr15m / bugout

Back end web app services over WebRTC.
https://chr15m.github.io/bugout
MIT License
603 stars 59 forks source link

Partially resolves #31 - Corrects heartbeat logic to honor passed in interval #55

Closed draeder closed 2 years ago

draeder commented 2 years ago

Emit 'left' event on detach. This should resolve the issues I've seen with the 'left' event not firing when a peer disconnects.

hello-smile6 commented 2 years ago

Emit 'left' event on detach. This should resolve the issues I've seen with the 'left' event not firing when a peer disconnects.

Thank you for fixing this! @chr15m Could you create another release of dirc with this change? I've noticed this problem on dirc a lot.

chr15m commented 2 years ago

@draeder one issue here is that a peer may not have actually left the group, the network might just have been partitioned. For example it's possible to have a situation where peers are connected like this: A -> B -> C and A and C can't see each other directly but messages are still passed (because of the gossip protocol used) through B.

Maybe I'm overthinking it though and paying too much attention to this edge case. Happy to merge if you think this will help.

chr15m commented 2 years ago

@hello-smile6 if you submit a PR to dirc I will merge and build. Thanks!

draeder commented 2 years ago

@chr15m I guess I haven't encountered that edge case in any of my testing, but I haven't tested with much more than about 6 peers at a time. That said, I'll continue reviewing and see if there is a better solution that will address gossip peers. I'll keep the PR open for now though.

draeder commented 2 years ago

I was able to identify the issue with Bugout.heartbeat(). Instead of using bugout.timeout for evaluation, interval should be used. This partially resolves #31, and using Bugout.heartbeat() provides a workaround.

The latest PR includes the fix and build.

draeder commented 2 years ago

Merged! Let me know if there are any issues. This should start working immediately in browsers, but still needs to be published to NPM.

chr15m commented 2 years ago

I think this is going to cause false disconnects. Consider the situation where the network is quiet and the heartbeat is the only thing going on. The hearybeat message is expected every interval milliseconds. It is sent by the other party every interval milliseconds. If there is even a slight delay on the message due to lag or whatever, the time will exceed interval and the client will be flagged as disconnected. I think that's the reason for the original use of timeout instead.

A possible solution is to wait for 2x interval, or shorten the timeout value and change it back.

Really there should be some tests for this so we can verify correct disconnection of clients that are not directly connected via webrtc.

draeder commented 2 years ago

Okay. My apologies if I jumped the gun with the merge. What I found is that last + bugout.timeout < t was evaluating as false nearly indefinitely because the time that elapsed between last and interval was being added to the evaluation.

Based on my experimentation he calculation with timeout always evaluates as false, because it is never less than t. In fact, upon disconnect the calculation increases by interval

Here's an example with interval = 10000 msec:

I added the following before the evaluation:

      console.log(new Date(t), 'bugout.timeout', last+bugout.timeout < t, 'diff', t-last+bugout.timeout)
      console.log(new Date(t), 'interval', last + interval < t, 'diff', t-last+interval)
2022-01-09T23:45:28.235Z bugout.timeout false diff 308128
2022-01-09T23:45:28.235Z interval false diff 18128
2022-01-09T23:45:38.226Z bugout.timeout false diff 308124
2022-01-09T23:45:38.226Z interval false diff 18124
// Peer B disconnects here
2022-01-09T23:45:48.225Z bugout.timeout false diff 318123
2022-01-09T23:45:48.225Z interval true diff 28123
2022-01-09T23:45:58.223Z bugout.timeout false diff 328121
2022-01-09T23:45:58.223Z interval true diff 38121
chr15m commented 2 years ago

Ah ok, it looks like that works then. Looks like there is a buffer of interval allowed before disconnect already. :+1:

It looks like there is a test for this already and it's still passing: https://github.com/chr15m/bugout/blob/master/test.js#L202