bug: peerclose fires without emitting peerconnect

ThaUnknown commented 2 years ago

If a peer fails to connect [not sure why that happens yet], it will emit close, without ever emitting peerconnect

This case needs to be considered

code:

const map = {}
p2pcf.on('peerconnect', peer => {
  map[peer.id] = true
})
p2pcf.on('peerclose', peer => {
  if (map[peer.id]) {
    // handle
  } else {
    // this should never occur but does
  }
})

should there be a separate event for failed to init peer?

ThaUnknown commented 2 years ago

these connection errors are a bit weird, because other people can connect to that peer just fine, but I cant

gfodor commented 2 years ago

ah yes, the peerclose event is being fired when a peer connection has been created but not necessarily established (it may have failed due to ICE.) i agree this behavior is wrong and should be fixed.

it can be hard to diagnose the problem without an understanding of WebRTC. the first step would be to get the ICE candidates attempted and report the browser types of the peers and the webrtc info (about:webrtc-internals in chrome, about:webrtc in firefox)

ThaUnknown commented 2 years ago

it's close to impossible to debug this in a ready application as you don't expose the peer as its starting to connect, we need a peercreate event or something like that

another issue is I ran into is that some people randomly disconnect with specific people with the error RTCError: User-Initiated Abort, reason=Close called

ThaUnknown commented 2 years ago

checking internals isn't really possible in electron

ThaUnknown commented 2 years ago

@gfodor I really am at my wits end no clue why this happens

ThaUnknown commented 2 years ago

holy SHIT, this was a pain, essentially what happens is p2pcf detects a network change and dumps all peers, which must have some sort of race condition when connecting to ICE/TURN servers because they constantly change for some people which use IPv6, so the question is, is this network change stuff actually necessary with webrtc? for me turning it off fixed a solid 2/3rds of connect issues and doesn't seem to cause any other issues? I'm not sure

gfodor commented 2 years ago

Ah thanks for investigating. The network change interval is there to help assist with recovery when you switch networks. Basically mostly for mobile devices going on and off of wifi. It’s necessary due to the signalling method I use. My guess is that whatever is going on can be accounted for, can you describe in more detail what change is occurring that is triggering it and why? It should be possible to set a breakpoint on the line in the interval that purges the peers to see what condition changed. If this method is somehow not tractable to detect changes without false positives I’ll have to figure out another way.

gfodor commented 2 years ago

If it’s coupled with ipv6 then we could ignore those in the reflexive ip set that it uses.

gfodor commented 2 years ago

Oh and we can probably not drop the peers that are connected via TURN. Is the peer whose IP changed on a symmetric NAT?

gfodor commented 2 years ago

I cut a branch that no longer forces any disconnects that will probably deal with this issue more gracefully. I have to test it to make sure it actually recovers still from a legit network change but for now you can probably use it to get past this issue.

https://github.com/gfodor/p2pcf/pull/12

Re: a peerstart event the tradeoff there is that due to the way this works there are scenarios where a lot of peers can get created that don't actually ever connect, and it's not an error. For example, if you join within a minute of someone leaving, you'll stlil see them in the poller and create a peer that silently dies and gets cleaned up. I decided against adding the event because I thought it might confuse people and they'll subscribe to the wrong event or think the fact that the peers end up not connecting is a bug.

That said, there is a use-case that I'd like to solve for where you want to know immediately when you connect if anyone else might be in the room. If the poller returns nothing, we know you are alone, but if it returns anything, we know you might not be alone. If this can fit into that problem that would be helpful in helping decide how to implement it well.

evan-brass commented 2 years ago

Are these two different scenarios? 1 where a connection won't succeed because of stale data and 2 where a network change causes an existing connection to disconnect? If so, the second case could be handled by listening for the disconnected iceconnectionstate and then triggering an ICE restart. Case 1 could probably be handled with a timeout that is cleared if the connection state ever reaches connected.

gfodor commented 2 years ago

So I think I managed to find a better solution to this. I now just clear the candidates from the state of the worker upon a successful connection, so there's no risk of old candidates being passed to peers upon a network change + disconnect. Some of the reason things are in the state they are is because I was fighting with Workers KV, which is eventually consistent, so I was being a lot more conservative with dynamically changing the state given the risk of delayed arrival.

Re: ICE restarts, I decide to punt on those. The peers recover when there's a failure, but they have to wait out a full failure, not just a disconnect.

Merged https://github.com/gfodor/p2pcf/pull/12 and shipped 1.3.0

gfodor commented 2 years ago

Dispatching and properly running an ICE restart would be a nice addition. It would require some modifications to tiny-simple-peer to handle the disconnect properly and a change to the p2pcf client to reset the state so it runs the full worker-based negotiation again.

ThaUnknown commented 2 years ago

can you describe in more detail what change is occurring that is triggering it and why?

console.error('network change', newUdpEnabled !== this.udpEnabled, newIsSymmetric !== this.isSymmetric, newDtlsFingerprint !== this.dtlsFingerprint, !retainedAnyReflexiveIps)

yielded true, false, false, true, but this is a sample size 1 of 10, so take it with a grain of salt the end user was doing nothing, just sitting there watching console, no interactions with the app or OS, no other [significant] apps in the background etc

Oh and we can probably not drop the peers that are connected via TURN. Is the peer whose IP changed on a symmetric NAT?

Their IP didn't change, and I'm not sure

The reason I wanted a peerstart or peercreate event is because there's no way for me to listen to ice state events from p2pcf without modifying it

gfodor commented 2 years ago

I think this issue should be fixed now so going to close. If another scenario comes up where it's desirable to have additional events regarding peer state please open another one - I will be probably updating start() to both await on the initial POST response and return a data structure describing the state of the room wrt existing peers.

gfodor / p2pcf

bug: peerclose fires without emitting peerconnect #8