Flickering peers in v8 - Githubissues

gmaclennan commented 4 years ago

I've been diving into the issue of trying to understand why peers are flickering on and off (in @mapeo/core@8.1.3) and I think I understand what is happening.

syncState._state is a map of peers by peer.id. If there is an error with the connection to a peer (e.g. by wifi being turned off or loosing signal) then peer.state is updated to indicate the error, and it remains on syncState._state (I would expect this).

When the peer comes back it appears with a new peer.id and is added to the syncState._state map. It has the same peer.name as the "errored" peer, but a different id, so both remain in _state.

The front-end uses calls to syncState.peers() to get a list of peers, and the peer will appear duplicated (the errored peer, and the new peer with the good connection).

When the new peer completes sync, it is removed from _state and added to syncState._completed.

Now when syncState.peers() is called, it first maps over the peers in _state and gets the errored peer, then it maps over _completed but because a peer with the same name is already there (the errored peer) the completed peer does not appear in the list of peers returned, so the UI now shows it as errored.

I think then what happens is because the sync stream is complete, the connection closes, and the peer is re-added to _state, so it now appears again in syncState.peers().

From the user perspective, this state can happen if:

First sync - connection error.
Reconnect, sync is successful
Now sync is almost instant (because there is nothing to sync), but pressing sync temporarily puts the peer in _completed, so it disappears, then sync stream closes, it reconnects, and gets added to _state again, so it re-appears to the user, but without the completed state, so to the user it seems like nothing has happened.

hackergrrl commented 4 years ago

Great analysis @gmaclennan, this sounds accurate to me. Using an ID unique to the device (like its writeable hypercore public key) should fix this: the reconnecting peer can then overwrite the completed or errored ones.

hackergrrl commented 4 years ago

Realizing that if we do what I suggest, the reconnecting peer (upon sync finish OR error) will immediately overwrite the old peer state and appear as "ready to sync". This will prevent the old sync from showing up as completed. What is the desired behaviour?

gmaclennan commented 4 years ago

Can we add other fields? Sounds like we need deviceId, connectionId/peerId and deviceName.

gmaclennan commented 4 years ago

The challenge with the "completed state" is that it's not really a state, it's an event. It means that at that particular moment in time the peer has all the same data as the peer that it synced with, but at any moment after that (e.g. after a sync with another peer) this is no longer the case. I think we were moving towards solving this with the new sync progress work, but we need to figure out how to patch something in right now so it's not buggy. On the front end right now I think that rather than using the completed state it uses the lastCompleted date, and if that is more recent than the time the sync screen was opened, then it shows it as completed. e.g. navigate away from the screen and come back and it is no longer "completed".

gmaclennan commented 4 years ago

I did some thinking about the UX needs for the sync screen:

Show 1 "peer" for each device.
If there is an error syncing a peer, show peer with error until the user clicks "retry".
If a peer completes sync successfully, show peer as "Completed" until either the user clicks "sync" again, or navigates away from the sync screen.

For now, I think it's reasonable to treat device names as unique (e.g. can be used to de-dupe if needed). We should add a random string that is device-stable to the end of the device name on Desktop (currently it uses the hostname). In the future we will support custom device names so we should not rely on them being unique in the future (users could choose identical device names by mistake).

The front end code could use a "lastCompleted" field to implement (3) e.g. "show peer as complete if lastCompleted > timeUserNavigatedToScreen.

hackergrrl commented 4 years ago

I think we can just use the hypercore public key for now as your peer id. If we need to change this in the future to some persistent stored ID for use across projects, we can change it later without it being a breaking change.

hackergrrl commented 4 years ago

@gmaclennan, try the v8-peer-refactor branch of mapeo-core. I tested with Mapeo Desktop and it seems to have fixed the issue.

digidem / mapeo-core

Flickering peers in v8 #84