digidem / mapeo-core

Library for creating custom geo data and syncronizing via a peer to peer network
23 stars 2 forks source link

Fix/missing data bug #100

Closed okdistribute closed 3 years ago

okdistribute commented 3 years ago

Waded through this bug today -- found it while testing the missing data behavior.

When a sync with a peer with missing data occurred, and then you synced again without exiting mapeo on either side, it would report 'tried to sync before handshake occurred' OR if it got past that, it would crash node because peer.state.message.db did not exist as it was in the STARTED state.

hackergrrl commented 3 years ago

When a sync with a peer with missing data occurred, and then you synced again without exiting mapeo on either side, it would report 'tried to sync before handshake occurred' OR if it got past that, it would crash node because peer.state.message.db did not exist as it was in the STARTED state.

Which part of this PR fixes this? Is it the extra updateFeed() call to push the peer into the PROGRESS state? Did you get any clarity on why progress events weren't firing in the sparse case?

okdistribute commented 3 years ago

@noffle the updateFeed call does not fix this, although I wish it did.

This fixes it in the short term:

    if (peer.state.topic === ReplicationState.STARTED) return true
    if (peer.state.topic === ReplicationState.ERROR) return false

This is a hack though, because the progress events are never emitting (this is why I said updateFeeds() progress event isn't getting propagated.) So the peer is stuck in the started state, as there is nothing to update (it just synced!) -- but it is /also/ not done syncronizing since the hypercore feed is open. So it just stays stuck in the STARTED state. The heartbeat then needs to know if it's stale to destroy the connection.

Ideally, it would have at least one progress event instead.

okdistribute commented 3 years ago

@noffle I was able to remove this hack by moving the progress event listeners to begin on sync-start rather than on handshake accept.

okdistribute commented 3 years ago

The test is still failing due to a race condition on windows, but I've decided to kick the can down the road again. See here for my reasoning