cabal-club / cabal-core

Core database and replication for cabal.

GNU Affero General Public License v3.0

Sync & discovery inconsistencies #72

Status: Closed (hackergrrl closed this issue 4 years ago)

hackergrrl commented 4 years ago

This issue is vague: I'm still trying to understand what's happening.

Since the early days of cabal I've noticed a general pattern: as a cabal gets older & bigger on my machine, it seems to discover fewer peers and to hold connections with those peers open more briefly. Eventually, it seems, I can't really sync at all.

However, when I run cabal --temp $ADDRESS I notice that, generally, I discover more peers, and those connections tend to stay open longer.

hackergrrl commented 4 years ago

I poked at it, and I think this can be fixed with 4f3ca4ad921a48ac678fe2f4b05bb48e0ecae161.

We're likely switching to hyperswarm in the near future, but a backport to the 9.x.x semver could mean that folx who aren't ready for a protocol partition (if there are any such folx out there) would be able to enjoy the bugfix too.

hackergrrl commented 4 years ago

Beyond this, I also notice disconnects & reconnects every ~20s for most/all of my peers. hypercore-protocol defaults to a 20s timeout, so that's a good culprit to check, even though it should be sending KEEPALIVE messages every ~10s to keep the connection alive.
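
To make the timeout/keepalive interaction concrete, here is a rough model of it in plain JavaScript. It is only a sketch: `firstTimeout` and its parameters are hypothetical names invented for illustration, not part of hypercore-protocol, and it assumes the timeout is a simple "nothing received for N ms" idle timer. If keepalives sent every 10s actually arrive, a 20s timeout should never fire; if none arrive, the connection drops at the 20s mark, which would match the ~20s disconnect cycle.

```javascript
// Hypothetical model (NOT the hypercore-protocol API): find when an idle
// timeout would fire, given a keepalive schedule and a delivery predicate.
// Assumption: the timeout fires once more than `timeoutMs` passes without
// receiving anything from the peer.
function firstTimeout ({ timeoutMs, keepaliveMs, durationMs, delivered }) {
  let lastSeen = 0 // time (ms) of the last message received from the peer

  for (let t = keepaliveMs; t <= durationMs; t += keepaliveMs) {
    // By time t the idle window has already been exceeded: report when it expired.
    if (t - lastSeen > timeoutMs) return lastSeen + timeoutMs
    // A keepalive was scheduled at time t; it only counts if it was delivered.
    if (delivered(t)) lastSeen = t
  }

  // Check the tail end of the observation window.
  return durationMs - lastSeen > timeoutMs ? lastSeen + timeoutMs : null
}

const base = { timeoutMs: 20000, keepaliveMs: 10000, durationMs: 60000 }

// Every keepalive arrives: the connection survives the whole minute.
const healthy = firstTimeout({ ...base, delivered: () => true })

// No keepalive ever arrives: the connection drops at the timeout.
const silent = firstTimeout({ ...base, delivered: () => false })

// Keepalives stop arriving after 30s: drop happens one timeout later.
const stalled = firstTimeout({ ...base, delivered: (t) => t <= 30000 })

console.log({ healthy, silent, stalled })
```

Under this model, a steady ~20s disconnect cycle would suggest that keepalives are either not being sent or not being counted as activity by the receiving side, rather than the timeout value itself being too short.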

hackergrrl commented 4 years ago

Ok, looks like we'll be moving to hyperswarm much sooner rather than later! I'll reopen this or file something new if I see these issues again: so far hyperswarm is looking good for connectivity.