farcasterxyz / hub-monorepo

Implementation of the Farcaster Hub specification and supporting libraries for building applications on Farcaster
https://www.thehubble.xyz
MIT License
707 stars, 401 forks

bug(pg-replicator): replicated database has many more messages than hubs. #1160

Closed: varunsrin closed this issue 1 year ago

varunsrin commented 1 year ago

What is the bug?
@manan19 reported that his replicated pg database has ~2M more messages than the Hub he is connected to.


varunsrin commented 1 year ago

@sds let's discuss priorities at the next sync; @manan19 will add context in the meantime.

manan19 commented 1 year ago

Neynar Hub #1 has 2.56M messages. Postgres populated by replicator has 2.81M messages.

One correction to this statement, in case it matters:

"has ~2M more messages than the Hub he is connected to."

The replicator is not subscribed to the Neynar Hub #1 but to nemes.farcaster.xyz. My assumption is that nemes has the same number of messages as other Hubs.

[Image: Postgres history]

[Image: Neynar Hub #1 history]
Neynar Hub #1 was freshly synced from a zero state as of 2 days ago.

sds commented 1 year ago

@manan19 quick question: when you are counting these messages in the messages table, are you excluding messages with a non-null value for any of the deleted_at, revoked_at, or pruned_at columns?

The replicator implementation performs soft deletions (so that foreign key constraints will continue to work, should you choose to use them), so select count(*) from messages will return a larger value than the message count returned by the getInfo({ dbStats: true }) call for hubs.

This might not explain the size of the gap you are seeing, but I want to rule it out. Thanks!
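A minimal sketch of why soft deletion keeps foreign keys valid while inflating count(*), using Python's built-in sqlite3 module as a stand-in for Postgres. Only the deleted_at, revoked_at, and pruned_at columns come from the thread; the casts child table and the hash values are hypothetical, not the replicator's actual schema.

```python
import sqlite3


def demo_soft_delete() -> tuple[bool, int]:
    """Return (hard_delete_succeeded, count_star_after_soft_delete)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute(
        "CREATE TABLE messages (hash TEXT PRIMARY KEY, "
        "deleted_at TEXT, revoked_at TEXT, pruned_at TEXT)"
    )
    # Hypothetical child table whose rows reference a message row.
    conn.execute(
        "CREATE TABLE casts (id INTEGER PRIMARY KEY, "
        "message_hash TEXT NOT NULL REFERENCES messages(hash))"
    )
    conn.executemany("INSERT INTO messages (hash) VALUES (?)", [("0xa",), ("0xb",)])
    conn.execute("INSERT INTO casts (message_hash) VALUES ('0xa')")

    # A hard delete of a referenced row violates the foreign key constraint.
    try:
        conn.execute("DELETE FROM messages WHERE hash = '0xa'")
        hard_delete_ok = True
    except sqlite3.IntegrityError:
        hard_delete_ok = False

    # A soft delete only stamps a timestamp, so the reference stays valid...
    conn.execute("UPDATE messages SET deleted_at = CURRENT_TIMESTAMP WHERE hash = '0xa'")
    # ...but the row is still there, so a plain count(*) still includes it.
    total = conn.execute("SELECT count(*) FROM messages").fetchone()[0]
    conn.close()
    return hard_delete_ok, total
```

Here the hard delete is rejected and count(*) still reports 2 rows even though one message is logically deleted, which is the same effect that makes the raw Postgres count exceed the Hub's count.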

manan19 commented 1 year ago

@sds No, I haven't excluded any messages.

sds commented 1 year ago

If you run select count(*) from messages where deleted_at is null and pruned_at is null and revoked_at is null, how close is that number to the number of messages on your hub? If there's a large drift, something might be wrong with how it's merging messages or detecting conflicts.
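The comparison suggested above can be sketched as follows, again with sqlite3 standing in for Postgres. The live-count query is the one quoted in the comment; the five sample rows and the hub_count of 3 are made-up illustration values, not figures from this issue.

```python
import sqlite3

# Query from the thread: count only rows that are still live.
LIVE_COUNT_SQL = (
    "SELECT count(*) FROM messages "
    "WHERE deleted_at IS NULL AND pruned_at IS NULL AND revoked_at IS NULL"
)


def count_messages(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (raw_count, live_count) for the messages table."""
    raw = conn.execute("SELECT count(*) FROM messages").fetchone()[0]
    live = conn.execute(LIVE_COUNT_SQL).fetchone()[0]
    return raw, live


def drift(live_count: int, hub_count: int) -> int:
    """Positive when the replica holds more live messages than the Hub reports."""
    return live_count - hub_count


def demo() -> tuple[int, int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (hash TEXT PRIMARY KEY, "
        "deleted_at TEXT, revoked_at TEXT, pruned_at TEXT)"
    )
    rows = [
        ("0x1", None, None, None),
        ("0x2", None, None, None),
        ("0x3", None, None, None),
        ("0x4", "2023-07-20", None, None),  # soft-deleted
        ("0x5", None, None, "2023-07-20"),  # pruned
    ]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    raw, live = count_messages(conn)
    conn.close()
    return raw, live, drift(live, hub_count=3)  # hub_count is a made-up figure
```

In this toy table the raw count is 5 but the live count is 3, so against a Hub reporting 3 messages the drift is 0; only a nonzero drift on the live count points at a merge or conflict-detection problem.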

manan19 commented 1 year ago


{"level":30,"time":1689890343279,"pid":83,"hostname":"2067bd69f7b6","msg":"Hub Version: 1.4.1 Messages: 2609307 FIDs: 16531 FNames: 16531}"}

With that query, the count is quite close to the Hub's, but not exactly the same.
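To get the Hub-side number without reading it off by hand, the status line above can be parsed. This sketch assumes each log line is a single JSON object (the level/time/pid/hostname/msg fields look like pino-style output) and that the counters keep the "Name: value" layout shown in this one sample; neither is confirmed as a stable format.

```python
import json
import re

# Log line copied verbatim from the thread.
SAMPLE_LINE = '{"level":30,"time":1689890343279,"pid":83,"hostname":"2067bd69f7b6","msg":"Hub Version: 1.4.1 Messages: 2609307 FIDs: 16531 FNames: 16531}"}'


def parse_hub_stats(log_line: str) -> dict[str, int]:
    """Extract the integer counters from a Hub status log line."""
    record = json.loads(log_line)  # assumes one JSON object per line
    stats: dict[str, int] = {}
    for name in ("Messages", "FIDs", "FNames"):
        match = re.search(rf"{name}: (\d+)", record["msg"])
        if match is not None:
            stats[name] = int(match.group(1))
    return stats
```

For the sample line this yields Messages = 2609307, which is the figure to compare against the filtered Postgres count.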

varunsrin commented 1 year ago

@manan19 looks like you're off by ~5,000 messages.

Are you able to pull more context on those messages? I'm interested in what types of messages they are, what timestamps they have, and what their state on the Hub is.
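One way to pull that context is to anti-join the replica's live messages against the set of hashes the Hub actually holds, then group the leftovers by type with their timestamp range. This is a sketch under several assumptions: the type and timestamp columns are assumed rather than taken from the replicator's schema, hub_hashes is a hypothetical table you would first load with hashes exported from the Hub, the type values 1 and 3 are arbitrary, and sqlite3 stands in for Postgres (the LEFT JOIN ... IS NULL pattern is portable).

```python
import sqlite3

# Live replica rows with no matching hash on the Hub, grouped by message type.
BREAKDOWN_SQL = """
SELECT m.type, count(*) AS n, min(m.timestamp), max(m.timestamp)
FROM messages AS m
LEFT JOIN hub_hashes AS h ON h.hash = m.hash
WHERE h.hash IS NULL
  AND m.deleted_at IS NULL AND m.pruned_at IS NULL AND m.revoked_at IS NULL
GROUP BY m.type
ORDER BY n DESC, m.type
"""


def unexplained_breakdown(conn: sqlite3.Connection) -> list[tuple]:
    """Return (type, count, first_timestamp, last_timestamp) per message type."""
    return conn.execute(BREAKDOWN_SQL).fetchall()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (hash TEXT PRIMARY KEY, type INTEGER, "
        "timestamp TEXT, deleted_at TEXT, revoked_at TEXT, pruned_at TEXT)"
    )
    conn.execute("CREATE TABLE hub_hashes (hash TEXT PRIMARY KEY)")  # hypothetical
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("0x1", 1, "2023-07-18", None, None, None),  # also on the Hub
            ("0x2", 1, "2023-07-19", None, None, None),  # replica only
            ("0x3", 3, "2023-07-20", None, None, None),  # replica only
            ("0x4", 3, "2023-07-20", None, None, "2023-07-21"),  # pruned, ignored
        ],
    )
    conn.execute("INSERT INTO hub_hashes VALUES ('0x1')")
    rows = unexplained_breakdown(conn)
    conn.close()
    return rows
```

If the extra ~5,000 rows cluster in one type or one time window, that narrows which merge or conflict path to inspect; the hashes themselves can then be looked up on the Hub to check their state there.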

varunsrin commented 1 year ago

Closing due to inactivity.