farcasterxyz / hub-monorepo

Implementation of the Farcaster Hub specification and supporting libraries for building applications on Farcaster
https://www.thehubble.xyz
MIT License
714 stars 417 forks

bug (hubble): Hub out of sync #2332

Closed flyingrabbit-lab closed 3 weeks ago

flyingrabbit-lab commented 2 months ago

What is the bug? My hub has missing messages although Grafana shows 100% synced. I have tried everything on the troubleshooting page. Random messages (casts, reactions) are missing.

This is my Grafana dashboard:

[Four Grafana dashboard screenshots]

Resetting the DB works, but after some time it has missing messages again.

AX1S99 commented 2 months ago

I think it is because of logs. The node stores 2 gp logs, so when it is showing 100%, maybe it is only considering the success logs.

flyingrabbit-lab commented 1 month ago

What can I do about it, so that I'm not out of sync?

Z3R013x commented 1 month ago

Exactly the same bug on my side too. It was working just fine; a few days ago I had to reinstall the OS on the server, and after installing the node via the script that runs it in Docker, this annoying bug occurs. I tried resetting the DB, pruning Docker and all volumes and installing the node again, and reinstalling the OS again; nothing works :/ Even on a fresh install I get 1-2 peers, and after some time I get 0 peers and no gossip.

alexchenzl commented 1 month ago

Same issue happens on my server.

0x330a commented 3 weeks ago

Same thing happens to me. The logs show errors when trying to connect to all bootstrap peers.

sds commented 3 weeks ago

Could you share some example error messages?

In the original screenshot for this issue (the middle of the Grafana dashboard) it shows there were no inbound gossip connections. You can't discover peers if they can't communicate with you.

This is likely an error with your local or cloud provider networking, as we are not seeing any issues with our production hubs.

Also, please make sure you are running the latest hub version.

0x330a commented 3 weeks ago

Running 1.16.2, trying to figure out what's going wrong or if it just needs more time.

Bootstrap peers can be discovered and sometimes connected to, but my hub never receives inbound connections (probably fine), and after a few hours the number of connected peers seems to drop to 0. All message gossip seems to stop after that point.

The log messages also say that it doesn't run the sync health job because the first sync hasn't been completed.

Going to leave it running to see if more time helps

sds commented 3 weeks ago

If the peers can't connect to you they can't sync. Incoming gossip not working (Grafana dashboard error above) indicates that your hub is not reachable on the public internet, likely because it's behind a NAT gateway.

We highly recommend avoiding running a hub yourself and using a provider (such as Neynar) to expose a hub API for you.

I'm going to close this, but if you have more information that changes the story, let's open a new ticket and discuss there. Thank you!

Z3R013x commented 3 weeks ago

The TCP ports required for the server are reachable; I can reach the open ports externally without any issue. New nodes are experiencing this issue. Mine was also working fine before I reinstalled it, which is why you don't see the issue on your production hubs.

@sds

sds commented 3 weeks ago

I just redeployed our production hubs. We're not seeing this issue. There must be something specific about the way you're running your hubs.

Remember, both ports 2282 and 2283 need to be reachable from the public internet. It's possible your provider might be blocking traffic, or something else unexpected and outside of Hubble is occurring.
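
For anyone wanting to verify the reachability requirement above, here is a minimal sketch of a TCP check, not an official Hubble tool. It only uses the Python standard library. The function names `is_port_open` and `check_hub_ports` are made up for this example, and the host you pass in is your own hub's public address. It needs to be run from a machine outside the hub's network (not from the hub itself or from behind the same NAT), otherwise the result says nothing about public reachability. A successful TCP connect also only shows that something is listening on the port, not that gossip or sync is healthy.

```python
import socket

# Ports named in this thread; the host is whatever public address your hub uses.
HUB_PORTS = (2282, 2283)


def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_hub_ports(host: str, ports=HUB_PORTS, timeout: float = 5.0) -> dict:
    """Map each port to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port, timeout) for port in ports}
```

Calling `check_hub_ports("<your hub's public IP>")` from an external machine returns a dict with one entry per port. If either value is `False`, a firewall, provider-level block, or NAT in front of the hub is a likely place to look.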

If you'd like to receive more support on this, open a new ticket and:

We strongly suggest running hubs using a provider like Neynar for just this reason. Hubs are sometimes challenging to run, and given our current efforts to migrate to Snapchain, we likely aren't going to invest much in the current hub implementation until after that migration is complete, since it solves many known problems (mostly performance-related) with sync.

I'm going to lock this thread. Thank you for your understanding.