Inconsistent Data Sync Delays

ooohminh commented 4 days ago

Describe the Bug
We're monitoring wildcard events, specifically ERC-721 transfer events. We are occasionally experiencing delays in the data sync process. Most of the time, data is synced correctly, but sometimes it falls out of sync. A recent example of an unsynced state looks like this, persisting for a few hours:

end_of_block_range_scanned_data: 5,557,123
entity_history: 5,551,931
Difference: 5,192 blocks

At one point, the block difference between end_of_block_range_scanned_data and entity_history reached 20,000 blocks, but it eventually dropped to 0 (synced). This behavior occurs inconsistently and causes disruptions in expected performance.

To Reproduce
Steps to reproduce the behavior:

Clone and run the repo: envio-nft-transfer
Monitor the sync state over time - delays occur occasionally, making them hard to predict.

Expected Behavior
The entity_history should consistently stay in sync with the end_of_block_range_scanned_data.

Local Environment Details

Envio: 2.4.4
pnpm: 9.11.0
Node.js: v20.17.0
Docker: 24.0.7, build afdd53b
Server: Ubuntu 20.04.4 LTS (Focal Fossa)

Environment Variables

ENVIO_PG_HOST=localhost
ENVIO_PG_PORT=5432
ENVIO_PG_USER=your_username
ENVIO_POSTGRES_PASSWORD=your_password
ENVIO_PG_DATABASE=your_database

RABBITMQ_URL=amqp://user:password@localhost:5672
HASURA_GRAPHQL_DATABASE_URL=postgresql://your_username:your_password@localhost:5432/your_database

LOG_STRATEGY="console-pretty"
LOG_LEVEL=trace
TUI_OFF="true"

Additional Context

This issue impacts operational efficiency as the block difference occasionally grows significantly, causing delays in data updates
Observations suggest the sync state fluctuates without a clear pattern.

JonoPrest commented 4 days ago

Hey @ooohminh,

To give you some context. There are number of concurrent processes going in the indexer.

Expected Behavior The entity_history should consistently stay in sync with the end_of_block_range_scanned_data.

The retrieving of events and the processing of events are two separate processes values in your end_of_block_range_scanned_data will not necessarily be exactly the same as values actually processed at any given moment. Where as data from the event_sync_state should be completely up to date with your entity_history for example.

There is also a known issue with large indexers using rollback_on_reorg mode (on by default) where there is a very expensive postgres function that backs up all your entities into the history table as it reaches the head. If this is taking a lot of time then it likely would make the process lag behind until this copy finishes. But from then onwards it should remain fresh and you shouldn't see that happen again. This is being worked on and should appear in a release this week to remove the slow copying of the data.

Could you let me know if it's only happening once during an indexing process or does it continue to be a problem after it reaches the head and shows "synced" status?

Also are you running this locally or in our hosted service? If it's deployed with our hosted service could you share links to the deployments causing problems?

ooohminh commented 1 day ago

Could you let me know if it's only happening once during an indexing process or does it continue to be a problem after it reaches the head and shows "synced" status?

It synced the first time, but then it continued to happen again multiple times.

Also are you running this locally or in our hosted service? If it's deployed with our hosted service could you share links to the deployments causing problems?

We're self-hosting it, and I can share the links to the deployments but it's quite private so may I share it via DMs or something when the unsynced problem happens again? It's not happening right now but it has been a problem for multiple times now.

JonoPrest commented 1 day ago

We're self-hosting it, and I can share the links to the deployments but it's quite private so may I share it via DMs or something when the unsynced problem happens again? It's not happening right now but it has been a problem for multiple times now.

Please do, feel free to send me a me a message on discord. It's best though if we can keep all publicly shareable info here to keep a history and allow others in the team to take over issues if they need.

enviodev / hyperindex

Inconsistent Data Sync Delays #261