OriginProtocol / storypoints

Story points

Health endpoint showing lag #5

Open mikeshultz opened 1 year ago

mikeshultz commented 1 year ago

Getting some alerts where the /health endpoint is returning:

{
  "diff": 3840,
  "healthy": false,
  "latest": 1687297790,
  "latestHuman": "2023-06-20T21:49:50.000Z",
  "reservoir": 1687301630,
  "reservoirHuman": "2023-06-20T22:53:50.000Z"
}

Since /health uses the same endpoint as ingestion, this implies that ingestion is behind what the endpoint is returning by 64 minutes.
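For reference, a minimal sketch of how a payload like the one above could be derived from the two timestamps. This is not the actual /health implementation: `buildHealthReport`, `HealthReport`, and the `MAX_LAG_SECONDS` threshold are hypothetical names and values chosen for illustration, and the real unhealthy threshold is not stated in this thread.

```typescript
// Hypothetical shape mirroring the fields in the /health payload above.
interface HealthReport {
  diff: number;
  healthy: boolean;
  latest: number;
  latestHuman: string;
  reservoir: number;
  reservoirHuman: string;
}

// Assumed threshold in seconds; the real value is not given in this issue.
const MAX_LAG_SECONDS = 300;

// latest: Unix timestamp (seconds) of the newest ingested activity.
// reservoir: Unix timestamp (seconds) of the newest activity the upstream endpoint returns.
function buildHealthReport(
  latest: number,
  reservoir: number,
  maxLagSeconds: number = MAX_LAG_SECONDS
): HealthReport {
  const diff = reservoir - latest; // ingestion lag in seconds
  return {
    diff,
    healthy: diff <= maxLagSeconds,
    latest,
    latestHuman: new Date(latest * 1000).toISOString(),
    reservoir,
    reservoirHuman: new Date(reservoir * 1000).toISOString(),
  };
}

// Values taken from the payload in this issue.
const report = buildHealthReport(1687297790, 1687301630);
const lagMinutes = report.diff / 60;
console.log(report, `lag: ${lagMinutes} minutes`);
```

With the timestamps from the alert, `diff` is 3840 seconds, which is the 64 minutes mentioned above.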

mikeshultz commented 1 year ago

I think this specific case correlates with a processing delay by the Reservoir indexer. I pinged them earlier today about orders not showing up after ~45 minutes. They kicked off "backfilling" and activities started coming through. So either ingestion was in progress or was not called during this 8-minute window (5 minutes of which is the check delay, so effectively 3 minutes). This seems like a reasonable delay for processing a large number of events after a long delay on Reservoir's part.

This may not be the case for all of these test failures. Will see if I can correlate anything else.

mikeshultz commented 1 year ago

Another case where Reservoir was returning 500s.

I also added some instrumentation to workerHandler to show duration. It's possible that a lot of activities coming in at once may cause a processing delay. Not really sure what the range of execution durations is currently.
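A rough sketch of what duration instrumentation of this kind might look like. This is not the code that was added to workerHandler: `startTimer` and `withDurationLog` are hypothetical helpers, and the handler signature is an assumption.

```typescript
// Hypothetical helper: returns a function that reports elapsed milliseconds
// since startTimer was called. The clock is injectable so it can be tested.
function startTimer(now: () => number = Date.now): () => number {
  const startedAt = now();
  return () => now() - startedAt;
}

// Hypothetical wrapper around an async handler (assumed signature).
// Logs the duration whether the handler resolves or throws.
async function withDurationLog<T>(
  label: string,
  handler: () => Promise<T>,
  log: (message: string) => void = console.log
): Promise<T> {
  const elapsed = startTimer();
  try {
    return await handler();
  } finally {
    log(`${label} finished in ${elapsed()}ms`);
  }
}

// Example usage with a stand-in for the real handler body.
withDurationLog("workerHandler", async () => {
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  return "done";
});
```

Logging in `finally` means slow runs that end in an error (such as upstream 500s) are also captured, which should help establish the range of execution durations.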