api3dao / signed-api

A monorepo for managing signed data

Batch Airnode feed calls to Signed API by small buffer #212

Open Siegrift opened 7 months ago

Siegrift commented 7 months ago

Currently, Nodary Airnode feed makes ~1800 Signed API calls per second (counting only those that reach our server). These numbers are similar for many other Airnode feeds. This is because the non-batched feeds make a single API call per data feed and push the data to Signed API as soon as it is fetched. In short, we make as many Signed API POST requests as we make data provider requests.

There are these issues:

  1. These requests are randomly distributed, but there can still be too many of them at once, possibly causing https://github.com/api3dao/signed-api/issues/211.
  2. They produce too many logs in Signed APIs, relating to https://github.com/api3dao/signed-api/issues/208.
  3. They cause unnecessary HTTP request overhead for both Airnode feed and Signed API.

What I propose is to batch the Airnode feed calls by a short period (e.g. 50ms or 100ms). We want to push the data as soon as possible, but such a short delay seems acceptable to me. This sets an upper bound on the number of Signed API requests made per second.

If we do this, we might not want to remove the "Request received" message in https://github.com/api3dao/signed-api/issues/208, but instead log it with how many "signed datas" it received in the batch.

@bbenligiray let me know wdyt.

bbenligiray commented 7 months ago

What are the expected batch sizes with a 50-100 ms period?

Currently, Nodary Airnode feed makes ~1800 Signed API calls per second

This surprises me, considering that Nodary Airnode feed batches all of its calls. Maybe you included the Nodary signed API logs?

Siegrift commented 7 months ago

Nodary Airnode feed makes ~1800 Signed API calls per second

Sorry, I mistyped. It's 1800 per MINUTE, not per second, which means it makes 30 calls every second. But when you check the link, you can see that the distribution is not really uniform.

This surprises me, considering that Nodary Airnode feed batches all of its calls.

I think you are looking at the wrong file. I believe the actual deployment is this file. I spoke to @metobom about this and he mentioned he wants to switch it, but there is some stuff he needs to think about first. I also see a 5s fetchInterval; not sure if that shouldn't be 1s.

Maybe you included the Nodary signed API logs?

Yeah, I was looking at the Signed API logs on Grafana, but I typed "second" where I meant "minute". The idea still stands though.

What are the expected batch sizes with a 50-100 ms period?

Say we have 300 feeds with a 1 second fetchInterval, none of the calls is batched (so 300 API calls per second), and our staggering distributes them equally. Normally we would make a call with 1 signed data every ~3.3ms (call this the frequency). With the change, the batch size would be delay_period_ms / frequency_ms and we would send at most 1000 / delay_period_ms requests per second. So for a 50ms period, the batch size is ~15 and we would make at most 20 calls per second.
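For reference, a quick TypeScript sanity check of this arithmetic (the constants mirror the example above and are purely illustrative):

```ts
// Batching arithmetic from the example above.
const feeds = 300;
const fetchIntervalMs = 1_000; // each feed is fetched once per second
const batchPeriodMs = 50; // proposed batching delay

// With equal staggering, one signed data push happens every ~3.3ms.
const pushSpacingMs = fetchIntervalMs / feeds;

// Batch size = how many pushes accumulate during one batching period.
const batchSize = batchPeriodMs / pushSpacingMs; // ~15

// Upper bound on Signed API requests per second.
const maxRequestsPerSecond = 1_000 / batchPeriodMs; // 20

console.log({ pushSpacingMs, batchSize, maxRequestsPerSecond });
```

Plugging in 150 feeds with a 5 second fetchInterval and a 200ms period gives a push every ~33ms, a batch size of ~6, and at most 5 requests per second.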

Siegrift commented 7 months ago

This would also help reduce the Signed API costs reported here.

bbenligiray commented 7 months ago

I think 300 feeds with a 1 second fetchInterval is a bit extreme (most API keys won't support that). If we think about 150 feeds with a 5 second fetchInterval (which is much more realistic), that's a request every ~33ms. I think this scheme starts becoming worth it at batching periods of 200ms and above.

I assume this is a future thing to implement, right?

Siegrift commented 7 months ago

I think this scheme starts becoming worth it starting at 200ms batching periods.

I am fine with 200ms as well (or any period we agree on). I quite like the idea of having the upper bound on the Signed API calls.

I assume this is a future thing to implement, right?

Yeah, but as I mentioned above:

If we do this, we might not want to remove the "Request received" message in https://github.com/api3dao/signed-api/issues/208, but instead log it with how many "signed datas" it received in the batch.

This is because both the Airnode feed INFO logs for outgoing Signed API requests and the Signed API logs for incoming requests would be reduced.

bbenligiray commented 7 months ago

Logging every 200ms is still a lot though. Decreasing the logs is not meaningful if it doesn't bring them down to an acceptable level.

Siegrift commented 7 months ago

Logging every 200ms is still a lot though.

Is it though? The message {"level":"info","message":"Received request \"POST /:airnodeAddress\".","ms":"+7ms","timestamp":"2024-02-02T20:49:28.481Z"} has 123 bytes. Doing 123 (bytes) * 10 (requests per second, assuming a 100ms period) * 60 * 60 * 24 * 30 = 3188160000 B ≈ 3.19 GB per month. If we have 10 APIs, that's ~31.9 GB monthly, which is ~$15.
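The same estimate as a small TypeScript calculation (all inputs come from the figures above):

```ts
// Reproducing the log-volume estimate above.
const messageBytes = 123; // size of one "Received request" log line
const requestsPerSecond = 10; // assuming a 100ms batching period
const secondsPerMonth = 60 * 60 * 24 * 30;

const bytesPerApiPerMonth = messageBytes * requestsPerSecond * secondsPerMonth; // 3_188_160_000 B
const gbPerApiPerMonth = bytesPerApiPerMonth / 1e9; // ~3.19 GB
const gbForTenApis = gbPerApiPerMonth * 10; // ~31.9 GB

console.log({ gbPerApiPerMonth, gbForTenApis });
```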

bbenligiray commented 7 months ago

I'll actually propose logging more on the next call :^)

Siegrift commented 6 months ago

I'd suggest the following implementation (a rough sketch follows the list):

  1. There is going to be a separate loop in Airnode feed for pushing the data that runs every 100ms.
  2. When scheduling a new data push here, we would instead save what to update in a signedApiUpdates array in the state.
  3. The loop takes all elements from signedApiUpdates, clears the array, and then posts the signed data to the Signed API.
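A minimal sketch of how this could look, assuming a Node 18+ runtime with a global fetch; the SignedData shape and the endpoint URL are hypothetical, and only the signedApiUpdates array comes from the proposal above:

```ts
// Hypothetical shape of one signed data entry; the real Airnode feed type may differ.
interface SignedData {
  airnode: string;
  templateId: string;
  timestamp: string;
  encodedValue: string;
  signature: string;
}

// Step 2: instead of POSTing immediately, pushes are accumulated in state.
const state: { signedApiUpdates: SignedData[] } = { signedApiUpdates: [] };

export const scheduleSignedApiUpdate = (data: SignedData) => {
  state.signedApiUpdates.push(data);
};

// Step 3: a separate loop drains the queue every 100ms and posts the whole batch.
const BATCH_PERIOD_MS = 100;

setInterval(async () => {
  const batch = state.signedApiUpdates.splice(0); // take all elements and clear the array
  if (batch.length === 0) return; // nothing accumulated in this period

  try {
    // Hypothetical endpoint; the real Signed API route may differ.
    await fetch('https://signed-api.example.com/', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(batch),
    });
  } catch (error) {
    // A real implementation would likely retry or re-queue the batch.
    console.error('Failed to push signed data batch', error);
  }
}, BATCH_PERIOD_MS);
```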

Let's sync on this on the call.

bbenligiray commented 6 months ago

I would keep this in the backlog even if there is no intention to implement it.

Siegrift commented 5 months ago

As part of investigating Signed API errors from Airnode feeds, I looked at the Grafana logs. For context, at the time of writing, the following number of errors was produced in the last 12 hours:

This is not that high considering the number of running instances and the time range, but this feature could help reduce this number.