eosnetworkfoundation / mandel

Obsolete. Use https://github.com/AntelopeIO/leap instead.

Prevent APIs from being accessed when syncing/catching up #65

Open aaroncox opened 2 years ago

aaroncox commented 2 years ago

I am not positive on the best solution for this, but I'd like to see some sort of option that can be enabled to block API access to a node if the head is too far behind the current head/time.

This could be either:

  1. The API plugins polling the current state of the chain and ensuring the head block is within X seconds/blocks of what it should be and block API access when appropriate.
  2. The chain controller (or something similar) detecting that a block hasn't been received within X seconds/blocks and then blocking access to the APIs.

When APIs are "blocked" in this fashion, they should return a 503 response to indicate that the server is not yet ready to service requests.
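As a rough illustration of option 1, the check could look something like the sketch below. It assumes the head block timestamp is available as a UTC string in the form nodeos reports in `get_info` (`head_block_time`); the function names and the threshold parameter are made up for this example and are not part of nodeos.

```python
from datetime import datetime, timezone


def parse_block_time(value: str) -> datetime:
    """Parse a timestamp such as '2022-05-01T12:00:00.500' and treat it as UTC."""
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def api_status(head_block_time: str, now: datetime, max_lag_seconds: float) -> int:
    """Return the HTTP status the API would answer with (hypothetical helper).

    503 when the head block is more than max_lag_seconds behind `now`,
    200 otherwise.
    """
    lag = (now - parse_block_time(head_block_time)).total_seconds()
    return 503 if lag > max_lag_seconds else 200
```

A load balancer that only looks at status codes would then drop the node while it is catching up and re-add it once the lag falls back under the threshold.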

The problem it's seeking to solve is that most generic load balancing solutions will deem an upstream "valid" if it's returning an HTTP status of 200. While nodeos is syncing, the API still returns HTTP 200 responses. Any request served by this API will be outdated, and any TAPoS values created from this server will also be considered invalid (expired) if someone tries to submit a transaction using them.
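To make the expiry problem concrete, here is a minimal sketch. It assumes the client derives the transaction expiration from the node's reported head block time plus a fixed window (30 seconds here, chosen only for illustration); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def tapos_expiration(head_block_time: datetime, expire_seconds: int = 30) -> datetime:
    """Expiration a client would compute from the node's head block time (assumed scheme)."""
    return head_block_time + timedelta(seconds=expire_seconds)


def is_expired(expiration: datetime, now: datetime) -> bool:
    """A transaction whose expiration is not in the future would be rejected."""
    return now >= expiration


now = datetime(2022, 5, 1, 12, 10, 0, tzinfo=timezone.utc)

# Node that is ten minutes behind: the expiration is already in the past.
stale_head = datetime(2022, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
stale_expired = is_expired(tapos_expiration(stale_head), now)

# Node that is caught up: the expiration is still ahead of the wall clock.
fresh_head = datetime(2022, 5, 1, 12, 9, 59, 500000, tzinfo=timezone.utc)
fresh_expired = is_expired(tapos_expiration(fresh_head), now)
```

Under these assumptions, every transaction built from the lagging node's data is dead on arrival, even though the node answered with a 200.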

aaroncox commented 2 years ago

Related: https://github.com/EOSIO/eos/issues/4292

matthewdarwin commented 2 years ago

Actually I think returning errors for all APIs would be problematic. Monitoring needs access to the "get info" API at all times. Maybe a health API instead?

/v1/health: return 200-OK if the chain head is less than the user-configured interval behind, error otherwise.
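A sketch of how that split might behave: `get_info` keeps answering 200 so monitoring can follow sync progress, and only the health route changes status. The `/v1/health` path comes from the comment above and is a proposal, not an existing nodeos endpoint; `handle`, the response fields, and the 60-second default are assumptions for this example.

```python
import json
from datetime import datetime, timezone


def head_lag_seconds(info: dict, now: datetime) -> float:
    """Seconds between `now` and the head block time in a get_info-style dict."""
    head = datetime.fromisoformat(info["head_block_time"]).replace(tzinfo=timezone.utc)
    return (now - head).total_seconds()


def handle(path: str, info: dict, now: datetime, max_lag_seconds: float = 60.0):
    """Return (status, body) for a request path (hypothetical dispatcher)."""
    if path == "/v1/chain/get_info":
        # Always served, so monitoring can watch the node catch up.
        return 200, json.dumps(info)
    if path == "/v1/health":
        lag = head_lag_seconds(info, now)
        status = 200 if lag < max_lag_seconds else 503
        return status, json.dumps({"healthy": status == 200, "lag_seconds": lag})
    return 404, json.dumps({"error": "not found"})
```

A load balancer would point its health check at `/v1/health`, while dashboards keep polling `get_info` directly.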

cc32d9 commented 2 years ago

this haproxy checker is doing exactly that, and turns off the backend if it's over 60 seconds behind:

https://github.com/cc32d9/eosio-haproxy

aaroncox commented 2 years ago

> Actually I think returning errors for all APIs would be problematic. Monitoring needs access to the "get info" API at all times. Maybe a health API instead?
>
> /v1/health: return 200-OK if the chain head is less than the user-configured interval behind, error otherwise.

Which APIs would be problematic if blocked while a node was syncing?

FWIW I imagine this behavior would be an optional flag you could use on a node, like disable-api-when-behind = 60 or something. By default, nodeos would remain backwards compatible and act as it does today.

We'd just end up using this flag on public API nodes, since if the node is behind, we don't want it to service any traffic.
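The opt-in behavior could be sketched as follows. `disable-api-when-behind` is the hypothetical option name floated above, not an existing nodeos setting, and the parser below is a simplified stand-in for a `key = value` config file rather than how nodeos actually reads its configuration.

```python
from typing import Optional

FLAG = "disable-api-when-behind"  # hypothetical option name, not a real nodeos flag


def parse_max_lag(config_text: str) -> Optional[int]:
    """Return the configured threshold in seconds, or None if the flag is absent."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == FLAG:
            return int(value)
    return None


def api_enabled(lag_seconds: float, max_lag: Optional[int]) -> bool:
    """With no threshold configured the API is always served, as it is today."""
    return max_lag is None or lag_seconds <= max_lag
```

Leaving the flag out keeps the current behavior, so only operators who set it (for example on public API nodes) would see the API refuse traffic while the node is behind.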

aaroncox commented 2 years ago

> this haproxy checker is doing exactly that, and turns off the backend if it's over 60 seconds behind:
>
> https://github.com/cc32d9/eosio-haproxy

An operator shouldn't have to add haproxy into their stack in order to achieve this. If an operator wants to use nginx, caddy, traefik, cloudflare, aws elb monitoring, etc - they should still be able to achieve this.

Right now, most of the other solutions would require custom middleware specific to the software (like the one you've provided). All of them natively know how to remove an invalid upstream from a pool if it is throwing errors. Doing it in the core software accomplishes this.

matthewdarwin commented 2 years ago

Monitoring systems like to know where the chain is at when it is syncing old blocks (i.e. get_info). If there is another API that can provide this information and that stays up when the main API is down, then that is fine.

Other APIs like the producer API, db size, etc. should run on different ports so that a proxy is not required to expose some APIs but not others. (i.e. while syncing old blocks, I still want to be able to use "cleos net connect ...")