Open aaroncox opened 2 years ago
Actually I think returning errors for all APIs would be problematic. Monitoring needs access to the "get info" API at all times. Maybe a health API instead?
/v1/health: return 200 OK if the chain head is behind by less than the user-configured interval, and an error otherwise.
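A minimal sketch of the check such a /v1/health endpoint could perform, assuming the node exposes its head block time as a UTC ISO-8601 string (as get_info's head_block_time does). The function names and the choice of 503 as the error status are illustrative assumptions, not an existing nodeos API.

```python
from datetime import datetime, timezone


def parse_block_time(value: str) -> datetime:
    """Parse a timestamp such as '2020-01-01T00:00:00.500' and treat it as UTC."""
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def health_status(head_block_time: str, now: datetime, max_lag_seconds: float) -> int:
    """Return 200 if the head block is within max_lag_seconds of now, else 503.

    max_lag_seconds stands in for the user-configured interval; 503 is an
    assumed choice for "error otherwise".
    """
    lag = (now - parse_block_time(head_block_time)).total_seconds()
    return 200 if lag < max_lag_seconds else 503
```

A load balancer that only looks at status codes could then poll this one endpoint, while get_info stays reachable for monitoring.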
this haproxy checker is doing exactly that, and turns off the backend if it's over 60 seconds behind:
> Actually I think returning errors for all APIs would be problematic. Monitoring needs access to the "get info" API at all times. Maybe a health API instead?
> /v1/health: return 200 OK if the chain head is behind by less than the user-configured interval, and an error otherwise.
Which APIs would be problematic if blocked while a node was syncing?
FWIW I imagine this behavior would be an optional flag you could use on a node, like disable-api-when-behind = 60 or something. By default, nodeos would remain backwards compatible and act as it does today.
We'd just end up using this flag on public API nodes, since if the node is behind, we don't want it to service any traffic.
> this haproxy checker is doing exactly that, and turns off the backend if it's over 60 seconds behind:
An operator shouldn't have to add haproxy into their stack in order to achieve this. If an operator wants to use nginx, caddy, traefik, cloudflare, aws elb monitoring, etc - they should still be able to achieve this.
Right now, most of the other solutions would require custom middleware specific to the software (like the one you've provided). All of them natively know how to remove an invalid upstream from a pool if it is throwing errors. Doing it in the core software accomplishes this.
Monitoring systems like to know where the chain is at when it is syncing old blocks (i.e. get_info). If there is another API that can provide this information and is not down when the main API is down, then that is fine.
Other APIs like the producer API, db size, etc. should run on different ports so a proxy is not required to expose some APIs while hiding others (i.e. while syncing old blocks, I still want to be able to use "cleos net connect ...").
I am not positive on the best solution for this, but I'd like to see some sort of option that can be enabled to block API access to a node if the head is too far behind the current head/time.
This could be either:
When APIs are "blocked" in this fashion, the node would return a 503 response to indicate that the server is not yet ready to service requests.
The problem it's seeking to solve is that most generic load balancing solutions will deem an upstream "valid" if it's returning an HTTP status of 200. While nodeos is syncing, its API still returns HTTP status 200 responses. Any response served by this API will be outdated, and any TaPoS values created from this server will also be considered invalid (expired) if someone tries to submit a transaction using them.
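To illustrate the expiry problem, here is a small sketch that assumes a client derives a transaction's expiration from the head block time the node reports plus a fixed window. The 30-second window and the helper names are hypothetical and only serve the arithmetic.

```python
from datetime import datetime, timedelta, timezone


def expiration_from_head(head_block_time: datetime, expire_seconds: int) -> datetime:
    """Expiration a client would set if it trusts the node's reported head block time."""
    return head_block_time + timedelta(seconds=expire_seconds)


def is_expired_on_arrival(head_block_time: datetime, now: datetime, expire_seconds: int) -> bool:
    """True if a transaction built from this head is already expired at wall-clock time now."""
    return expiration_from_head(head_block_time, expire_seconds) <= now


now = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
synced_head = now - timedelta(seconds=0.5)   # node is at the tip of the chain
stale_head = now - timedelta(minutes=10)     # node is still syncing old blocks

print(is_expired_on_arrival(synced_head, now, 30))  # False
print(is_expired_on_arrival(stale_head, now, 30))   # True
```

Under that assumption, any node lagging by more than the expiration window hands out values that can only produce expired transactions, even though every response carried a 200.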