ava-labs / avalanchego

Go implementation of an Avalanche node.
https://avax.network
BSD 3-Clause "New" or "Revised" License

[API] Add Subnet-Specific Health Checks #1264

Closed patrick-ogrady closed 1 year ago

patrick-ogrady commented 1 year ago

Although one Subnet on an AvalancheGo node may be unhealthy, operators may still wish to interact with the other Subnets running on it. AvalancheGo's existing health check, however, returns unhealthy if any Subnet is unhealthy. This behavior led to an outage in Subnet APIs during this incident, even though most Subnets were able to serve queries: API providers prevented a node from serving queries whenever this "global" check failed, as it was the only mechanism they had to gauge the health of the underlying node.

We should add a new health check, or add an argument to the existing check (https://docs.avax.network/apis/avalanchego/apis/health#healthhealth), that checks the health of only a specific Subnet. This will allow API providers to serve queries for any subset of healthy Subnets on a node.

I don't think we should remove the "global" health check in this change; it is still useful for getting a "full sense" of a node's status.
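
To make the proposal concrete, here is a minimal sketch of the filtering logic, not AvalancheGo's actual implementation. The `checkResult` type, its fields, the `healthy` function, and the subnet names are all hypothetical. With no filter the function behaves like the existing "global" check; with one or more subnet IDs it only considers checks belonging to those Subnets.

```go
package main

import "fmt"

// checkResult is a hypothetical record of a single health check.
// The type and field names are illustrative, not taken from AvalancheGo.
type checkResult struct {
	SubnetID string // Subnet the check belongs to
	Healthy  bool   // outcome of the check
}

// healthy reports whether every check selected by subnetIDs passed.
// With no subnetIDs it acts as the "global" check: every result counts.
// A Subnet with no recorded checks is treated as healthy in this sketch.
func healthy(results []checkResult, subnetIDs ...string) bool {
	wanted := make(map[string]struct{}, len(subnetIDs))
	for _, id := range subnetIDs {
		wanted[id] = struct{}{}
	}
	for _, r := range results {
		if len(wanted) > 0 {
			if _, ok := wanted[r.SubnetID]; !ok {
				continue // check belongs to a Subnet the caller did not ask about
			}
		}
		if !r.Healthy {
			return false
		}
	}
	return true
}

func main() {
	results := []checkResult{
		{SubnetID: "subnetA", Healthy: true},
		{SubnetID: "subnetB", Healthy: false},
	}
	fmt.Println(healthy(results))            // global view
	fmt.Println(healthy(results, "subnetA")) // only subnetA
	fmt.Println(healthy(results, "subnetB")) // only subnetB
}
```

In this example the global view is unhealthy because of `subnetB`, while a provider that only serves `subnetA` would still see a healthy result.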

StephenButtolph commented 1 year ago

We'll need to make sure to add this support for the GET calls as well. Load balancers typically just look for a 200 response, so JSON-RPC doesn't work well for them (which is why we added the special GET handling).

ceyonur commented 1 year ago

Need to add docs for that, then I will close.

ceyonur commented 1 year ago

Added to docs: https://docs.avax.network/apis/avalanchego/apis/health#filtering

There is still a pending PR that would filter the min-connected health checks by subnetIDs: #1358