Currently, the node status reported by the Docker image is always unhealthy, for the reasons listed below. This PR fixes the health check procedure.
The health check procedure sends state queries to a static IP address.
This PR uses the loopback address instead, where the node listens in all modes.
Querying query.parachains.heads always returns an error:
Error: Cannot find query.parachains, your chain does not have the parachains pallet exposed in the runtime
    at assert (/usr/lib/node_modules/@polkadot/api-cli/node_modules/@polkadot/util/cjs/assert.js:31:11)
This PR uses query.system.number instead, so that a node is considered unhealthy when it neither receives nor produces new blocks.
Whether a node that stays on the same block should be considered unhealthy is debatable, because it can produce unwanted behavior. For instance, RPC nodes may become unreachable during a consensus failure, when the network itself stops producing blocks. That question is out of scope here, however, since query.system.number implements the same idea as the query.parachains.heads check used before.
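The "unhealthy if the block number does not advance" idea can be sketched roughly as follows. This is a minimal illustration, not the actual contents of healthcheck.sh: the function name check_progress and the state file argument are hypothetical, and the step that obtains the current block number from query.system.number over the loopback address is left out because its exact command and output format are not shown in this description.

```sh
#!/bin/sh
# Hypothetical sketch: compare the current block number with the one
# recorded by the previous health check run.
#
# Usage: check_progress CURRENT_BLOCK STATE_FILE
# Returns 0 (healthy) on the first run or when CURRENT_BLOCK is greater
# than the previously stored number; returns 1 (unhealthy) otherwise.
check_progress() {
    current="$1"
    state_file="$2"

    # Anything that is not a plain non-negative integer (for example an
    # error message from a failed query) counts as unhealthy.
    case "$current" in
        ''|*[!0-9]*) return 1 ;;
    esac

    # Load the number stored by the previous run, if there is one.
    previous=""
    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
    fi

    # Remember the current number for the next run.
    printf '%s\n' "$current" > "$state_file"

    # First run, or unreadable state: nothing to compare against yet.
    case "$previous" in
        ''|*[!0-9]*) return 0 ;;
    esac

    # Healthy only if the chain has advanced since the last check.
    [ "$current" -gt "$previous" ]
}
```

A health check script would call this with the number it extracted from the query and a file that survives between runs, for example `check_progress "$BLOCK" /tmp/last_block || exit 1` (both the variable and the path are assumptions). Rejecting non-numeric input means a failed query is reported as unhealthy and does not overwrite the stored number.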
A Dockerfile directive deletes the directories with system binaries, including the one that healthcheck.sh is copied into later. The script also requires some system utilities to work.
This PR preserves /usr/bin and /usr/sbin so that the health check can work. This may raise security concerns, but further improvements would be needed to let the health check run without system utilities. The accompanying growth of the image size from 744M to 880M does not look significant.