Open ZubairLK opened 5 years ago
Here is a graph of CPU usage collected with Telegraf/InfluxDB/Grafana.
balenaOS CPU usage is spiky. Unless I'm mistaken, that is due to the various healthchecks (the supervisor and balenad ones are probably the most CPU-intensive).
We can investigate lighter-weight healthchecks, or perhaps make the healthcheck frequency user-configurable.
We don't want to make the healthchecks user configurable. If there is an issue with the healthchecks we should fix those.
We know that the current engine healthcheck also causes wear to storage media, so we would like to replace that with something more like a status check.
However, we still need a larger solution that checks overall system health (like device-diagnostics), but runs on the device and can take automatic recovery steps.
An old spec that is similar can be found here: https://github.com/balena-io/balena-io/pull/2009
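To make the "status check" idea for the engine more concrete, here is a minimal sketch, not the actual balenaOS implementation. It assumes the engine exposes a Unix socket and that a successful connect is an acceptable liveness signal; the function name `engine_socket_accepts` and the socket path in the usage note are hypothetical. The point is only that a check like this does no storage writes and uses very little CPU.

```python
import socket


def engine_socket_accepts(path: str, timeout: float = 1.0) -> bool:
    """Return True if something is listening on the Unix socket at `path`.

    Hypothetical helper: it only opens and closes a connection, so it
    performs no disk writes and starts no containers.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, a refused connection, and a timeout.
        return False
    finally:
        sock.close()
```

It could be called as `engine_socket_accepts("/var/run/balena-engine.sock")`, where the path is an assumption for illustration. A check like this only shows that the daemon is listening, not that it can actually run containers, which is why the broader on-device health check would still be needed.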
balenaOS has quite a few healthchecks on various systemd services. On slow devices like the pi0, these healthchecks can eat valuable CPU cycles. It would be wiser to make these healthchecks configurable.
Found while looking into https://github.com/balena-os/meta-balena/issues/1396