balena-os / meta-balena

A collection of Yocto layers used to build balenaOS images
https://www.balena.io/os

Improve healthchecks #1397

Open ZubairLK opened 5 years ago

ZubairLK commented 5 years ago

balenaOS runs quite a few healthchecks on various systemd services. On slow devices like the Pi Zero, these healthchecks can eat valuable CPU cycles. It would be wiser to make them configurable.

Found while looking into https://github.com/balena-os/meta-balena/issues/1396

ZubairLK commented 5 years ago

Here is a graph of CPU usage, collected with Telegraf/InfluxDB/Grafana:

[image: CPU usage graph]

balenaOS CPU usage is spiky. Unless I'm mistaken, that is due to the various healthchecks (the supervisor and balenad ones are probably the most CPU-intensive).

ZubairLK commented 5 years ago

We can investigate lighter-weight healthchecks, or perhaps make the healthcheck frequency user-configurable.

klutchell commented 2 years ago

Related to https://github.com/balena-os/meta-balena/issues/2423

klutchell commented 2 years ago

We don't want to make the healthchecks user-configurable. If there is an issue with the healthchecks, we should fix the healthchecks themselves.

We know that the current engine healthcheck also causes wear to storage media, so we would like to replace that with something more like a status check.
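To illustrate what "something more like a status check" could look like, here is a minimal sketch, not the actual balenaOS healthcheck. It assumes the engine exposes the Docker-compatible `GET /_ping` endpoint on a unix socket; the default socket path and the function name `engine_ping` are assumptions for illustration. The probe only talks to the socket, so it starts no containers and writes nothing to storage itself.

```python
import socket


def engine_ping(socket_path="/var/run/balena-engine.sock", timeout=2.0):
    """Return True if the engine answers GET /_ping with HTTP 200.

    socket_path is an assumed default; adjust it for the device in question.
    """
    request = b"GET /_ping HTTP/1.0\r\nHost: localhost\r\n\r\n"
    data = b""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(socket_path)
            sock.sendall(request)
            # Read only until the end of the HTTP status line.
            while b"\r\n" not in data:
                chunk = sock.recv(256)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Missing socket, refused connection, or timeout: treat as unhealthy.
        return False
    parts = data.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[1] == b"200"
```

Calling `engine_ping()` returns `False` when the socket is missing or the engine does not answer within the timeout, which a watchdog could treat as a failed check.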

However, we still need a larger solution that checks overall system health (like device-diagnostics), but runs on the device and is capable of automatic recovery steps.

An old spec that is similar can be found here: https://github.com/balena-io/balena-io/pull/2009
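As a rough sketch of the on-device health check with automatic recovery described above (not taken from the linked spec), each check could pair a probe with an optional recovery step. The names `HealthCheck` and `run_checks` are hypothetical, and real probes and recovery actions would be device-specific.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class HealthCheck:
    name: str
    probe: Callable[[], bool]                      # returns True when healthy
    recover: Optional[Callable[[], None]] = None   # optional recovery step


def run_checks(checks: List[HealthCheck]) -> Dict[str, str]:
    """Run every probe once; on failure, try recovery and probe again."""
    results = {}
    for check in checks:
        if check.probe():
            results[check.name] = "healthy"
        elif check.recover is None:
            results[check.name] = "failed"
        else:
            check.recover()
            # Re-probe so a recovery step only counts if it actually helped.
            results[check.name] = "recovered" if check.probe() else "failed"
    return results
```

A caller would build a list of `HealthCheck` objects and report or act on any entry that ends up as `"failed"`.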