balena-os / balena-engine

Moby-based Container Engine for Embedded, IoT, and Edge uses
https://www.balena.io
Apache License 2.0

Leftover healthcheck processes from engine crash loops #280

Open cywang117 opened 2 years ago

cywang117 commented 2 years ago

When the engine is not given time to stop gracefully, its leftover processes appear to become orphaned and are never cleaned up. A possible fix is to have the engine attempt to clean up leftover processes on start.
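To make the proposed fix more concrete, here is a minimal sketch of how leftover processes could be detected at startup. It is not the engine's code (the engine is written in Go), and it rests on assumptions: that the leftovers are reparented to PID 1, and that they can be recognized by a caller-supplied predicate. The names `ProcInfo`, `parse_stat`, and `find_orphans` are hypothetical. The sketch only lists candidates and leaves the decision to signal them to the caller.

```python
import os
from typing import Callable, Iterator, NamedTuple


class ProcInfo(NamedTuple):
    pid: int
    comm: str
    ppid: int


def parse_stat(text: str) -> ProcInfo:
    """Parse the contents of /proc/<pid>/stat into pid, comm and ppid.

    The comm field is wrapped in parentheses and may itself contain spaces
    or parentheses, so locate it via the first '(' and the last ')'.
    """
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    # The fields after comm are: state, ppid, pgrp, ...
    rest = text[rparen + 1:].split()
    return ProcInfo(pid=pid, comm=comm, ppid=int(rest[1]))


def find_orphans(
    match: Callable[[ProcInfo], bool],
    proc_root: str = "/proc",
) -> Iterator[ProcInfo]:
    """Yield processes whose parent is PID 1 and that satisfy `match`.

    `match` is an assumption of this sketch: a real implementation would
    need a reliable way to tell engine leftovers from unrelated daemons.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                info = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry is unreadable
        if info.ppid == 1 and match(info):
            yield info
```

For example, `list(find_orphans(lambda p: p.comm == "my-healthcheck"))` would return the orphaned processes with that (made-up) command name. Matching on the command name alone is fragile, so a real cleanup would probably need a stronger signal, such as cgroup membership.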


jellyfish-bot commented 2 years ago

[cywang117] This issue has attached support thread https://jel.ly.fish/aa791470-bc0b-4694-b4a6-07998b249870

lmbarros commented 2 years ago

In our experience, many cases of the Engine crash-looping (most of them, I dare say) were ultimately triggered by systemd's watchdog killing the Engine. We have vastly improved Engine health checks with this PR (https://github.com/balena-os/meta-balena/pull/2734), so for this common case, upgrading to balenaOS v2.101.7 should be a good solution.

I am not closing this issue because it is still relevant for other, rarer cases of Engine crash loops.
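For readers unfamiliar with the watchdog mentioned above, the following is a generic sketch of systemd's notification protocol. It is not taken from balenaOS or from the linked PR, and it does not show how the Engine health checks are implemented. When a unit sets `WatchdogSec=`, systemd passes `NOTIFY_SOCKET` and `WATCHDOG_USEC` to the service and expects a `WATCHDOG=1` datagram within each interval. If the pings stop, systemd treats the service as hung and kills it. The helper names `sd_notify` and `watchdog_ping_interval` are hypothetical.

```python
import os
import socket
from typing import Optional


def sd_notify(message: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. when the process is
    not running under a unit that accepts notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def watchdog_ping_interval() -> Optional[float]:
    """Return a ping interval in seconds, or None if no watchdog is set.

    WATCHDOG_USEC holds the timeout in microseconds. Pinging at half that
    value leaves headroom before systemd considers the service hung.
    """
    usec = os.environ.get("WATCHDOG_USEC")
    if not usec:
        return None
    return int(usec) / 1_000_000 / 2
```

A service would call `sd_notify("WATCHDOG=1")` every `watchdog_ping_interval()` seconds, ideally only after confirming that it is actually healthy.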