[process] Rig will terminate if one pid ends when watching all pids #36

TurboTurtle / rig

Closed TurboTurtle closed 2 years ago

TurboTurtle commented 3 years ago

User report from CEE SD testing:

Created the rig with two processes running, simulating some load. One of the processes finished and stopped before the load went above the target.

However the rig stopped monitoring as soon as the first process died off.

Rig seems to be working, and this might be an RFE for the functionality, but I wanted to submit this just in case it is possible to change this behavior.

# rig process --all --process python3 --cpuperc 5 --foreground
Beginning watch of process 39455 for total cpu usage of 5.0% or higher
Beginning watch of process 39439 for total cpu usage of 5.0% or higher

Process 39455 is no longer running, stopping cpu percentage monitor.
No data generated to archive for this rig.

TurboTurtle commented 3 years ago

So at first glance it seems we're a bit too aggressive with killing off the rig. If we're watching multiple PIDs, I would think we should only terminate if all those pids are no longer running (unless of course the rig is watching for a PID to die, so this may get a little tricky with the logic).

TurboTurtle commented 3 years ago

Right... so this is because the watcher thread exits, and our futures pool is waiting for the first job to return.
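To illustrate the behavior described above, here is a minimal standalone sketch using Python's concurrent.futures. It is not rig's actual code: the watch() function, its lifetime argument, and the sleep-based "process" are assumptions made purely for demonstration. It shows that waiting on the first completed job returns as soon as one watcher exits, even though another is still running.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def watch(pid, lifetime):
    """Hypothetical watcher: pretends the PID lives for `lifetime` seconds,
    then returns False to signal that the process is gone."""
    time.sleep(lifetime)
    return False


with ThreadPoolExecutor() as pool:
    # Map each future to the PID it watches (PIDs taken from the report above).
    futures = {
        pool.submit(watch, 39455, 0.1): 39455,  # exits early
        pool.submit(watch, 39439, 0.5): 39439,  # still running
    }
    # Returns as soon as ANY watcher finishes, not when all of them do.
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    first_pids = sorted(futures[f] for f in done)
    still_pending = sorted(futures[f] for f in pending)

print("finished first:", first_pids)
print("still being watched:", still_pending)
```

If the caller treats the first returned job as the end of the rig, the watcher for the second PID is abandoned while its process is still alive, which matches the report.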

For the moment, I don't have a good idea of how to address this beyond artificially keeping the watcher thread active. For example, detect that the PID no longer exists and, if we have more than one PID to watch, spin on wait_loop() instead of hitting the return False line. That doesn't seem too terrible, but it is a bit hacky. My biggest concern would be this combined with another request (issue to be opened), where --all-pids would look for PIDs that started after the rig was deployed. We could potentially get ourselves into a position where we have an absurd number of threads spinning on nothing.
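One possible shape for the "keep the watcher alive" idea is sketched below. Everything here is hypothetical (PidGroup, watch(), the simulated exit events) and none of it is taken from rig's source. Each watcher records that its PID is gone and then parks on a shared event, so a False result is only produced once every watched PID has exited.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


class PidGroup:
    """Hypothetical helper tracking which watched PIDs are still alive."""

    def __init__(self, pids):
        self._alive = set(pids)
        self._lock = threading.Lock()
        # Set once every PID is gone, or by the caller on shutdown.
        self.stop = threading.Event()

    def mark_gone(self, pid):
        with self._lock:
            self._alive.discard(pid)
            if not self._alive:
                self.stop.set()


def watch(pid, exited, group, interval=0.01):
    """Stand-in watcher; `exited` simulates the PID disappearing."""
    while not exited.wait(interval):
        if group.stop.is_set():
            return False
        # A real watcher would check its threshold here and return True.
    group.mark_gone(pid)
    group.stop.wait()  # park instead of returning early
    return False


def run_demo():
    pids = {39455: 0.05, 39439: 0.2}  # pid -> simulated lifetime in seconds
    exits = {pid: threading.Event() for pid in pids}
    group = PidGroup(pids)
    for pid, delay in pids.items():
        threading.Timer(delay, exits[pid].set).start()
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(watch, pid, exits[pid], group) for pid in pids]
        wait(futures, return_when=FIRST_COMPLETED)
        all_exited = all(e.is_set() for e in exits.values())
        group.stop.set()  # release any watcher that is still parked
    return all_exited, [f.result() for f in futures]


all_exited, results = run_demo()
print(all_exited, results)
```

Parking on an event avoids a busy loop, but it still holds one thread per dead PID, so the concern about thread counts growing with newly discovered PIDs applies to this sketch as well. It also ignores the case where the rig is watching for a PID to die.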