art-daq / artdaq_daqinterface


DAQInterface should handle SIGHUP, SIGTERM, etc. as gracefully as possible #50

Closed eflumerf closed 2 years ago

eflumerf commented 2 years ago

This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/22146 (FNAL account required). Originally created by @jcfreeman2 on 2019-03-15 16:26:53


This issue is motivated by Eric's Redmine Issue 22095, in which he observed that sometimes, when using run_demo.sh, not all artdaq processes (especially datalogger) appeared to get cleaned up. Perhaps related: running the demo config with component01 and component02 on woof, I've found that when DAQInterface is in the running state and receives a SIGHUP or a SIGTERM, the Python script that is DAQInterface disappears but the artdaq processes (and, in "pmt" mode, pmt.rb) remain. This happens whether DAQInterface is controlling processes in "pmt" or "direct" mode. If you relaunch DAQInterface and then try to run again with the same processes, it will complain, clean up the processes, and put itself back in the "stopped" state, but in the real world there is of course no guarantee that anyone will do this after DAQInterface is unexpectedly killed. The possibility of DAQInterface catching kill signals and then gracefully winding down active artdaq processes should be investigated.

eflumerf commented 2 years ago

Comment by @jcfreeman2 on 2019-03-24 04:00:40


Resolved at the head of the feature/issue22146_handle_signals branch, commit a9bbd7dcaa1a27c09798f88ef4d0c1a7f82b9576.

DAQInterface will now enter the recover transition (i.e., sending a stop and then a shutdown to artdaq processes found in the running state before killing them, most notably resulting in a correctly-saved root file) if it receives any of the following signals:

- SIGINT, meaning DAQInterface is running in the foreground in a terminal and you hit Ctrl-C
- SIGHUP, meaning you close the terminal DAQInterface is running in
- SIGTERM, meaning you kill DAQInterface (by ignoring the are-you-sure warning you get when DAQInterface isn't in the "stopped" state but you try killing it via the kill_daqinterface_on_partition.sh script)
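The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `FakeProcess`, `recover`, and `install_handlers` are hypothetical names, the process states are simplified to strings, and only the handler name `handle_kill_signal` is taken from this thread. It assumes a POSIX system, since SIGHUP is not available on Windows.

```python
import signal

# Signals named in the list above: Ctrl-C, terminal close, and kill.
HANDLED_SIGNALS = (signal.SIGINT, signal.SIGHUP, signal.SIGTERM)


class FakeProcess:
    """Hypothetical stand-in for an artdaq process; records what it is sent."""

    def __init__(self, label, state):
        self.label = label
        self.state = state
        self.received = []

    def send(self, command):
        self.received.append(command)

    def kill(self):
        self.received.append("kill")
        self.state = "dead"


def recover(processes):
    """Send stop then shutdown to running processes, then kill every process."""
    for proc in processes:
        if proc.state == "running":
            proc.send("stop")
            proc.send("shutdown")
        proc.kill()


def install_handlers(processes):
    """Route the handled signals to recover(); return the previous handlers."""

    def handle_kill_signal(signum, frame):
        recover(processes)

    # signal.signal returns the handler that was installed before the call.
    return {sig: signal.signal(sig, handle_kill_signal) for sig in HANDLED_SIGNALS}
```

Keeping the previous handlers around makes it possible to restore them later or to chain to them once the recover step has finished.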

eflumerf commented 2 years ago

Comment by @eflumerf on 2019-03-25 16:42:03


I've noticed a few cases where closing the DAQInterface window has led to a Python process remaining active, with /tmp/daqinterface-$USER/DAQInterface_partition*.log showing no activity. I'm not sure what the workaround might be, other than making sure to proceed with default handlers after running the DAQInterface signal handler...

def_term_handler = signal.SIG_DFL
def_hup_handler = signal.SIG_DFL
def_int_handler = signal.SIG_DFL

...

--- sys.exit(1)
+++ if signum == signal.SIGTERM
+++     def_term_handler(signum, stack)
+++ else if signum ...

...

def_term_handler = signal.signal(signal.SIGTERM, handle_kill_signal) ...

eflumerf commented 2 years ago

Comment by @jcfreeman2 on 2019-03-25 22:23:22


To address Eric's findings, commit 3d6d95ef1a9b10169285b8bab25de68f2e024752 on feature/issue22146_handle_signals changes the handler: after putting itself through the recover transition, DAQInterface now calls the default signal handler and then, as an insurance policy, calls os._exit, which is a harder exit than sys.exit.
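A rough sketch of that sequence (recover, then the previously installed handler, then a hard exit) is shown below. It is an assumption about the shape of the fix, not the contents of the commit: `make_kill_handler`, `recover`, and `previous_handlers` are hypothetical names, and `hard_exit` is a parameter only so the flow can be exercised without ending the interpreter.

```python
import os
import signal


def make_kill_handler(recover, previous_handlers, hard_exit=os._exit):
    """Build a handler that recovers, chains to the prior handler, then exits hard.

    previous_handlers maps signal numbers to whatever signal.signal() returned
    when the handler was installed.
    """

    def handle_kill_signal(signum, frame):
        try:
            recover()
            previous = previous_handlers.get(signum)
            # signal.SIG_DFL and signal.SIG_IGN are not callable, so only chain
            # to real functions such as signal.default_int_handler for SIGINT.
            if callable(previous):
                previous(signum, frame)
        finally:
            # Insurance policy: os._exit ends the process immediately, without
            # raising SystemExit or running cleanup handlers as sys.exit does.
            hard_exit(1)

    return handle_kill_signal
```

The `finally` clause is a design choice of this sketch: it ensures the hard exit still happens if the recover step or the chained handler raises, for example the KeyboardInterrupt raised by Python's default SIGINT handler.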

eflumerf commented 2 years ago

Comment by @eflumerf on 2019-03-28 18:39:11


I've tried closing the window at several points, both mid-transition and between transitions, and no longer see the issue. Code review looks good. Merged into develop.