v2 sr_audit starts instance 0 when instances >= 100

MetPX / sarracenia

https://MetPX.github.io/sarracenia

GNU General Public License v2.0


v2 sr_audit starts instance 0 when instances >= 100 #1183

Closed petersilva closed 1 week ago

petersilva commented 2 months ago

In v2, running a configuration in the foreground creates a .pid file for instance 0, in addition to those for the other instances (>= 1). sr_audit looks at all the .pid files and, since the foreground instance appears to be "missing", restarts it. Instance 0 should never be restarted.

Workaround:

Remove the .pid file for instance 0.
Kill the instance 0 process.
When stopping foreground processes, check for a left-over .pid file and remove it to prevent recurrence.
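
The first step of the workaround can be scripted. The following is a minimal sketch, not part of sarracenia: it assumes pid file names end in `_<instance>.pid` (the state directory location and the file names in the usage note are hypothetical), and it only lists candidates so that nothing is deleted without a manual check.

```python
import re
from pathlib import Path

# Assumption: a pid file name ends in "_<instance>.pid", e.g. "..._00.pid".
_PID_NAME = re.compile(r"_(\d+)\.pid$")


def instance_zero_pid_files(state_dir):
    """Return the pid files under state_dir whose instance number is 0."""
    hits = []
    for path in sorted(Path(state_dir).rglob("*.pid")):
        match = _PID_NAME.search(path.name)
        if match and int(match.group(1)) == 0:
            hits.append(path)
    return hits
```

Calling `instance_zero_pid_files(some_state_dir)` returns a list of `Path` objects. Each one can be inspected, and removed by hand once the foreground process is confirmed stopped.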

petersilva commented 1 month ago

On further investigation, instance 0 is being started because instance numbers are two digits, and they overflow when the config file has instances 100 in it.

Workaround: reduce instances to a lower number, e.g. instances 75.
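
The thread does not say where in v2 the overflow happens. As a purely hypothetical illustration of how a two-digit field can make instance 100 look like instance 0, suppose a reader keeps only the last two digits before the `.pid` suffix. The function names and the `sr_example_` prefix below are invented for the sketch and are not taken from the v2 code.

```python
def pid_name(instance):
    """Build a hypothetical pid file name; %02d pads to two digits but does not truncate."""
    return "sr_example_%02d.pid" % instance


def instance_from_two_digit_field(name):
    """Hypothetical reader that assumes the instance number is exactly two digits."""
    stem = name[: -len(".pid")]
    return int(stem[-2:])


for n in (75, 99, 100):
    print(n, pid_name(n), instance_from_two_digit_field(pid_name(n)))
```

Under that assumption, 75 and 99 round-trip correctly, while 100 is written as `sr_example_100.pid` and read back as 0. That would match the symptom reported here, and also why lowering instances below 100 avoids it.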

petersilva commented 1 month ago

Tried to reproduce this with sr3, and it works fine; the issue is not present at all in sr3:

Three-digit instance files are created.
sr3 status does not report any missing or strays.
sr3 sanity does not destroy or start any instances.

Tested at 120 and it seems fine.

petersilva commented 1 month ago

OK, I did find some weird behaviour after killing instance 100 in sr3, though not the same as in v2. Made a patch to correct it.

reidsunderland commented 1 month ago

The fix in #1226 caused a problem :(

Traceback (most recent call last):
  File "/local/home/sarra/.local/bin/sr3", line 11, in <module>
    load_entry_point('metpx-sr3', 'console_scripts', 'sr3')()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 3074, in main
    gs = sr_GlobalState(cfg, cfg.configurations)
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 1323, in __init__
    self._read_states()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 556, in _read_states
    self._read_state_dir()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 493, in _read_state_dir
    i = int(pathname[0:-4].split('_')[-1])

Edit: I realized this issue is specific to v2, I'm talking about sr3 here.

reidsunderland commented 1 month ago

It's getting stuck on a cpost:

/local/home/sarra/.cache/sr3/cpost/config_name/i01.pid

Is it normal for cpost pid files to be named iXX.pid?

Edit: I realized this issue is specific to v2, I'm talking about sr3 here.
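
To see why a name like i01.pid trips the expression in the last traceback frame: with no underscore in the name, `split('_')[-1]` returns the whole stem (`i01`), which `int()` rejects. The sketch below reuses that expression and contrasts it with a more tolerant parser. `parse_tolerant` is a hypothetical helper written for illustration; it is not the change made in #1226 or in the sarrac pull request.

```python
import re


def parse_as_in_traceback(pathname):
    """Same expression as the last frame of the traceback."""
    return int(pathname[0:-4].split('_')[-1])


def parse_tolerant(pathname):
    """Return the trailing instance number, or None if the name does not end in digits + '.pid'."""
    match = re.search(r"(\d+)\.pid$", pathname)
    return int(match.group(1)) if match else None


try:
    parse_as_in_traceback("i01.pid")
except ValueError as err:
    print("strict parser failed:", err)

print(parse_tolerant("i01.pid"))
```

The tolerant version accepts both the underscore-separated form and the iXX.pid form, and returns None for anything else instead of raising.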

petersilva commented 1 month ago

uh... yeah... but I fixed in sarrac also: https://github.com/MetPX/sarrac/pull/161