MetPX / sarracenia

https://MetPX.github.io/sarracenia
GNU General Public License v2.0

v2 sr_audit starts instance 0 when instances >= 100 #1183

Closed petersilva closed 1 week ago

petersilva commented 2 months ago

In v2, running a component in the foreground creates a .pid file for instance 0, in addition to the .pid files for the other instances (>= 1). sr_audit looks at all the .pid files and, since the foreground instance appears to be "missing", restarts it. Instance 0 should never be restarted.
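To make the intended rule concrete, here is a minimal sketch of how an audit pass could pick which instances to restart. It is not sr_audit's actual code: the function name `instances_to_restart` and its arguments are hypothetical, and it assumes the set of running instance numbers has already been worked out from the .pid files.

```python
def instances_to_restart(configured, running):
    """Return the instance numbers an audit pass should restart.

    configured: number of instances set in the configuration (hypothetical argument).
    running:    set of instance numbers found alive via their .pid files.

    Only instances 1..configured are candidates. Instance 0 is the
    foreground instance and is never returned, whether or not it is running.
    """
    return [i for i in range(1, configured + 1) if i not in running]


if __name__ == "__main__":
    # Instance 0 (foreground) is running along with 1 and 3; only 2 is missing.
    print(instances_to_restart(3, {0, 1, 3}))
```

Starting the range at 1 means instance 0 can never end up in the restart list, regardless of what .pid files exist.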

work around:

petersilva commented 1 month ago

On further investigation it is determined that instance 0 is being started because:

Workaround: reduce instances to a lower number, e.g. instances 75

petersilva commented 1 month ago

Tried to reproduce this with sr3 and it works fine; the issue is not present at all in sr3:

tested at 120 and it seems fine.

petersilva commented 1 month ago

OK, I did find some weird behaviour after killing instance 100 in sr3, though not the same as in v2. Made a patch to correct it.

reidsunderland commented 1 month ago

The fix in #1226 caused a problem :(

Traceback (most recent call last):
  File "/local/home/sarra/.local/bin/sr3", line 11, in <module>
    load_entry_point('metpx-sr3', 'console_scripts', 'sr3')()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 3074, in main
    gs = sr_GlobalState(cfg, cfg.configurations)
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 1323, in __init__
    self._read_states()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 556, in _read_states
    self._read_state_dir()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 493, in _read_state_dir
    i = int(pathname[0:-4].split('_')[-1])

Edit: I realized this issue is specific to v2, I'm talking about sr3 here.

reidsunderland commented 1 month ago

It's getting stuck on a cpost:

/local/home/sarra/.cache/sr3/cpost/config_name/i01.pid

Is it normal for cpost pid files to be named iXX.pid?

Edit: I realized this issue is specific to v2, I'm talking about sr3 here.
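For illustration, the expression in the traceback takes whatever follows the last underscore before ".pid" and passes it to int(), which cannot work for a name like i01.pid. Below is a minimal sketch of a more tolerant parser. It is not the fix that was actually made: the helper name `instance_from_pidfile` and the regular expression are hypothetical, and it assumes only two name shapes, `<anything>_NN.pid` and the cpost-style `iNN.pid`.

```python
import os
import re

# Assumed name shapes (not taken from the sr3 source):
#   <anything>_NN.pid   digits after the last underscore
#   iNN.pid             cpost-style name, as in the path above
_PID_NAME = re.compile(r'^(?:.*_|i)?(\d+)\.pid$')


def instance_from_pidfile(pathname):
    """Return the instance number encoded in a pid file name, or None.

    Only the base name is examined, so underscores in directory names
    (such as config_name) do not affect the result.
    """
    match = _PID_NAME.match(os.path.basename(pathname))
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(instance_from_pidfile('/local/home/sarra/.cache/sr3/cpost/config_name/i01.pid'))
```

Returning None instead of raising lets the caller skip files it does not recognize rather than abort the whole state scan.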

petersilva commented 1 month ago

uh... yeah... but I fixed in sarrac also: https://github.com/MetPX/sarrac/pull/161