petersilva closed this 1 week ago.
On further investigation, it was determined that instance 0 is being started because:
Workaround: reduce instances to a lower number, e.g. instances 75.
Tried to reproduce this with sr3, and it works fine; the issue is not present at all in sr3. Tested with instances 120 and it seems fine.
OK, I did find some weird behaviour after killing instance 100 in sr3, though not the same as in v2. Made a patch to correct it.
The fix in #1226 caused a problem :(
Traceback (most recent call last):
  File "/local/home/sarra/.local/bin/sr3", line 11, in <module>
    load_entry_point('metpx-sr3', 'console_scripts', 'sr3')()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 3074, in main
    gs = sr_GlobalState(cfg, cfg.configurations)
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 1323, in __init__
    self._read_states()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 556, in _read_states
    self._read_state_dir()
  File "/local/home/sarra/sr3/sarracenia/sr.py", line 493, in _read_state_dir
    i = int(pathname[0:-4].split('_')[-1])
It's getting stuck on a cpost:
/local/home/sarra/.cache/sr3/cpost/config_name/i01.pid
Is it normal for cpost pid files to be named iXX.pid?
Edit: I realized this issue is specific to v2; I'm talking about sr3 here.
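The traceback stops at the `int()` call in `_read_state_dir`. Below is a minimal sketch of why that expression fails on the cpost file and of one tolerant way to pull out the instance number. It assumes sr3's own pid files end in `_NN.pid` (which the `split('_')` expression implies) and that the cpost file is named `iNN.pid` as shown above. `instance_from_pid_filename` and `original_parse` are hypothetical names for illustration; this is not the patch that went into #1226 or sarrac PR 161.

```python
import re
from typing import Optional

# Hypothetical helper (not sr3 code): match the digits immediately before ".pid",
# so both "..._01.pid" (assumed sr3 naming) and "i01.pid" (cpost naming) parse.
_PID_RE = re.compile(r'(\d+)\.pid$')


def instance_from_pid_filename(pathname: str) -> Optional[int]:
    """Return the instance number encoded in a .pid filename, or None if absent."""
    m = _PID_RE.search(pathname)
    if m is None:
        # No trailing digits: not an instance pid file, let the caller skip it.
        return None
    return int(m.group(1))


def original_parse(pathname: str) -> int:
    # Same expression as sr.py line 493 in the traceback; raises ValueError
    # when the text after the last '_' is not purely numeric (e.g. "i01").
    return int(pathname[0:-4].split('_')[-1])
```

With the cpost name, `original_parse("i01.pid")` raises `ValueError` because `"i01"` is not an integer, while `instance_from_pid_filename("i01.pid")` returns `1`. Returning `None` instead of raising lets a state-directory scan skip unexpected files rather than crash the whole `sr3` command.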
Uh... yeah... but I fixed it in sarrac too: https://github.com/MetPX/sarrac/pull/161
In v2, when running in the foreground, it creates a .pid file for instance 0 as well as for the other instances (>= 1). sr_audit looks for all the .pid files and, since the foreground instance is "missing", restarts it. Instance 0 should never be restarted.
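The restart decision described above can be sketched as a small pure function. This is a hypothetical illustration, not sr_audit's actual code: it assumes the auditor has already mapped each .pid file to an instance number and a liveness flag (the `pid_status` dict and `instances_to_restart` name are made up here), and it only shows instance 0 being excluded.

```python
from typing import Dict, List


def instances_to_restart(pid_status: Dict[int, bool]) -> List[int]:
    """Return instance numbers that have a .pid file but no live process.

    pid_status maps instance number -> True if the recorded process is alive.
    Instance 0 is the foreground run: once it exits it looks "missing",
    but it must never be restarted by the auditor, so it is always skipped.
    """
    return sorted(
        i for i, alive in pid_status.items()
        if i != 0 and not alive
    )
```

For example, `instances_to_restart({0: False, 1: True, 2: False})` returns `[2]`: instance 2 is restarted, while the exited foreground instance 0 is left alone.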
Workaround: