Closed geoffmcl closed 7 years ago
Have a look at the stopwaitsecs
and startsecs
... here http://supervisord.org/configuration.html#program-x-section-settings
Not sure what to do as am not familar with the stop signals for perl..
Also shall I create the supervisor log in /home/geoff/log/*
?? or alike
Hi Pete,
Thanks for the hints, but...
stopwaitsecs
, default 10, seems to be more about
when supervisor
, aka SV
, wants to stop a process.
How long to wait for exit. That does not seem a problem
here...
startsecs
, default 10, seems to be a measure of
how long the process should run before it is considered
running ok... so to me the default of 10 secs is ok...
No, this is about when SV
detects the process has
stopped, and when to restart
it...
As stated, if there is a tiny glitch in the json http fetch, it is very likely that if SV restarts the script immediately, that the problem will still be there...
BUT, if SV waits 2-5 secs before restarting the script, then there is a good chance the very momentary json fetch problem will have cleared itself... that's all...
Is there a way to say to SV - take it easy kid! The process stopped. Leave it dead for 2-5 secs before restarting it... Seems like a reasonable option... but can not find it...
If there is a temporary problem, do not try to start the process immediately... wait some time for the problem to resolve itself... then restart the process...
I do not know how quickly SV detects a process down!
But is SV is very attentive... maybe too attentive... and restarts the process more that 3 times within the next second, then I can see this can be a problem!
As you know, the json string in crossfeed is only updated each seconds... If a json fetch fails, then trying again, several times, before at least 1, or more, seconds have passed, will probably mean more failures... calm down, and wait a little, and all may, will be well...
At some point, I will get around to moving this logic into the script... If a json fetch fails... do not panic, and exit... sleep/wait 5 seconds... and calmly read again... And let this happen say 3 to 10 times before actually bailing...
Yes, I have created a /home/geoff/log folder you can
write a log to, but not sure this is going to solve
the above maybe SV is too fast, too anxious
problem...
And I too am not famiar with the stop signals for perl
,
except I would believe they are very similar across
all script languages - perl, python, etc, etc... But
really not sure how this is relevant...
As stated, I will get around to adding this calmness
to the script itself... some day soon... actually waiting
the the next RPI2 acquisition to break, to exactly
narrow this down... and work on it...
All it seems I want at the moment is for SV
to take a
little time off ;=)) calm down... take a sip of red
wine... a short nap... and then autorestart
the
process...
Regards, Geoff.
Supervisor probably does not do what we want.. It designed to keep processes running, ie zero downtime
Why is the script crashing anyway.. ?
Another quick hack would be to launch from a bash.script and make is sleep for X seconds before kicking off the pl script
Hi Pete,
Ok, understand supervisor
is a tool to keep
maximum up-time, and it seems to do a good
job of that... so no problem
there...
Yes, I too thought of the quick hack to run the pl from a bash script that first slept for a few seconds...
But as stated I am waiting for this to happen again on RPI2... I think I know which pl line to fix, but just want to be sure... and if I am right, should be an easy fix...
See lines - https://github.com/geoffmcl/scripts/blob/master/cfjsonlog.pl#L346-L348
That service fetch_url()
can return an undefined
,
or some truncated, perhaps not valid json
string
and it is in the json decode that bombs...
I could add some checks here, and if not a flights.json
string, just sleep the script for a few seconds, and quietly
try again... for say at least the next minute
or so...
But, as usual, it has been running several
days/weeks without problem... and I am sure
when it breaks in RPI2, it will probably break
in m2
at the same time...
I guess it may go back to a fault
in
crossfeed (cf_client) code, on the rare
occasion, there are no flights to report,
due to the cf packet filtering...
Maybe at that moment cf forms some bad json, or 'nothing', or something like that... need also to look deeper into this...
See service Write_JSON()
, from line 532 -
https://gitlab.com/fgtools/crossfeed/blob/master/src/cf_pilot.cxx#L532
This service establishes the next json string
to return... note it uses pvList->size()
, but
skips expired
pilots... if this is zero, at
that time what exactly does the http json GET
request return?
As stated it seems quite a rare event... but am on its tail ;=))
And as I now feel Mr SuperVisor
is doing his
job, will close this issue... ;=))
Thanks, Geoff.
Ok, it broke again in RPI2. exactly where predicted... BUT, not in m2
!!!!
Maybe the auto restart 1, 2, ... were already in the next crossfeed 1 second cycle...
Made a kludge fix in the pl script, and that is to wait 5 seconds before doing the first fetch... there really is no big rush here... on start up...
Added some more precise timing... just debug stuff...
This is a quick fix, while I look deeper into avoiding it totally ;=))
Also decided to cycle the log to .1
, .2
, ... on restart, to be able to look back... but this presents a log cleanup to be done later...
TODO: Get the CSV files synced... TODO: Add a log cycling system
On to the next steps...
had a quick look at script... and seems strange to be that
@geoffmcl adding issues on crossfeed-logger, re analysis..
Issue #1 is your TODO ie CSV files synced https://github.com/freeflightsim/crossfeed-logger/issues/1
WHAW... I knew I been here before..
@pedromorgan yes, I certainly could add some error return to fetch_url
, and maybe also to the JSON perl package decode
, but haven't yet...
This did start out just as a WIP test to get an updated MODEL use, but has grown since then...
And as mentioned still a TODO to sync the raw CSV files... presently missing 22 and 23, and 24 is truncated... will get to this, hopefully soon ;=))
And I had not forgotten your experimental FG Multi Player Network Bot written in Go
WIP of a few years back... But personally, have no interest in replacing FGMS and/or crossfeed C++ programs with Go
;=))
Hi Pete,
The cflog script was running good, under supervisor, for a few days, so I stopped monitoring it for a few days... and BANG ;=))
It stopped on 2016-08-21, and for some reason
supervisor
did not restart the script ????????When I entered
$ sudo supervisorctl
it showed an error line -Anyway, I restarted
cflog
, and it is now outputtingflights-2016-08-24.csv
... over the coming days I will fill in the missing logs from my RPI2 collection, so no problem...But just a question! Why did
supervisor
not restart thecflog
job?The super config
$ /etc/supervisor/conf.d/cflog.conf
already contains -Is there anything else we need to do to make
supervisor
restart the script if it fails, for any reason...I note the
startretries=3
. If super restarts the script too quickly, maybe the json fetch problem, the only known reason it stops, has not yet recovered...Is there are way to config super to wait say 20 to 25 seconds before the restart... then 3 retries over the next minute or so might make sense...
But am concerned that if super restarts the script virtually immediately, then it may be easy to get 3 similar fetch errors within the first few seconds...
Will keep monitoring the situation...
Regards, Geoff.