Closed: sebenste closed this issue 5 years ago
WORKAROUND: Run a script from crontab every minute that checks whether the LDM is running. If it is not, the script can restart the LDM. Unfortunately, this does mean that up to 60 seconds of data could be lost, but it's a lot better than the LDM staying down.
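A minimal sketch of such a cron check, for illustration only. It assumes that `ldmadmin isrunning` exits with status 0 when the LDM is up and that `ldmadmin start` brings it back. The function name `ensure_running` and the script name `check_ldm.sh` are hypothetical.

```shell
#!/bin/sh
# check_ldm.sh (hypothetical name): restart the LDM if it is not running.
# Assumption: "ldmadmin isrunning" exits 0 when the LDM is up, and
# "ldmadmin start" starts it. Verify both against your LDM installation.

# ensure_running CHECK_CMD START_CMD
# Runs START_CMD only if CHECK_CMD exits with a non-zero status.
ensure_running() {
    check_cmd=$1
    start_cmd=$2
    if $check_cmd >/dev/null 2>&1; then
        return 0    # already running; nothing to do
    fi
    $start_cmd
}

# Act only when invoked as "check_ldm.sh run", so that the file can
# also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
    ensure_running "ldmadmin isrunning" "ldmadmin start"
fi
```

A crontab entry for the LDM user might then look like `* * * * * /home/ldm/bin/check_ldm.sh run`, where the path is an assumption. Cron usually runs with a minimal environment, so `ldmadmin` may need to be given by its full path.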
Because the legacy NOAAPort ingestion code for GRIB2 messages is a big ball of mud, a work-around script that restarts a crashed ingester has been created. The script is called keep_running and will be in the next release.
Here's the script if you want to use it now:
# Script to start a program and restart it whenever it terminates.
# The command to run, with its arguments, is given as this script's arguments.
while true; do
    "$@"
    status=$?
    # "$*" joins the command into one word so the message stays a single argument
    ulogger -p error "Process '$*' terminated with status $status. Restarting"
    # Edit/uncomment the following command to receive an email notification
    # mailx -s "Process '$*' terminated with status $status. Restarting" \
    #     "$LOGNAME" </dev/null
done
Simply add the script to the LDM user's bin/ subdirectory and make sure it is executable.
Here's an example of using it in the LDM configuration-file:
exec "keep_running noaaportIngester -n -m 224.0.1.1 -l /data/tmp/nwstg.log"
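One thing to watch with an unconditional restart loop is that a program which fails immediately on startup will be respawned as fast as the shell can loop. The following is a hedged sketch of a variant that pauses between restarts and gives up after a limit. It is not the keep_running script that ships with the LDM. The function name `run_with_backoff` and its two leading parameters are hypothetical, and it logs with `echo` to standard error rather than `ulogger` so that it runs outside an LDM installation.

```shell
#!/bin/sh
# Hypothetical variant of the restart loop, not part of the LDM.
# Usage: run_with_backoff MAX_RESTARTS DELAY_SECONDS command [args...]
# Runs the command, restarting it up to MAX_RESTARTS times with a
# pause of DELAY_SECONDS between attempts, then returns its last status.
run_with_backoff() {
    max_restarts=$1
    delay=$2
    shift 2
    restarts=0
    while true; do
        "$@"
        status=$?
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "Giving up on '$*' after $restarts restarts (last status $status)" >&2
            return "$status"
        fi
        restarts=$((restarts + 1))
        echo "Process '$*' terminated with status $status. Restart $restarts in ${delay}s" >&2
        sleep "$delay"
    done
}
```

For example, `run_with_backoff 100 5 noaaportIngester -n -m 224.0.1.1 -l /data/tmp/nwstg.log` would allow up to 100 restarts, five seconds apart. Both numbers are arbitrary illustrations, and any delay adds to the window in which data can be missed.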
Due to the bug fix in the upcoming LDM 6.13.12, this can be closed.
Because it is a bug in the NOAAPort ingestion code (a program should only crash due to a hardware fault) I'd rather keep it open -- as a reminder if nothing else.
Regards, Steve Emmerson
As of the date of this report, all operational versions of the LDM, up to and including beta version 6.13.12.24, have a bug whereby a badly corrupted GRIB file can cause the NOAAPort ingester to crash. That, in turn, causes the rest of the LDM to terminate. On September 25, 2019, this bug caused the LDM at every National Weather Service forecast office in the United States to crash after a badly corrupted GRIB file was uplinked to NOAAPort.
It also crashed the LDM at multiple universities that receive NOAAPort, including the popular College of DuPage website.