Unidata / LDM

The Unidata Local Data Manager (LDM) system includes network client and server programs designed for event-driven data distribution, and is the fundamental component of the Unidata Internet Data Distribution (IDD) system.
http://www.unidata.ucar.edu/software/ldm

Ingest of badly corrupted GRIB data can crash NOAAportIngester #71

Closed sebenste closed 5 years ago

sebenste commented 5 years ago

As of the date of this report, all operational versions of the LDM, up to and including beta version 6.13.12.24, have a bug whereby a badly corrupted GRIB file can cause the NOAAport ingester to crash. That, in turn, causes the rest of the LDM to terminate. On 9/25/2019, this bug caused the LDM at every National Weather Service forecast office in the United States to crash after a badly corrupted GRIB file was uplinked to NOAAport.

It also crashed the LDM at multiple universities that receive NOAAport, including the popular College of DuPage website.

sebenste commented 5 years ago

WORKAROUND: Run a script from crontab every minute that checks whether the LDM is running and restarts it if it is not. Unfortunately, this means that up to 60 seconds of data could be lost, but that is a lot better than the LDM staying down.
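
For anyone who wants a starting point, here is a minimal sketch of such a cron watchdog. It is not part of the LDM: the file name check_ldm.sh and the function name ensure_running are hypothetical, and the check and start commands are passed in as arguments so you can substitute whatever is right for your installation.

```shell
#!/bin/sh
# Hypothetical cron watchdog (not part of the LDM).
# Usage: check_ldm.sh '<check command>' '<start command>'

ensure_running() {
    # $1: command that exits 0 when the service is up
    # $2: command that starts the service
    if sh -c "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$(date): not running; starting"
    sh -c "$2"
}

# Run only when both commands are supplied on the command line.
if [ "$#" -eq 2 ]; then
    ensure_running "$1" "$2"
fi
```

A crontab entry for the LDM user might then look like `* * * * * /path/to/check_ldm.sh 'ldmadmin isrunning' 'ldmadmin start'`. The path is a placeholder, and the two ldmadmin commands are my assumption of what fits here; verify them against your LDM version and make sure ldmadmin is on cron's PATH.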

semmerson commented 5 years ago

Because the legacy NOAAPort ingestion code for GRIB2 messages is a big ball of mud, a work-around script that restarts a crashed ingester has been created. The script is called keep_running and will be in the next release.

Here's the script if you want to use it now:

# Script to start a program and restart it whenever it terminates.

while true; do
    "$@"
    status=$?
    # "$*" joins the command and its arguments into a single word for the message
    ulogger -p error "Process '$*' terminated with status $status. Restarting"

    # Edit/uncomment the following command to receive an email notification
    # mailx -s "Process '$*' terminated with status $status. Restarting" \
    #       "$LOGNAME" </dev/null
done

Simply add the script to the LDM user's bin/ subdirectory.
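
The installation step can be sketched as a small shell function. This is only an illustration: install_to_bin is a hypothetical helper, and the default destination of $HOME/bin assumes you are logged in as the LDM user and that this is the bin/ subdirectory meant above.

```shell
# install_to_bin is a hypothetical helper, not part of the LDM.
# Usage: install_to_bin <script> [<bin-dir>]
# <bin-dir> defaults to $HOME/bin (an assumption about the LDM user's layout).
install_to_bin() {
    src=$1
    dest_dir=${2:-"$HOME/bin"}
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/" || return 1
    # The script must be executable before it can be run by name.
    chmod +x "$dest_dir/$(basename "$src")"
}
```

For example, `install_to_bin keep_running` run from the directory where you saved the script. The destination directory also needs to be on the LDM user's PATH for the script to be found by name.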

Here's an example of using it in the LDM configuration-file:

exec    "keep_running noaaportIngester -n -m 224.0.1.1  -l /data/tmp/nwstg.log"

sebenste commented 4 years ago

Due to the bug fix in the upcoming LDM 6.13.12, this can be closed.

semmerson commented 4 years ago

Because it is a bug in the NOAAPort ingestion code (a program should only crash due to a hardware fault) I'd rather keep it open -- as a reminder if nothing else.

Regards, Steve Emmerson

