NCAR / WVD-MCSupdate

Addition of NCAR MCS, rewrite to software infrastructure, and expansion of features for WVD operations.

Set up Error/Warning Alerts #182

Closed by BradSchoenrock 6 years ago

BradSchoenrock commented 6 years ago

Set up scripts on eldora to alert us to errors and warnings, and to cases where files are not being sent.
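A minimal sketch of the "files are not getting sent" check, assuming the incoming data for each unit lands in one directory tree on eldora and that "not sent" can be approximated by "no file modified recently". The function names and the age threshold are hypothetical and are not taken from the actual scripts.

```python
import time
from pathlib import Path


def newest_mtime(directory):
    """Return the latest modification time of any file under directory, or None if there are no files."""
    mtimes = [p.stat().st_mtime for p in Path(directory).rglob("*") if p.is_file()]
    return max(mtimes) if mtimes else None


def is_stale(directory, max_age_seconds, now=None):
    """Return True if no file under directory was modified within max_age_seconds.

    An empty directory counts as stale. `now` can be passed in for testing;
    it defaults to the current time.
    """
    now = time.time() if now is None else now
    latest = newest_mtime(directory)
    return latest is None or (now - latest) > max_age_seconds
```

A cron job could call `is_stale` once per unit and send an email when it returns True; the right threshold depends on how often each unit is expected to push files.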

BradSchoenrock commented 6 years ago

This is something we can work on right after the software on the DIAL unit is locked. It needs to be dealt with before the unit ships, so it is worth keeping in mind.

BradSchoenrock commented 6 years ago

Two things to note here that I worked on today:

1) The rsync for the warning and error directories was failing because the rsync scripts on eldora expect a Warning/YYYYMMDD structure, and I was pushing warning and error files loosely into the Warning directory. I added the dated structure so the scripts work for those two cases.

2) This one is ongoing: there is a limit on the number of rsync connections eldora is allowed to have at any time. When I expanded the script to rsync the NetCDFOutput, Warning, and Error directories for DIAL2 and DIAL3, the number of rsync connections went beyond the limit of 2. The limit is there to prevent overloading eldora, but it means we can't rsync from all of the DIAL units at the same time.
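For item 1, the dated layout can be sketched as below. This is only an illustration of filing a loose file under `<base>/YYYYMMDD`; the helper names are hypothetical, and using the current UTC date (rather than a date parsed from the file name or the local date) is an assumption.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def dated_destination(base_dir, when=None):
    """Return base_dir/YYYYMMDD for the given datetime (assumption: current UTC date by default)."""
    when = datetime.now(timezone.utc) if when is None else when
    return Path(base_dir) / when.strftime("%Y%m%d")


def file_into_dated_dir(src, base_dir, when=None):
    """Move src into base_dir/YYYYMMDD, creating the directory if needed, and return the new path."""
    dest_dir = dated_destination(base_dir, when)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(dest_dir / Path(src).name)))
```

With this layout a warning file written on 5 March 2019 would end up under `Warning/20190305/`, which matches the Warning/YYYYMMDD pattern the eldora scripts expect.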

The sporadic email alerts we are seeing from eldora are caused by item 2. Sometimes the transfer from DIAL1 ends quickly enough for the DIAL2 connection to be established without issue; other times it does not. We will have to discuss this further.

BradSchoenrock commented 6 years ago

The exact error message reads:

@ERROR: max connections (2) reached -- try again later
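One possible way to live with the connection limit is to run the transfers one at a time and retry when this specific message appears. The sketch below is not what the existing scripts do; the function names, retry count, and delay are hypothetical, and it assumes the message shows up in the rsync client's output (stdout and stderr are both checked because it is not confirmed here which stream carries it).

```python
import subprocess
import time

# Substring of the rsync daemon message quoted above.
MAX_CONN_MARKER = "max connections"


def run_rsync(args):
    """Run rsync with the given argument list; return (exit code, combined stdout and stderr)."""
    result = subprocess.run(["rsync", *args], capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def sync_with_retry(args, runner=run_rsync, attempts=5, delay=30.0, sleep=time.sleep):
    """Retry a transfer while the daemon reports its connection limit.

    Returns True on success and False if every attempt hit the limit.
    Any other rsync failure raises RuntimeError so it can still trigger an alert.
    """
    for attempt in range(1, attempts + 1):
        code, output = runner(args)
        if code == 0:
            return True
        if MAX_CONN_MARKER not in output:
            raise RuntimeError(f"rsync failed (exit {code}): {output.strip()}")
        if attempt < attempts:
            # Linear backoff: wait a little longer after each refused connection.
            sleep(delay * attempt)
    return False
```

Calling `sync_with_retry` for each directory in a plain loop keeps a unit to one connection at a time, so only a collision with another unit's transfer would trigger the retry path.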

stillwer commented 6 years ago

The main issue with DIAL02 was that its IP address seemed to be fighting the network. Rather than having the unit insist on a particular address, the network was allowed to pick the address, and the transfers are much faster now.