NCAR / ucomp-pipeline

Data processing pipeline for UCoMP

Devise a system to alert observers when UCoMP filter temperature is out of range #19

Open jburkepile opened 3 years ago

jburkepile commented 3 years ago

Generate an alert (e.g., a single email) that notifies observers when the UCoMP filter temperature is out of its nominal range. The email should also go to the Boulder team (Giuliana, Mike, Steve, Joan). Can a flag be set that limits the alert to a single email rather than multiple emails per day?

I would have included Steve in this ticket, but I cannot find his name in the GitHub system.

mgalloy commented 3 years ago

The main use of alerts is to notify observers during near-realtime processing of the data. In the near-realtime phase of processing, the pipeline is launched many times on a given cadence, e.g., every minute or every 10 minutes. Separate launches of the realtime pipeline must therefore communicate with each other about which alerts have already been sent and when they were sent. Knowing only the type of alert already sent is not enough. For example, an unknown FITS keyword value might trigger an alert. The system can't silence all unknown-FITS-keyword-value alerts, because the fix might cause another, slightly different error. The fix might also not solve the problem at all, so the date/time of each alert is needed, along with somewhere to set a preference for how long to wait before sending another alert for the same problem.
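A minimal Python sketch of that distinction, assuming an alert is identified by its name together with a SHA-1 digest of its message text. The function `alert_key` and the example messages are hypothetical and are not the pipeline's own code:

```python
import hashlib
from typing import Tuple


def alert_key(name: str, message: str) -> Tuple[str, str]:
    """Hypothetical helper: identify an alert by its type and its content.

    The SHA-1 digest of the message text distinguishes alerts of the same
    type whose details differ (e.g., a different unknown keyword value).
    """
    digest = hashlib.sha1(message.encode("utf-8")).hexdigest()
    return name.upper(), digest


# Two alerts of the same type with slightly different (hypothetical) messages
k1 = alert_key("bad_fits_keyword", "unknown value 'FOO' for keyword FILTER")
k2 = alert_key("bad_fits_keyword", "unknown value 'BAR' for keyword FILTER")
print(k1[0] == k2[0], k1[1] == k2[1])  # True False
```

Because the digest changes whenever the message text changes, a new bad keyword value yields a new key and is not silenced by an earlier alert of the same type.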

My current plan to communicate between launches of the pipeline is to create an alerts log file that records each alert sent. It will have lines of the following format:

YYYYMMDD.HHMMSS ALERT_NAME ALERT_SHA1_CODE

For example:

20210614.120437 BAD_FITS_KEYWORD 0071cda22b4fc378d0f189710905b8d4f63dbd15
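A sketch of writing and reading one line of this log in Python, assuming whitespace-separated fields and the `YYYYMMDD.HHMMSS` timestamp shown above. The helper names are hypothetical, and the thread does not say whether the timestamps are local time or UTC:

```python
from datetime import datetime
from typing import Tuple

TIME_FORMAT = "%Y%m%d.%H%M%S"  # YYYYMMDD.HHMMSS


def format_alert_line(when: datetime, name: str, sha1: str) -> str:
    """Hypothetical helper: build one alerts-log line (time, name, SHA-1)."""
    return f"{when.strftime(TIME_FORMAT)} {name} {sha1}"


def parse_alert_line(line: str) -> Tuple[datetime, str, str]:
    """Hypothetical inverse of format_alert_line; raises ValueError if malformed."""
    stamp, name, sha1 = line.split()  # exactly three fields expected
    return datetime.strptime(stamp, TIME_FORMAT), name, sha1


line = "20210614.120437 BAD_FITS_KEYWORD 0071cda22b4fc378d0f189710905b8d4f63dbd15"
when, name, sha1 = parse_alert_line(line)
print(when.isoformat(), name)  # 2021-06-14T12:04:37 BAD_FITS_KEYWORD
```

A plain one-line-per-alert text format like this is easy to append to from each pipeline launch and easy to inspect by hand.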

Each alert type would have a config file preference to specify a timeout value for sending another alert:

# Alerts are a type of notification for near real-time processing that provide
# feedback to observers and other MLSO staff. The alerts are listed by name
# below with the time [minutes] before another alert of that type with the same
# content should be sent again (0 for no delay, a negative value to never send
# it again).
[alerts]
bad_fits_keyword   : type=int, optional=YES, default=0
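One way the timeout preference could be applied, sketched in Python under the assumptions that the log uses the line format above and that a negative timeout means "never resend once sent". The function `should_send_alert` and its parameters are hypothetical, not the pipeline's implementation:

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y%m%d.%H%M%S"  # YYYYMMDD.HHMMSS, as in the alerts log


def should_send_alert(log_lines, name, sha1, timeout_minutes, now):
    """Hypothetical helper: decide whether an alert may be sent (again).

    timeout_minutes follows the [alerts] comment:
      0   -> no delay, always send
      < 0 -> never send again once this exact alert has been sent
      > 0 -> resend only after that many minutes since the last send
    """
    if timeout_minutes == 0:
        return True

    # Find the most recent time this exact alert (same name and SHA-1) was sent.
    last_sent = None
    for line in log_lines:
        if not line.strip():
            continue  # skip blank lines
        stamp, logged_name, logged_sha1 = line.split()
        if logged_name == name and logged_sha1 == sha1:
            sent = datetime.strptime(stamp, TIME_FORMAT)
            if last_sent is None or sent > last_sent:
                last_sent = sent

    if last_sent is None:
        return True  # this alert has never been sent
    if timeout_minutes < 0:
        return False  # already sent once; never resend
    return now - last_sent >= timedelta(minutes=timeout_minutes)


log = ["20210614.120437 BAD_FITS_KEYWORD 0071cda22b4fc378d0f189710905b8d4f63dbd15"]
sha = "0071cda22b4fc378d0f189710905b8d4f63dbd15"
now = datetime(2021, 6, 14, 12, 30, 0)  # about 25 minutes after the logged alert
print(should_send_alert(log, "BAD_FITS_KEYWORD", sha, 60, now))  # False
print(should_send_alert(log, "BAD_FITS_KEYWORD", sha, 20, now))  # True
```

In practice the log lines would be read from the alerts log file and the timeout taken from the `[alerts]` section of the config file; a negative timeout gives the "single email" behavior requested at the top of the issue.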
mgalloy commented 3 years ago

This is implemented with the ucomp_run::can_send_alert method, but it is not currently used because there is no realtime pipeline yet.

bberkeyU commented 1 year ago

Email may not be seen in real time. If problems can be detected from real-time data (such as temperatures out of range), it would be better to build these alerts into LabVIEW and show them directly to the observers, where they are already looking.

We should consider closing this issue.