JeffersonLab / jaws

An alarm system built on Kafka that supports pluggable sources
https://ace.jlab.org/jaws
MIT License

Flood Control: ability to suppress alarms #11

Open slominskir opened 3 years ago

slominskir commented 3 years ago

It might be useful to handle alarm floods via configured alarm suppression relationships. An alarm flood occurs when an overwhelming number of alarms are triggered even though a smaller number of alarms could convey the same state (alarms are sometimes dependent on or implied by others). Managing alarm floods is important for keeping the operator's situational awareness intact.

When an alarm is registered, it could carry an optional field named "suppressed by" so that it can be suppressed by another alarm higher up the logical hierarchy. For example, a component inside a rack could be "suppressed by" the rack itself. This could be recursive to form a tree, if desired: the rack could in turn be suppressed by a room alarm. The suppression info would be informational only (all alarms would still be alarming); the client (GUI) could simply use it to collapse suppressed alarms under a parent alarm (or use some other visual cue to indicate the alarm is "trumped" by another). This mimics how exceptions are often handled in programming languages: you see the high-level alarm prominently, but can drill down into the details if desired.
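To make the idea concrete, here is a minimal sketch of how a client might collapse active alarms under their top-most active parent. Everything in it is hypothetical: the alarm names, the `REGISTRATIONS` mapping, and the `collapse` helper are illustrations only and not part of JAWS. It also assumes that an alarm is grouped under the highest active ancestor even when an intermediate parent is not active, which is one possible policy among several.

```python
from collections import defaultdict

# Hypothetical registrations: alarm name -> optional "suppressed by" parent.
REGISTRATIONS = {
    "room1": None,
    "rack1": "room1",
    "psu1": "rack1",
    "fan1": "rack1",
    "rack2": "room1",
}


def top_active_ancestor(name, active, registrations):
    """Return the highest active alarm reachable via "suppressed by" links.

    Returns `name` itself when no ancestor is currently active.
    """
    top = name
    seen = {name}  # guards against accidental cycles in the configuration
    parent = registrations.get(name)
    while parent is not None and parent not in seen:
        seen.add(parent)
        if parent in active:
            top = parent
        parent = registrations.get(parent)
    return top


def collapse(active, registrations):
    """Group active alarms under their top-most active ancestor.

    All alarms remain active; the grouping is purely a display hint.
    """
    groups = defaultdict(list)
    for name in sorted(active):
        top = top_active_ancestor(name, active, registrations)
        if top == name:
            groups[name]  # touch the key so a top-level alarm is always listed
        else:
            groups[top].append(name)
    return dict(groups)
```

With `rack1`, `psu1`, and `fan1` all active, `collapse` would list only `rack1` at the top level, with the two component alarms nested beneath it for drill-down.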

slominskir commented 3 years ago

Note: A CALC alarm could be used to handle flood control too. See: https://github.com/JeffersonLab/kafka-alarm-system/issues/8. A CALC alarm is a more complex and roundabout way, though: you'd need to define a conditional alarm for every alarm prone to join floods, remove it from standard monitoring (epics2kafka), and instead register it as a CALC alarm that only alarms if its parent alarm is not already alarming.
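For comparison, the conditional that each such CALC alarm would need to encode is roughly the following. This is only a sketch: `calc_alarm_state` is a hypothetical name, not an existing CALC expression or JAWS function.

```python
def calc_alarm_state(child_condition: bool, parent_alarming: bool) -> bool:
    """Hypothetical CALC-style rule: alarm only if the parent is not already alarming."""
    return child_condition and not parent_alarming
```

Unlike the informational "suppressed by" field, this rule means the child never alarms at all while its parent is alarming, so there would be nothing to drill down into.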

slominskir commented 3 years ago

It's also worth acknowledging that the source of alarms could handle flood control too. In EPICS, for example, an IOC could be coded to not alarm if there is already a more prominent alarm active.

theojlab commented 3 years ago

The "suppressed by" approach has a conceptual simplicity that I like. A nice feature would be that the hierarchy can be viewed in a GUI configuration tool, making the relations obvious to users. If the logic is embedded in IOC code or even in CALC records, it would be harder or even impossible to give the same high-level overview to the alarms manager.

slominskir commented 3 months ago

We have the MaskedOverride and maskedby attribute for this, but we haven't implemented the logic to handle it in the effective-processor yet: https://github.com/JeffersonLab/jaws-effective-processor/issues/2