dsrvlabs / vatz

Node management integration tools in purpose of maximizing node's uptime for any protocols
https://www.dsrvlabs.com/
GNU General Public License v3.0

Enhance Notification behaviors per status. #174

Closed xellos00 closed 2 years ago

xellos00 commented 2 years ago

Is your feature request related to a problem? Please describe.

This issue is only an enhancement of the current behaviors mentioned at https://github.com/dsrvlabs/vatz/issues/163#issuecomment-1176344702

Describe the solution you'd like

| State | Severity | Send Notification | Condition |
| --- | --- | --- | --- |
| !SUCCESS | CRITICAL | Y | |
| !SUCCESS | ERROR | Y | |
| SUCCESS | --- | | anything that sends a recovery |

| State | Severity | Send Notification | Condition |
| --- | --- | --- | --- |
| SUCCESS | INFO | Y | Anything that has recovered from an alert (sent once) |
| FAIL | CRITICAL | Y | Failed to get an appropriate metric value and it's critical (e.g. 95% disk usage) |
| FAIL | WARNING | Y | Failed to get an appropriate metric value and it's a warning (e.g. 70% disk usage) |
| ERROR | CRITICAL | Y | Got an error while getting metric info (so any error coming from plugins has to be sent in a response with this) |

xellos00 commented 2 years ago

@dsrvlabs/validator are there any other suggestions for the table above?

gnongs commented 2 years ago

> @dsrvlabs/validator are there any other suggestions for the table above?

If disk usage is at 95%, shouldn't the state be SUCCESS and the severity CRITICAL?

rootwarp commented 2 years ago

I think the meanings of Error and Critical are a little confusing, because I can't tell which one is the more serious level.

How about

heejin-github commented 2 years ago

May I ask about the Failed condition of the state? It means we failed to get an appropriate metric value. But how do we know whether it is Critical or Warning if we didn't even get the value? I think that if the state is Failed, we can't identify the severity.

xellos00 commented 2 years ago

@Choi-Jinhong @heejin-github @rootwarp Thanks for your patience; I almost forgot to reply to this. Let's summarize the alert behaviors in a table.

xellos00 commented 2 years ago

```protobuf
enum STATE {
    NONE = 0;
    PENDING = 1;
    IN_PROGRESS = 2;
    SUCCESS = 3;
    FAILURE = 4;
    TIMEOUT = 5;
}

enum SEVERITY {
    UNKNOWN = 0;
    WARNING = 1;
    ERROR = 2;
    CRITICAL = 3;
    INFO = 4;
}
```
1. Organize which states trigger a notification, based on the flag (State + Severity) combination.
2. There are "Y" conditional notifications; this is to define a clear Condition for them.
3. Send a recovery message when State and Severity are SUCCESS | INFO, and stop the reminder.

| State | Severity | Send Notification (Y/N) | Recovery (Y/N) |
| --- | --- | --- | --- |
| SUCCESS | WARNING | Y | Y |
| SUCCESS | CRITICAL | Y | Y |
| SUCCESS | INFO | Y | N |
| FAILURE | * | Y | Y |
| ANY OTHER | * | N | N/A |
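
To make the table easier to check, here is a minimal Go sketch of it as a lookup function. The names `Decide`, `Decision`, and the Go constants are hypothetical and not taken from the vatz codebase; only the enum values and the table rows come from this thread. It also assumes that combinations not listed, such as SUCCESS with ERROR or UNKNOWN, fall under "ANY OTHER".

```go
package main

import "fmt"

// State mirrors the STATE enum posted in this thread (same numeric order).
type State int

const (
	StateNone State = iota
	StatePending
	StateInProgress
	StateSuccess
	StateFailure
	StateTimeout
)

// Severity mirrors the SEVERITY enum posted in this thread (same numeric order).
type Severity int

const (
	SeverityUnknown Severity = iota
	SeverityWarning
	SeverityError
	SeverityCritical
	SeverityInfo
)

// Decision is a hypothetical result type for one table row.
// RecoveryApplies is false for the "N/A" row.
type Decision struct {
	Notify          bool
	Recovery        bool
	RecoveryApplies bool
}

// Decide maps a flag (state + severity) to the matching table row.
// Assumption: any combination not listed in the table is "ANY OTHER".
func Decide(state State, severity Severity) Decision {
	switch {
	case state == StateFailure:
		// FAILURE | * : notify, recovery Y.
		return Decision{Notify: true, Recovery: true, RecoveryApplies: true}
	case state == StateSuccess && (severity == SeverityWarning || severity == SeverityCritical):
		// SUCCESS | WARNING and SUCCESS | CRITICAL: notify, recovery Y.
		return Decision{Notify: true, Recovery: true, RecoveryApplies: true}
	case state == StateSuccess && severity == SeverityInfo:
		// SUCCESS | INFO: notify, recovery N.
		return Decision{Notify: true, Recovery: false, RecoveryApplies: true}
	default:
		// ANY OTHER | * : no notification, recovery N/A.
		return Decision{}
	}
}

func main() {
	fmt.Printf("%+v\n", Decide(StateSuccess, SeverityInfo))
	fmt.Printf("%+v\n", Decide(StateTimeout, SeverityCritical))
}
```

Keeping the table in a single switch means a change to the agreed behavior touches one place, and the default branch makes the "ANY OTHER" row explicit.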

Condition: Send a one-time notification whenever the flag (state + severity) changes from the previous flag.

  1. SUCCESS

    • SUCCESS | WARNING |
    • SUCCESS | CRITICAL |
    • SUCCESS | INFO |
  2. FAILURE

    • FAILURE | * |
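
The one-time notification condition above could be sketched roughly as follows in Go. `Flag`, `Tracker`, `Changed`, and the `"disk-usage"` key are hypothetical names for illustration only, not part of vatz. The sketch also assumes that the first flag observed for a check counts as a change, which this thread does not settle.

```go
package main

import "fmt"

// Flag is a hypothetical pairing of the state and severity names used in this thread.
type Flag struct {
	State    string
	Severity string
}

// Tracker remembers the last flag seen for each check.
// The key is hypothetical, for example a plugin name plus a method name.
type Tracker struct {
	last map[string]Flag
}

// NewTracker returns a Tracker with no recorded flags.
func NewTracker() *Tracker {
	return &Tracker{last: make(map[string]Flag)}
}

// Changed stores flag under key and reports whether it differs from the
// previously stored flag. Assumption: the first observation is a change.
func (t *Tracker) Changed(key string, flag Flag) bool {
	prev, seen := t.last[key]
	t.last[key] = flag
	return !seen || prev != flag
}

func main() {
	tr := NewTracker()
	samples := []Flag{
		{State: "SUCCESS", Severity: "CRITICAL"},
		{State: "SUCCESS", Severity: "CRITICAL"}, // same flag again: no new notification
		{State: "SUCCESS", Severity: "INFO"},     // flag changed: notify once
	}
	for _, f := range samples {
		fmt.Println(f, tr.Changed("disk-usage", f))
	}
}
```

Comparing the whole flag rather than only the state means a move from SUCCESS | WARNING to SUCCESS | CRITICAL also counts as a change, which matches the wording "any flag(state+severity)".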