Icinga / icinga-notifications

Icinga Notifications: new and improved notifications and incident management for Icinga (work in progress, not ready for production yet)
GNU General Public License v2.0

Icinga 2 source does not work if any acknowledged checkable misses the corresponding comment #245

Closed julianbrost closed 4 months ago

julianbrost commented 4 months ago

When any Icinga 2 host or service is acknowledged but the corresponding comment has gone missing, the Icinga 2 Event Stream source fails to start properly and logs the following:

2024-07-19T11:28:27.409Z    INFO    icinga2 Start listening on Icinga 2 Event Stream    {"source_id": 1}
2024-07-19T11:28:27.409Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:28:32.550Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "1s"}
2024-07-19T11:28:32.550Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:28:35.921Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "2s"}
2024-07-19T11:28:35.921Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:28:40.419Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "4s"}
2024-07-19T11:28:40.419Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:28:46.673Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "8s"}
2024-07-19T11:28:46.673Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:28:57.109Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "16s"}
2024-07-19T11:28:57.109Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:29:15.270Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "32s"}
2024-07-19T11:29:15.270Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:29:49.188Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "1m4s"}
2024-07-19T11:29:49.188Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:30:56.037Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "2m8s"}
2024-07-19T11:30:56.037Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
2024-07-19T11:33:06.841Z    WARN    icinga2 Catch-up-phase was interrupted by an error, another attempt will be made    {"source_id": 1, "error": "fetching acknowledgement comment for \"master-1!icinga-cluster\" failed, found no ACK Comments for \"comment.entry_type == 4 && comment.host_name == comment_host_name && comment.service_name == comment_service_name\" with map[comment_host_name:master-1 comment_service_name:icinga-cluster]", "delay": "3m0s"}
2024-07-19T11:33:06.841Z    INFO    icinga2 Worker enters catch-up-phase, start caching up on Event Stream events   {"source_id": 1}
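
Judging only from the `delay` values in the log above (1s, 2s, 4s, ... 2m8s, 3m0s), the retries appear to follow a doubling backoff capped at three minutes. The following Go sketch reproduces that sequence. It is inferred from the log output, not taken from the icinga-notifications source; `backoffDelay` and its parameters are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the delay before retry number `attempt` (0-based):
// base doubled `attempt` times, but never more than max.
// Hypothetical helper, inferred from the delays seen in the log.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			// Stop early once the cap is reached; this also avoids overflow.
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

func main() {
	// Print the delays for the first nine attempts with a 1s base and a 3m cap.
	for attempt := 0; attempt < 9; attempt++ {
		fmt.Println(backoffDelay(attempt, time.Second, 3*time.Minute))
	}
}
```

With this retry pattern, an error that never resolves on its own, such as a permanently missing comment, keeps the source in the catch-up phase indefinitely.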

This situation can easily be created manually by acknowledging something and then deleting the corresponding comment in Icinga Web (this currently does not clear the acknowledgement, see also https://github.com/Icinga/icinga2/issues/8896). However, I'm quite certain that I didn't do this, so maybe there's also an "Icinga 2 forgets ack comments" bug here.

Anyway, the current error handling for this situation is too aggressive: a single missing comment should not make the whole catch-up phase fail.
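
A more lenient handling could look roughly like the Go sketch below. This is not the actual icinga-notifications code: `ErrNoAckComment`, `Comment`, `AckEvent`, `commentFetcher` and `buildAckEvents` are hypothetical names made up for illustration, and the Icinga 2 API query is replaced by a plain function. The idea is only that a missing ACK comment gets logged and the acknowledgement is processed without author and text, while every other error still aborts the attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoAckComment is a hypothetical sentinel error meaning "the object is
// acknowledged, but no acknowledgement comment exists for it".
var ErrNoAckComment = errors.New("found no ACK comments")

// Comment holds the parts of a comment that matter here (hypothetical type).
type Comment struct {
	Author string
	Text   string
}

// AckEvent describes an acknowledged checkable (hypothetical type).
type AckEvent struct {
	Object  string
	Author  string
	Message string
}

// commentFetcher stands in for the API query that looks up the ACK comment.
type commentFetcher func(object string) (Comment, error)

// buildAckEvents creates one event per acknowledged object. A missing
// comment is tolerated; any other error fails the whole call.
func buildAckEvents(objects []string, fetch commentFetcher) ([]AckEvent, error) {
	events := make([]AckEvent, 0, len(objects))
	for _, obj := range objects {
		c, err := fetch(obj)
		switch {
		case errors.Is(err, ErrNoAckComment):
			// Degrade gracefully: keep the acknowledgement, leave author
			// and message empty, and only warn about the missing comment.
			fmt.Printf("WARN no ACK comment for %q, continuing without it\n", obj)
			c = Comment{}
		case err != nil:
			return nil, fmt.Errorf("fetching acknowledgement comment for %q failed: %w", obj, err)
		}
		events = append(events, AckEvent{Object: obj, Author: c.Author, Message: c.Text})
	}
	return events, nil
}

func main() {
	// Simulated comment store: only one of the two objects has a comment.
	comments := map[string]Comment{
		"master-1!disk": {Author: "icingaadmin", Text: "working on it"},
	}
	fetch := func(object string) (Comment, error) {
		c, ok := comments[object]
		if !ok {
			return Comment{}, ErrNoAckComment
		}
		return c, nil
	}

	events, err := buildAckEvents([]string{"master-1!icinga-cluster", "master-1!disk"}, fetch)
	if err != nil {
		fmt.Println("catch-up failed:", err)
		return
	}
	for _, e := range events {
		fmt.Printf("%+v\n", e)
	}
}
```

Whether an empty author and message are acceptable downstream is an open design question; the sketch only shows that the two error classes can be separated.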