What about the parameters you pass to such a downtime - start/end time, etc.? A curl request against the REST API would be best, so the issue can easily be reproduced.
Okay, here are the curl requests:
Host is online:
curl -H "Accept: application/json" -k -s -u root -X POST -d '{ "type": "Host", "filter": "host.name==\"batman\"", "start_time": '$(date +%s)', "end_time": '$(date +%s --date="+30 seconds")', "author": "root", "comment": "test", "fixed": true, "duration": 30 }' -k "https://localhost:5665/v1/actions/schedule-downtime"
{"results":[{"code":200.0,"legacy_id":47.0,"name":"batman!icinga-1493796888-10","status":"Successfully scheduled downtime 'batman!icinga-1493796888-10' for object 'batman'."}]}
curl -H "Accept: application/json" -k -s -u root -X POST 'https://localhost:5665/v1/events?queue=debugnotifications&types=Notification'
......
{"author":"root","check_result":{"active":true,"check_source":"icinga","command":["/usr/local/monitoring/libexec/default/check_ping","-H","10.204.7.69","-c","5000,100%","-w","3000,80%"],"execution_end":1493796798.4722080231,"execution_start":1493796794.4702019691,"exit_status":0.0,"output":"PING OK - Packet loss = 0%, RTA = 0.42 ms","performance_data":null,"schedule_end":1493796798.4722321033,"schedule_start":1493796794.4699997902,"state":0.0,"type":"CheckResult","vars_after":{"attempt":1.0,"reachable":true,"state":0.0,"state_type":1.0},"vars_before":{"attempt":1.0,"reachable":true,"state":0.0,"state_type":1.0}},"host":"batman","notification_type":"DOWNTIMESTART","text":"test","timestamp":1493796892.8843939304,"type":"Notification","users":["Team_Middleware"]}
Host is offline (Soft 1/3):
curl -H "Accept: application/json" -k -s -u root -X POST -d '{ "type": "Host", "filter": "host.name==\"batman\"", "start_time": '$(date +%s)', "end_time": '$(date +%s --date="+30 seconds")', "author": "root", "comment": "test", "fixed": true, "duration": 30 }' -k "https://localhost:5665/v1/actions/schedule-downtime"
{"results":[{"code":200.0,"legacy_id":48.0,"name":"batman!icinga-1493797061-11","status":"Successfully scheduled downtime 'batman!icinga-1493797061-11' for object 'batman'."}]}
curl -H "Accept: application/json" -k -s -u root -X POST 'https://localhost:5665/v1/events?queue=debugnotifications&types=Notification'
......
Host is offline (Hard 3/3):
curl -H "Accept: application/json" -k -s -u root -X POST -d '{ "type": "Host", "filter": "host.name==\"batman\"", "start_time": '$(date +%s)', "end_time": '$(date +%s --date="+30 seconds")', "author": "root", "comment": "test", "fixed": true, "duration": 30 }' -k "https://localhost:5665/v1/actions/schedule-downtime"
{"results":[{"code":200.0,"legacy_id":49.0,"name":"batman!icinga-1493797144-12","status":"Successfully scheduled downtime 'batman!icinga-1493797144-12' for object 'batman'."}]}
curl -H "Accept: application/json" -k -s -u root -X POST 'https://localhost:5665/v1/events?queue=debugnotifications&types=Notification'
......
Hm, I have an idea about the host's raw state, which influences the downtime trigger in lib/icinga/downtime.cpp:137. Can you extract the attribute last_check_result for the affected host via /v1/objects/hosts for all three of your tests? I would expect that your host has last_check_result.state set to 1 and not 0.
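For reference, a query limited to just that attribute should also work; a minimal sketch, assuming the same host name "batman" and local root API credentials as in the tests above (the attrs URL parameter restricts the response to the listed attributes):
curl -H "Accept: application/json" -k -s -u root "https://localhost:5665/v1/objects/hosts/batman?attrs=last_check_result" | python -m json.tool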
Hm, it's 2.
curl -H "Accept: application/json" -k -s -u root "https://localhost:5665/v1/objects/hosts/batman" | python -m json.tool
{
"results": [
{
"attrs": {
"__name": "batman",
...
"last_check_result": {
"active": false,
"check_source": "icinga",
"command": null,
"execution_end": 1493812075.0,
"execution_start": 1493812075.0,
"exit_status": 0.0,
"output": "DOWN",
"performance_data": [],
"schedule_end": 1493812075.0,
"schedule_start": 1493812075.0,
"state": 2.0,
"type": "CheckResult",
"vars_after": {
"attempt": 1.0,
"reachable": true,
"state": 2.0,
"state_type": 0.0
},
"vars_before": {
"attempt": 1.0,
"reachable": true,
"state": 0.0,
"state_type": 1.0
}
},
"last_hard_state": 0.0,
"last_hard_state_change": 1493797254.984923,
"last_reachable": true,
...
}
}
]
}
It does not matter whether the check is executed actively by Icinga or the check result is sent passively via the API / Icinga Web 2.
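For the passive case, a result can be pushed with the process-check-result action; a minimal sketch, assuming the same host "batman" and local root API credentials as above (for hosts, exit_status 1 maps to DOWN):
curl -H "Accept: application/json" -k -s -u root -X POST -d '{ "type": "Host", "filter": "host.name==\"batman\"", "exit_status": 1, "plugin_output": "DOWN" }' "https://localhost:5665/v1/actions/process-check-result"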
@dnsmichi Any news? Do you need more information?
I have a possible fix in my stash, but I have not yet reproduced the issue (working on other issues atm).
diff --git a/lib/icinga/downtime.cpp b/lib/icinga/downtime.cpp
index 909ba7e8f..056e78cdf 100644
--- a/lib/icinga/downtime.cpp
+++ b/lib/icinga/downtime.cpp
@@ -134,7 +134,7 @@ void Downtime::Start(bool runtimeCreated)
* this downtime now *after* it has been added (important
* for DB IDO, etc.)
*/
- if (checkable->GetStateRaw() != ServiceOK) {
+ if (!checkable->IsStateOK(checkable->GetStateRaw())) {
Log(LogNotice, "Downtime")
<< "Checkable '" << checkable->GetName() << "' already in a NOT-OK state."
<< " Triggering downtime now.";
We have this problem only with fixed downtimes. Flexible downtimes are fine.
Thanks, that helps with reproducing it.
@dnsmichi This problem still exists in 2.7.0-r1
@dnsmichi Same behavior for services which are currently not in an OK state. (Icinga 2 version: r2.7.1-1)
r2.8.2-1 - same problem for services. If a service is in a failed state, putting it into downtime does not trigger a notification.
Same here with version r2.13.1
If we create a fixed downtime for a host which is already in a DOWN state, icinga2 does not generate a DOWNTIMESTART notification.
Flexible downtimes do send a DOWNTIMESTART notification.
Expected Behavior
All downtimes send a DOWNTIMESTART notification.
Current Behavior
See description.
Possible Solution
Steps to Reproduce (for bugs)
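A sketch of the reproduction used earlier in this thread (the host name "batman" comes from those tests; use a host that is currently in a hard DOWN state):
# Schedule a short fixed downtime for the already-DOWN host:
curl -H "Accept: application/json" -k -s -u root -X POST -d '{ "type": "Host", "filter": "host.name==\"batman\"", "start_time": '$(date +%s)', "end_time": '$(date +%s --date="+30 seconds")', "author": "root", "comment": "test", "fixed": true, "duration": 30 }' "https://localhost:5665/v1/actions/schedule-downtime"
# In a second shell, subscribe to the notification event stream; no DOWNTIMESTART event arrives:
curl -H "Accept: application/json" -k -s -u root -X POST 'https://localhost:5665/v1/events?queue=debugnotifications&types=Notification'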
Context
We have an in-house SLA reporting and incident tool. All actions from Icinga are transferred via notifications to the reporting tool.
Your Environment
Version used (icinga2 --version): 2.6.3
Enabled features (icinga2 feature list): api checker command debuglog ido-mysql livestatus mainlog notification perfdata syslog
Config validation (icinga2 daemon -C): Is fine.
zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.