Closed: milan-koudelka closed this issue 3 years ago
Hm.. we could make 'now' the default last check time. But passive checks are enabled by default, so we can't easily and reliably tell whether a service is actually checked passively. If we changed the default for all services with passive checks enabled, we'd also delay the active checks, so effectively all new services would be pending for – say – 5m.
@lippserd Could everyone live w/ the latter?
@Al2Klimov Do I understand you correctly that you would change the behavior so that the first check only runs after check_interval, regardless of whether it is active or passive?
Yes, that’s what I suggested.
I think it would even help Icinga's performance. When I deploy a lot of new hosts, all checks are queued immediately, even those I would only like to check hourly/daily (e.g. certificate expiration). But I can understand that it would be painful to find out after a day that you have an invalid certificate on your new host in a production environment :-D
I don't think this is an option due to the different check intervals. The first freshness check should only be triggered when the check interval has been exceeded. Maybe we need a creation_time attribute for that - we don't have it, right?
Right.
Do you consider such a workaround reasonable? https://community.icinga.com/t/downtime-for-new-hosts/3819/3?u=al2klimov
I read that thread about creation time and applying a downtime to all hosts. It is a nice solution, but I'm afraid it may not work at all, and even less so for this case.
1/ We used a similar approach: after deploying a new host, we immediately set a one-hour downtime through the API for the host and all its services. It was faulty. Sometimes, because it took some time to process the request, we ended up setting a downtime whose start time was in the past, and Icinga had problems with such settings. We switched to a solution where we disable notifications and re-enable them after one hour (see the notification-delay sketch below). Your solution is better: it is just a small piece of code, no API calls, and it can probably be tuned to set the downtime for all services as well. However, will Icinga be happy if the creation time is in the past? Will it work then? I'm also not sure whether such a downtime in the past would be applied to older hosts that already exist in the configuration. That would mean a lot of downtime objects in Icinga 2, which could lead to worse performance.
2/ For this case, we would have to set a downtime for passive-check services for an interval based on the freshness check interval. I'm not sure whether I can do that with code similar to the one you mentioned in the thread.
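As a side note on the notification-based variant mentioned in 1/: Icinga 2 Notification objects support a times dictionary whose begin entry delays the first notification after a state change, which can provide a similar quiet first hour without any API calls. A minimal sketch, with illustrative template, user, and assign names (none of them are taken from the issue):

apply Notification "mail-delayed" to Service {
  import "mail-service-notification"   /* illustrative notification template */

  users = [ "icingaadmin" ]            /* illustrative user */

  /* Do not send the first notification until one hour after the problem started. */
  times = {
    begin = 1h
  }

  assign where service.vars.delay_notifications == true
}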
For context, I'm copy-pasting the code I mentioned into this comment below.
object Host "example.com" {
  vars.created_at = 1234567890
}

apply Downtime "pre-prod" to Host {
  assign where true

  start_time = host.vars.created_at
  end_time = start_time + 1h * 24 * 7
}

apply Downtime "pre-prod" to Service {
  /* same idea, applied to services */
}
Ok, I could do something like this.
object Host "example.com" {
  vars.created_at = 1234567890
}

apply Downtime "pre-production-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for new passive checks until the first freshness check interval expires"
  start_time = host.vars.created_at
  end_time = start_time + service.check_interval

  assign where true
}
Or maybe I don't need to care about created_at and check_interval at all. All new checks have service.last_check set to 1970-01-01. I could probably disable active checks for these services completely until at least one check result has been reported. That could be dangerous if the check doesn't work at all from the beginning.
template Service "passive-service" {
  check_command = "passive"
  enable_passive_checks = 1

  # Disable notifications in case the host has notifications disabled as well
  if (host.enable_notifications) {
    enable_notifications = true
  } else {
    enable_notifications = false
  }

  /* Use a runtime function to retrieve the last check time and more details. */
  vars.dummy_text = {{
    var service = get_service(macro("$host.display_name$"), macro("$service.name$"))
    var lastCheck = DateTime(service.last_check).to_string()

    return "No check results received. Last result time: " + lastCheck
  }}

  vars.dummy_state = 3
  vars.enable_pagerduty = true

  if (vars.service.last_check < 1) {
    enable_active_checks = 1
  } else {
    enable_active_checks = 0
  }

  check_interval = 30m
}
Hm, no, the second option doesn't work: vars.service.last_check is not defined the way I thought it would be.
Let us know once you’ve found a reasonable workaround.
And yet another suggestion – let the first freshness check be OK:
template Host "passive" {
  check_command = "dummy"

  var that = this

  vars.dummy_state = function() use(that) {
    return if (that.last_check_result) { 3 } else { 0 }
  }
}
@Al2Klimov This is super cool. That is probably what I was looking for. The first immediate dummy freshness check returns OK and then it works as usual. Thank you!
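For anyone who wants to apply this suggestion to the passive service case that started the thread (rather than to a Host object), here is a minimal sketch of the same trick in a Service template. The template name and the use of the dummy check command mirror the examples above; everything else is illustrative rather than taken from the issue:

template Service "passive-service" {
  check_command = "dummy"        /* only executed as the freshness check */
  enable_passive_checks = true   /* real results are submitted passively */

  var that = this

  /* Report OK on the very first freshness check (no result received yet),
   * and UNKNOWN on later freshness checks, as in the Host example above. */
  vars.dummy_state = function() use(that) {
    return if (that.last_check_result) { 3 } else { 0 }
  }
  vars.dummy_text = "No check results received within the check interval."
}

The first, immediate freshness check then comes back OK instead of putting the freshly deployed service into a HARD failed state; later freshness checks report UNKNOWN as before.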
Describe the bug
We are using passive checks with a freshness interval. Unfortunately, when we deploy a new host with a passive check, the check switches to a HARD failed state almost immediately. It should be created with service.last_check set to the current time and should wait until check_interval has expired for the newly deployed service.
Expected behavior
When I deploy a new host, I don't expect to receive all passive check results within a minute. Passive checks usually run at longer intervals (e.g. a backup task). I expect the passive check state to be initialized somehow at creation.
Your Environment
Version used (icinga2 --version): 2.12.0-1
Enabled features (icinga2 feature list): api checker command ido-mysql mainlog notification
Disabled features: compatlog debuglog elasticsearch gelf graphite icingadb influxdb livestatus opentsdb perfdata statusdata syslog
Additional context
template Service "passive-service" {
}
object Host "stg3-connectors-vertica01.XXX" {
  import "develop-host"

  display_name = "stg3-connectors-vertica01.XXX"
  address = "XXX"
  vars.gdc_services = [ "vertica-backup" ]
  enable_notifications = true
}

apply Service "vertica backup" {
  import "passive-service"

  display_name = "Vertica backup"
  check_interval = 29h
  max_check_attempts = 1

  assign where "vertica-backup" in host.vars.gdc_services
}