Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0

Downtime lost after restart #5625

Closed by cite 7 years ago

cite commented 7 years ago

After restarting Icinga 2, downtimes for services are lost.

JSON output (/v1/objects/services?service=muc1pro-ite-1!child-health) for service muc1pro-ite-1!child-health before restart:

{
  "results": [
    {
      "attrs": {
        "__name": "muc1pro-ite-1!child-health",
[...]
        "display_name": "child-health",
        "downtime_depth": 1,
[...]

And after restarting (even WITHOUT any configuration change):

{
  "results": [
    {
      "attrs": {
        "__name": "muc1pro-ite-1!child-health",
[...]
        "display_name": "child-health",
        "downtime_depth": 0,
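
The before/after comparison above can be automated. The following is a minimal sketch, assuming the API is reachable with HTTP basic auth; the base URL, credentials, and CA file passed to `fetch_services` are placeholders, and the helper names `downtime_depths` and `lost_downtimes` are hypothetical, not part of Icinga 2.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def downtime_depths(response):
    """Map each object's __name to its downtime_depth from an API response."""
    return {
        result["attrs"]["__name"]: int(result["attrs"]["downtime_depth"])
        for result in response.get("results", [])
    }


def lost_downtimes(before, after):
    """Return names that were in a downtime before but have a lower depth after."""
    return sorted(
        name
        for name, depth in before.items()
        if depth > 0 and after.get(name, 0) < depth
    )


def fetch_services(base_url, user, password, service, cafile=None):
    """Query /v1/objects/services for one service (all arguments are placeholders)."""
    query = urllib.parse.urlencode({"service": service})
    request = urllib.request.Request(f"{base_url}/v1/objects/services?{query}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    # cafile is assumed to point at the CA that signed the API certificate.
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(request, context=context) as reply:
        return json.load(reply)
```

Call `fetch_services` once before and once after the restart, pass both responses through `downtime_depths`, and hand the two mappings to `lost_downtimes` to list the services whose downtime disappeared.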

What other data would you need me to provide?


dnsmichi commented 7 years ago

Verify the package structure as mentioned in https://github.com/Icinga/icinga2/issues/3668#issuecomment-282549005 and post your findings here please.

cite commented 7 years ago

Performing the steps mentioned in your link fixed the problem for the primary configuration master - thanks a lot. Our installation was missing any files in /var/lib/icinga2/api/packages/_api, and also had an extraneous conf.d directory containing comments and downtimes. This is now fixed on the primary configuration master, and the downtimes are visible again.

What would be the easiest way to fix this across the other members of the master zone, our 22 satellites, and all clients (if it needs fixing there), and how do we prevent this from happening again?

EDIT: As for the first question, I just realized that was a dumb thing to ask: delete the folder and restart Icinga 2 on the satellites.
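
To see which nodes show the same symptoms, a small check could be run on each of them. This is only a sketch built on the two symptoms described above (no files directly inside /var/lib/icinga2/api/packages/_api, and a conf.d directory sitting directly in it); the function name `inspect_api_package` is hypothetical, and the script only reads the directory, it does not change anything.

```python
from pathlib import Path


def inspect_api_package(package_dir):
    """Report the two symptoms from this thread for one package directory."""
    root = Path(package_dir)
    entries = list(root.iterdir()) if root.is_dir() else []
    return {
        # The package directory itself is present.
        "exists": root.is_dir(),
        # At least one regular file lives directly inside the package directory.
        "has_files": any(entry.is_file() for entry in entries),
        # A conf.d directory directly under the package directory (the extraneous one).
        "stray_conf_d": (root / "conf.d").is_dir(),
        # Remaining subdirectories, listed for manual review.
        "subdirectories": sorted(entry.name for entry in entries if entry.is_dir()),
    }


if __name__ == "__main__":
    print(inspect_api_package("/var/lib/icinga2/api/packages/_api"))
```

A node reporting `has_files` as false or `stray_conf_d` as true matches what we saw on the primary master and is a candidate for the fix.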

dnsmichi commented 7 years ago

This should be addressed by #5620, which ensures that the activestage name is always set and that package creation is atomic. To fix the package, deleting it on the secondary master and the satellites should be sufficient. Alternatively, you can manually rsync the stage content if the sync takes too long.

cite commented 7 years ago

Ok, thank you for your help. Closing this issue.