Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
Satellite forward clock skew cut out all the checks from the icinga2 master #9790

Open dercol1 opened 1 year ago

dercol1 commented 1 year ago

I have a packaged icinga2 2.11.9 in a product called NetEye (4.28). In my configuration, a master icinga2 installation (with icinga2-web) talks to remote satellites. One satellite runs in an oVirt KVM farm. Because of the low speed of the disks, the Linux kernel of the satellite sometimes registers a "CPU stuck", and sometimes the internal clock jumps one day ahead of the current date. This erroneous information is then forwarded to the icinga2 master server, which leaves the subsequent checks for the satellite completely stuck: they are not performed even when a check is requested explicitly.

The workaround we found is to stop the icinga2-master service on the master node, remove /neteye/shared/icinga2/data/lib/icinga2/icinga2.state, and then restart the icinga2-master service (which owns the icinga2 processes). But this simply throws away all the configuration. Please give me some hints on how to address the problem. I only partially know the architecture.
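For readers who want to script the workaround described above, here is a minimal sketch. It makes two assumptions that are not in the report: that the icinga2-master service is controlled through `systemctl`, and that renaming the state file to a backup is an acceptable substitute for deleting it. The function names are hypothetical; the path and service name are the ones quoted in the report.

```python
import shutil
import subprocess
import time
from pathlib import Path

# Values quoted in the report above; adjust for other installations.
STATE_FILE = Path("/neteye/shared/icinga2/data/lib/icinga2/icinga2.state")
SERVICE = "icinga2-master"


def set_aside_state_file(state_file: Path) -> Path:
    """Rename the state file to a timestamped backup and return the new path."""
    backup = state_file.with_name(f"{state_file.name}.{int(time.time())}.bak")
    shutil.move(str(state_file), str(backup))
    return backup


def reset_state(service: str = SERVICE, state_file: Path = STATE_FILE) -> Path:
    """Stop the service, set the state file aside, then start the service."""
    # Assumption: the service is managed with systemctl on this node.
    subprocess.run(["systemctl", "stop", service], check=True)
    try:
        return set_aside_state_file(state_file)
    finally:
        # Start the service again even if the rename failed.
        subprocess.run(["systemctl", "start", service], check=True)
```

Calling `reset_state()` as root on the master node would perform the three steps in order. Keeping the old file as a backup means it can still be inspected afterwards, although the running daemon starts without its previous state, exactly as with the deletion described in the report.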

Al2Klimov commented 8 months ago

> sometimes the internal clock jumps one day ahead of the current date. This erroneous information is then forwarded to the icinga2 master server, which leaves the subsequent checks for the satellite completely stuck: they are not performed even when a check is requested explicitly.

"completely stuck" or stuck for one day (the period the clock was ahead)?

dercol1 commented 5 months ago

You are right, the "stuck" period ends once the clock skew has been recovered. Sorry for the late answer, I was only notified today. Thank you.
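The observation that the stall lasts about as long as the skew can be illustrated with a toy model. This is not Icinga 2's scheduler; it only assumes, for illustration, that the next check is derived from the end timestamp carried in the last check result plus the check interval. The function name and the 5-minute interval are hypothetical.

```python
from datetime import datetime, timedelta


def stalled_for(now: datetime, reported_end: datetime,
                check_interval: timedelta) -> timedelta:
    """Time until the next check under the simplified model described above."""
    # Assumption: next check = timestamp in the last result + check interval.
    next_check = reported_end + check_interval
    return max(next_check - now, timedelta(0))


now = datetime(2024, 1, 1, 12, 0, 0)
interval = timedelta(minutes=5)

# Result stamped with the correct time: the next check is one interval away.
normal = stalled_for(now, now, interval)

# Result stamped one day ahead: the next check is deferred by the skew as well.
skewed = stalled_for(now, now + timedelta(days=1), interval)
```

Under this model `normal` is 5 minutes and `skewed` is 1 day and 5 minutes, which matches the reported behaviour of checks resuming once real time catches up with the skewed timestamp.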

Al2Klimov commented 4 months ago

Does "Check now" help?
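For anyone who prefers to test this without the web interface, a forced reschedule can also be requested through the Icinga 2 REST API action `/v1/actions/reschedule-check`. The sketch below only builds the request. The base URL, API user, password, and host name are placeholders, the helper name is hypothetical, and treating this action as equivalent to the web interface button is an assumption.

```python
import base64
import json
import urllib.request


def build_reschedule_request(api_base: str, user: str, password: str,
                             host_name: str,
                             object_type: str = "Service") -> urllib.request.Request:
    """Build a POST request that asks for a forced reschedule of checks."""
    payload = {
        "type": object_type,                # "Host" or "Service"
        "filter": "host.name == name",      # match objects on this host
        "filter_vars": {"name": host_name}, # value bound to `name` in the filter
        "force": True,                      # request a forced check
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{api_base}/v1/actions/reschedule-check",
        data=json.dumps(payload).encode(),
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder values, not taken from the report.
req = build_reschedule_request("https://master.example:5665", "apiuser",
                               "secret", "satellite1")
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` together with an `ssl` context that trusts the master's CA certificate. Whether a forced check is accepted while the skewed timestamps are still in effect is exactly what the question above is trying to find out.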