Icinga / icingaweb2-module-director

The Director aims to be your new favourite Icinga config deployment tool. Director is designed for those who want to automate their configuration deployment and those who want to grant their “point & click” users easy access to the configuration.
https://icinga.com/docs/director/latest
GNU General Public License v2.0

Director daemon doesn't act on DB connection loss #2909

Open: log1-c opened this issue 2 months ago

log1-c commented 2 months ago

Not sure about the title, but here is what happened in our setup.

Setup: Two web servers run the Director daemon as a systemd service. One server is in a public cloud, the other in a private cloud. The connection to the database is handled via HAProxy, which balances across the three Galera cluster nodes. The web interface sits behind a load balancer.

Normally, the primary instance for the web interface (and thus for the daemon) is on the private cloud side.

Then a VPN connection issue caused the private cloud side to lose connectivity. Icinga 2 switched to the public cloud, icingaweb2 (i.e. the load balancer) switched to the public cloud, and the Director daemon switched with it.

However, according to journalctl -u icinga-director, the daemon still lost its connection to the MySQL cluster: there are many "MySQL server has gone away" messages from our import & sync jobs.

Attached logs:
journalctl -u icinga-director from private cloud host.txt
journalctl -u icinga-director from public cloud host.txt

Yet the output of systemctl status icinga-director still reports "running, db: connected":

icinga-director.service - Icinga Director - Monitoring Configuration
   Loaded: loaded (/etc/systemd/system/icinga-director.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2024-08-11 20:33:54 CEST; 2 days ago
     Docs: https://icinga.com/docs/director/latest/
 Main PID: 4020 (icingacli)
   Status: "running, db: connected"
    Tasks: 2 (limit: 24881)
   Memory: 125.2M
   CGroup: /system.slice/icinga-director.service
           ├─  4020 icinga::director: running, db: connected
           └─463212 icinga::director::job (Import all Sources)
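For illustration only, here is a minimal sketch of a status line that is derived from a live connection check on every cycle rather than from a flag set once at startup. It is written in Python, not in the Director's own PHP, and is not the Director's actual implementation. The names probe_status, DbError and the ping callable are hypothetical; the only facts it relies on are the MySQL client error codes 2006 (CR_SERVER_GONE_ERROR, "MySQL server has gone away") and 2013 (CR_SERVER_LOST).

```python
from typing import Callable

# MySQL client error codes that mean the connection is gone:
# 2006 = CR_SERVER_GONE_ERROR ("MySQL server has gone away"),
# 2013 = CR_SERVER_LOST ("Lost connection to MySQL server during query").
CONNECTION_LOST_CODES = {2006, 2013}


class DbError(Exception):
    """Hypothetical error type carrying a MySQL client error code."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def probe_status(ping: Callable[[], None]) -> str:
    """Return a systemd-style status string based on a live DB ping.

    `ping` is a hypothetical callable that raises DbError when the
    server cannot be reached. Connection-loss errors are mapped to a
    "disconnected" status; any other DbError is re-raised unchanged.
    """
    try:
        ping()
    except DbError as err:
        if err.code in CONNECTION_LOST_CODES:
            return "running, db: disconnected"
        raise
    return "running, db: connected"
```

The "Status:" line shown by systemctl is whatever the service last sent as STATUS= through the sd_notify protocol, so a daemon built along these lines would need to re-send that string after each probe for the reported state to follow the real connection.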

The problem: our import & sync jobs aren't running, which leads to a discrepancy between the monitored infrastructure and the monitoring view.

What I would have expected (one of those):

System: OS: RHEL 8, Director version 1.11.1

lippserd commented 2 weeks ago

@log1-c thanks for the issue and the logs. This does indeed look strange. Investigating will take some time though.