Icinga / icingaweb2-module-vspheredb

The easiest way to monitor a VMware vSphere environment.
https://icinga.com/docs/vsphere/latest
GNU General Public License v2.0

Problem with unix domain socket #444

Closed tolecnal closed 2 years ago

tolecnal commented 2 years ago

Description After installing the latest version of the vsphere module (ref: v1.1.1), I am able to complete the database setup and initial installation, and add a vsphere host which is able to retrieve information about the deployed hosts.

However, when I go into vSphere Daemon Status, it complains that it is unable to access the Unix domain socket. I have verified that the user running the module through icingacli has permissions on the socket file, and even gave it global read/write permissions (ref: 0666).

Even with global permissions, the daemon status page states the following:

Unable to connect to unix domain socket "unix:///run/icinga-vspheredb/vspheredb.sock": Connection refused (ECONNREFUSED)

If I look at the logs, in this case syslog, I also see the following:

Oct 11 06:50:33 icinga systemd[1]: Starting Icinga vSphereDB Daemon...
Oct 11 06:50:33 icinga systemd[1]: Started Icinga vSphereDB Daemon.
Oct 11 06:50:33 icinga icinga-vspheredb[2478]: [configwatch] DB configuration loaded
Oct 11 06:50:33 icinga icinga-vspheredb[2478]: [db] sending DB config to child process
Oct 11 06:50:33 icinga icinga-vspheredb[2478]: [db] Running DB cleanup (this could take some time)
Oct 11 06:50:33 icinga icinga-vspheredb[2478]: [db] DB has been cleaned up
Oct 11 06:50:33 icinga icinga-vspheredb[2478]: [localdb] ready
Oct 11 06:50:33 icinga icinga-vspheredb[2478]: [api] launching server 1: vCenterId=1: https://username@some.vspherehost.example.com
Oct 11 06:50:33 icinga icingacli[2478]: ERROR: RuntimeException in /usr/share/icingaweb2/modules/incubator/vendor/gipfl/socket/src/UnixSocketInspection.php:64 with message: Got no proc dir (/proc/1504) for remote node
Oct 11 06:50:33 icinga systemd[1]: icinga-vspheredb.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 06:50:33 icinga mariadbd[1156]: 2022-10-11  6:50:33 103 [Warning] Aborted connection 103 to db: 'vspheredb' user: 'vspheredb' host: 'localhost' (Got an error reading communication packets)
Oct 11 06:50:33 icinga systemd[1]: icinga-vspheredb.service: Failed with result 'exit-code'.

It seems to be an issue with handling of the unix domain socket through the IPL module, however I can't make out what the issue is, not knowing the IPL module.

Tried with both nginx and Apache as the front end.

System information: Ubuntu 22.04.1 LTS PHP 8.1.2 MariaDB 10.6.7-2ubuntu1.1 Icinga2 2.13.5-1.jammy Icingaweb2 2.11.1-1.jammy

Thomas-Gelf commented 2 years ago

Log lines and error messages are from the vspheredb module, not vsphere - but vspheredb has no version 1.1.1. Could you please re-check your versions, and also let me know the version of your incubator module?

The web error (Unable to connect to unix domain socket) on the daemon page tells us that the daemon isn't running, which is confirmed by your log lines. They say: Got no proc dir (/proc/1504) for remote node. Is SELinux or anything similar active? Is the web UI running on the very same host/container, or do you somehow connect from outside a dedicated container?

The MariaDB error that follows suggests that the daemon terminated in an unclean way, which shouldn't happen. But let's address those errors step by step, we'll track this down.
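
For readers unfamiliar with the "Got no proc dir" message: on Linux, a Unix socket server can ask the kernel which process connected and then look that process up under /proc. The Python sketch below illustrates that general technique. It is an assumption about roughly what the incubator's UnixSocketInspection does, not its actual code, and peer_credentials and proc_dir_visible are hypothetical names.

```python
import os
import socket
import struct


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end of a Unix socket.

    Linux only: relies on the SO_PEERCRED socket option.
    """
    fmt = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


def proc_dir_visible(pid):
    """True if /proc/<pid> can be seen by the current process."""
    return os.path.isdir("/proc/%d" % pid)


if __name__ == "__main__":
    # A socket pair stands in for a real client/server connection.
    server_end, client_end = socket.socketpair(socket.AF_UNIX,
                                               socket.SOCK_STREAM)
    try:
        pid, uid, gid = peer_credentials(server_end)
        print("peer pid=%d uid=%d gid=%d" % (pid, uid, gid))
        print("/proc entry visible:", proc_dir_visible(pid))
    finally:
        server_end.close()
        client_end.close()
```

If the second step fails for a peer that belongs to another user, something is restricting what the daemon's user may see in /proc.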

tolecnal commented 2 years ago

Ah yes, I guess I was a bit cross-eyed - we are talking about the vspheredb module. And the currently installed version is 1.4.0.

As for incubator, this is running version 0.18.0, and director is running 1.10.0.

The server is set up with AppArmor, which I thought might be interfering, but I disabled AppArmor and tested without it running, and got the same errors. Everything is running on the same server, no containers. Icinga2 and icingaweb2 are installed using the official Icinga repositories, and all modules have been installed using the git method.

Thomas-Gelf commented 2 years ago

Could you please give vSphereDB 1.5 and Incubator 1.9 a try? I see no issue related to what you have been showing in v1.4 - but the newer versions are better at handling some error conditions on PHP 8.1.

tolecnal commented 2 years ago

Just upgraded both modules, and the unix daemon error persists. However, I am now seeing a new error regarding the database:

06:57:44: [configwatch] Sending DB Config failed: SQLSTATE[HY000] [2002] No such file or directory in /usr/share/icingaweb2/library/vendor/Zend/Db/Adapter/Pdo/Abstract.php(145)

tolecnal commented 2 years ago

I ran apt reinstall icingaweb2, which fixed the error about Abstract.php. Strange that it would be missing.

The new logs from syslog look like this:

Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [configwatch] DB configuration loaded
Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [db] sending DB config to child process
Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [db] Running DB cleanup (this could take some time)
Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [db] DB has been cleaned up
Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [localdb] ready
Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [api] launching server 1: vCenterId=1: https://username@some.vspherehost.example.com
Oct 12 05:45:51 icinga icinga-vspheredb[14292]: [api some.vspherehost.example.com (id=1)] Logged out
Oct 12 05:45:52 icinga icinga-vspheredb[14292]: [api pcc-some.vspherehost.example.com (id=1)] Cookies changed, storing new ones
Oct 12 05:46:16 icinga icingacli[14292]: ERROR: RuntimeException in /usr/share/icingaweb2/modules/incubator/vendor/gipfl/socket/src/UnixSocketInspection.php:64 with message: Got no proc dir (/proc/14077) for remote node
Oct 12 05:46:16 icinga systemd[1]: icinga-vspheredb.service: Main process exited, code=exited, status=1/FAILURE
Oct 12 05:46:16 icinga mariadbd[1166]: 2022-10-12  5:46:16 646 [Warning] Aborted connection 646 to db: 'vspheredb' user: 'vspheredb' host: 'localhost' (Got an error reading communication packets)
Oct 12 05:46:16 icinga systemd[1]: icinga-vspheredb.service: Failed with result 'exit-code'.
Oct 12 05:46:16 icinga systemd[1]: icinga-vspheredb.service: Consumed 2.313s CPU time.

Thomas-Gelf commented 2 years ago

There is clearly something wrong with accessing your /proc filesystem, at least that's what I'm able to read from this message, combined with your description (no SELinux/AppArmor). But please don't ask me why this happens :D

To catch this error, please apply the following patch:

--- a/library/Vspheredb/Daemon/RemoteApi.php
+++ b/library/Vspheredb/Daemon/RemoteApi.php
@@ -100,7 +100,16 @@ class RemoteApi implements EventEmitterInterface
             $jsonRpc = new JsonRpcConnection(new StreamWrapper($connection));
             $jsonRpc->setLogger($this->logger);

-            $peer = UnixSocketInspection::getPeer($connection);
+            try {
+                $peer = UnixSocketInspection::getPeer($connection);
+            } catch (Exception $e) {
+                $jsonRpc->setHandler(new FailingPacketHandler(Error::forException($e)));
+                $this->loop->addTimer(3, function () use ($connection) {
+                    $connection->close();
+                });
+                return;
+            }
+
             if (!$this->isAllowed($peer)) {
                 $jsonRpc->setHandler(new FailingPacketHandler(new Error(Error::METHOD_NOT_FOUND, sprintf(
                     '%s is not allowed to control this socket',

This will not fix accessing your proc filesystem, but at least the daemon will continue to run when the web tries to access its socket.
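
The idea behind the patch can be shown with a toy simulation in Python: a failure while inspecting one peer is turned into an error reply for that connection only, instead of an uncaught exception that ends the whole process. This is a simplified sketch, not a port of the PHP code. inspect_peer, handle_connection and the visible_pids set are hypothetical stand-ins.

```python
def inspect_peer(pid, visible_pids):
    """Pretend peer inspection: fails when the peer's /proc entry is hidden."""
    if pid not in visible_pids:
        raise RuntimeError(
            "Got no proc dir (/proc/%d) for remote node" % pid)
    return {"pid": pid}


def handle_connection(pid, visible_pids):
    """Handle one connection without letting an inspection failure escape."""
    try:
        peer = inspect_peer(pid, visible_pids)
    except RuntimeError as exc:
        # Answer this client with an error; the caller keeps running.
        return {"error": str(exc)}
    return {"result": "connection from pid %d accepted" % peer["pid"]}


if __name__ == "__main__":
    # The second peer is "hidden", yet the loop processes every connection.
    for pid in (14292, 14077):
        print(handle_connection(pid, visible_pids={14292}))
```

Without the try/except, the first hidden peer would raise out of the loop, which mirrors the daemon exiting with status=1/FAILURE in the logs above.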

tolecnal commented 2 years ago

Applied the patch, and that does indeed ensure that the daemon is kept running, even though it fails.

I decided to dig further, as it is clear that something is blocking access to the /proc file system. I retraced my steps and remembered that I had applied some OS hardening. Going through the steps this hardening takes, I found that one of them adds the option hidepid=2 to /etc/fstab. This option limits access to the /proc file system. Further information can be found here: https://linux-audit.com/linux-system-hardening-adding-hidepid-to-proc/

I then decided to remove this option from /etc/fstab and rebooted the system. The module was now able to access the /proc file system, so one step closer to something else :)
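
For anyone hitting the same symptom, a quick way to see whether /proc is currently mounted with hidepid is to read the mount table. The following Python sketch parses text in the /proc/mounts format. proc_hidepid is a hypothetical helper name, and the function simply returns whatever value the kernel lists, which may be numeric or a name depending on the kernel version.

```python
def proc_hidepid(mounts_text):
    """Return the hidepid value of the /proc mount, or None if it is not set.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if mount_point != "/proc" or fs_type != "proc":
            continue
        for option in options.split(","):
            key, _, value = option.partition("=")
            if key == "hidepid":
                return value
    return None


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            value = proc_hidepid(handle.read())
    except OSError as exc:
        print("could not read /proc/mounts:", exc)
    else:
        print("hidepid is not set" if value is None
              else "hidepid=%s" % value)
```

Checking the live mount table rather than /etc/fstab also catches cases where the option was applied by a remount instead of an fstab entry.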

However I now saw that it was not able to access the socket file, as it said it no longer existed. I thought that /etc/tmpfiles.d/icinga-vspheredb.conf was supposed to sort this out. So I manually created a new socket and confirmed with file that it was indeed a socket file, and not a regular file. Restarted vspheredb and monitored the log files, where we now see this:

Oct 12 07:23:26 icinga systemd[1]: Starting Icinga vSphereDB Daemon...
Oct 12 07:23:26 icinga systemd[1]: Started Icinga vSphereDB Daemon.
Oct 12 07:23:26 icinga icingacli[4319]: ERROR: ErrorException in /usr/share/php/Icinga/Application/ClassLoader.php:303 with message: require(/usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/RemoteApi.php): Failed to open stream: Permission denied
Oct 12 07:23:26 icinga systemd[1]: icinga-vspheredb.service: Main process exited, code=exited, status=1/FAILURE
Oct 12 07:23:26 icinga systemd[1]: icinga-vspheredb.service: Failed with result 'exit-code'.
Oct 12 07:23:57 icinga systemd[1]: icinga-vspheredb.service: Scheduled restart job, restart counter is at 33.
Oct 12 07:23:57 icinga systemd[1]: Stopped Icinga vSphereDB Daemon.
Oct 12 07:23:57 icinga systemd[1]: Starting Icinga vSphereDB Daemon...
Oct 12 07:23:57 icinga systemd[1]: Started Icinga vSphereDB Daemon.
Oct 12 07:23:57 icinga icingacli[4371]: ERROR: ErrorException in /usr/share/php/Icinga/Application/ClassLoader.php:303 with message: require(/usr/share/icingaweb2/modules/vspheredb/library/Vspheredb/Daemon/RemoteApi.php): Failed to open stream: Permission denied
Oct 12 07:23:57 icinga systemd[1]: icinga-vspheredb.service: Main process exited, code=exited, status=1/FAILURE
Oct 12 07:23:57 icinga systemd[1]: icinga-vspheredb.service: Failed with result 'exit-code'.

I have verified that the socket file has the same permissions as defined in /etc/tmpfiles.d/icinga-vspheredb.conf and also tested with chmod 0777, without that making any difference.

tolecnal commented 2 years ago

... and as it turned out, when I used git apply <patch> it had reverted the permissions of the file vspheredb/library/Vspheredb/Daemon/RemoteApi.php to 0640 which was too restrictive. Fixing the permissions on the file now yields a healthy vSphereDB Daemon Status.
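
A small check like the following Python sketch can confirm a file's mode after patching. describe_mode and readable_by_others are hypothetical helper names. Note that the "others" read bit is only one way for the daemon's user to get access; owner and group membership matter as well.

```python
import os
import stat
import sys


def describe_mode(path):
    """Return the permission bits of path as a four-digit octal string."""
    return "%04o" % stat.S_IMODE(os.stat(path).st_mode)


def readable_by_others(path):
    """True if the 'others' read bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


if __name__ == "__main__":
    # Usage: python check_mode.py <file> [<file> ...]
    for path in sys.argv[1:]:
        print(path, describe_mode(path),
              "readable by others" if readable_by_others(path)
              else "NOT readable by others")
```

A file at 0640 reports as not readable by others, which matches the Permission denied error the daemon logged when loading RemoteApi.php.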

However the fact that the socket file disappeared on reboot and was not recreated is a bit worrying.

tolecnal commented 2 years ago

After two more reboots since creating the socket file manually, we seem to be on safe ground.

So to summarize, the original issue was caused by OS hardening: the /proc file system was mounted with the option hidepid=2, which hides other users' process entries in /proc. This can be overridden if so desired by limiting access to a specific group, or disabled completely by removing it from /etc/fstab.

The issue can be closed - but you might consider including your patch in master for future users with similar problems.

Thomas-Gelf commented 2 years ago

Thank you @tolecnal for letting me know. The patch catching this error condition has been pushed and will be released with the next version.