Closed by Eloar 1 year ago
Have you created the nrpe user in your Postgresql cluster? Also, did you configure pg_hba.conf?
The nrpe and nagios users and groups are created when nrpe is installed using yum. pg_hba.conf was modified; without that configuration the check would not work locally either. The check works fine, but only locally on the DB machine. It doesn't work when invoked over NRPE. Other checks work fine, both locally and over the network from the Nagios machine.
You are talking about the OS user and group. I'm talking about the postgresql user. Can you show me your pg_hba.conf file?
I am pretty confident I can help you, I did that before going on vacation.
It should not matter. It gave me a clue, though, so I added a user parameter to each command: -u postgres. Unfortunately the result is the same:
NRPE: Unable to read output
the most important part in pg_hba.conf is:
local all all trust
I'm saying that from the top of my head, as I'm still on vacation, but I think you can try -h localhost.
-h is the option for help; -H is for host. But when no host is given, the connection goes through the Unix socket, which is equivalent to psql without a host. I don't want to open access for the postgres user over anything other than the Unix socket.
Makes sense, sorry. I'll try to VPN into the office in the next few days to give you details about my setup. Have you tried running the command directly from the shell as nrpe user? You may have to temporarily enable shell login by changing the shell for the nrpe user in /etc/passwd
yes, just tried it again and it works. I mean direct execution of check_postgres action as nrpe user does work
# sudo -u nrpe -- /usr/lib64/nagios/plugins/check_postgres_connection --db=postgres
POSTGRES_CONNECTION OK: DB "postgres" version 10.3 | time=0.01s
Here is how I configured my nrpe actions:
command[check_postgres_action]=/usr/bin/check_postgres.pl -u nrpe --action=$ARG1$
command[check_postgres_action_db]=/usr/bin/check_postgres.pl -u nrpe --action=$ARG1$ --db=$ARG2$
command[check_postgres_action_warning_critical]=/usr/bin/check_postgres.pl -u nrpe --action=$ARG1$ -w=$ARG2$ -c=$ARG3$ $ARG4$ $ARG5$
And I call them this way from Nagios, for example:
/usr/lib/nagios/plugins/check_nrpe -H atqatld1 -c check_postgres_action -a backends
I have an nrpe user created on the postgres cluster and here is the relevant line of my pg_hba.conf file:
local all nrpe peer
It is somewhat the equivalent of your trust but in your case I think that you allow any OS user to access any database in the PostgreSQL cluster, which is a bit risky.
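To see at a glance which pg_hba.conf rules skip authentication entirely, a quick scan for the trust method can help. This is only a sketch: the function name is made up for illustration, and it assumes the method is the last field on the line, which holds for plain trust entries like the one above but not for every possible rule.

```shell
# Hypothetical helper: print every active pg_hba.conf rule whose last
# field is "trust" (comment lines and blank lines are skipped).
list_trust_rules() {
    awk '!/^[[:space:]]*#/ && NF > 0 && $NF == "trust" { print NR ": " $0 }' "$1"
}

# Example run against the two rules quoted in this thread.
sample=$(mktemp)
printf '%s\n' 'local all all trust' 'local all nrpe peer' > "$sample"
trust_hits=$(list_trust_rules "$sample")
echo "$trust_hits"
rm -f "$sample"
```

On a real system the file to pass in is whatever SHOW hba_file; reports in psql.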
Did you check for SELinux AVCs? What OS are you running? I had to create an SELinux module to make it work on RHEL 8.
My pg_hba.conf line allows any local user to connect as any db user. It is a bit risky, but not that much. I'm afraid the nrpe user on Postgres will need to have its access elevated.
I'm using CentOS 7, with SELinux enabled, but it is configured ok, as other checks work fine both local and over nrpe.
EDIT: I've modified my environment configuration to be similar to yours.
CREATE ROLE nrpe WITH LOGIN;
pg_hba.conf
local all all peer
nrpe.cfg
command[check_postgres_locks]=/usr/lib64/nagios/plugins/check_postgres_locks -w 2 -c 3
command[check_postgres_bloat]=/usr/lib64/nagios/plugins/check_postgres_bloat -w='100 M' -c='200 M'
command[check_postgres_connection]=/usr/lib64/nagios/plugins/check_postgres_connection --db=postgres
command[check_postgres_backends]=/usr/lib64/nagios/plugins/check_postgres_backends
# sudo -u nrpe -- /usr/lib64/nagios/plugins/check_postgres_connection -u nrpe --db=postgres
POSTGRES_CONNECTION OK: DB "postgres" version 10.3 | time=0.04s
# /usr/lib64/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_postgres_connection
NRPE: Unable to read output
Unfortunately the problem is not resolved.
I think you need the -u option. And I doubt SELinux's default configuration works for those checks. Do you log connections in your postgresql logs? Do you see connections attempts in the logs when executing via nrpe?
I thought SELinux was a good clue, so I've checked the file contexts in /usr/lib64/nagios/plugins. Plugins installed with yum had context system_u:object_r:nagios_unconfined_plugin_exec_t:s0, whereas check_postgres.pl and its symlinks had context unconfined_u:object_r:lib_t:s0. Unfortunately, changing it to nagios_unconfined_plugin_exec_t did not resolve the issue.
Did you check if you had any avc entries in /var/log/audit/audit.log?
I'm back on it after the weekend. Unfortunately I've found nothing related to nrpe, nagios or postgres in /var/log/audit/audit.log.
Make sure you log connections in your Postgresql log and check if you see a connection attempt when an nrpe check is executed
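One way to turn this logging on without editing postgresql.conf by hand is sketched below. It assumes a superuser session over the local socket (here through sudo -u postgres, which is an assumption about the setup) and uses the standard log_connections and log_disconnections settings. The function names are illustrative only.

```shell
# Print the SQL that enables connection logging and reloads the config.
conn_logging_sql() {
    cat <<'SQL'
ALTER SYSTEM SET log_connections = on;
ALTER SYSTEM SET log_disconnections = on;
SELECT pg_reload_conf();
SQL
}

# Hypothetical wrapper: feed that SQL to psql as the postgres OS user.
# Not invoked here; call it by hand on the DB machine.
enable_conn_logging() {
    conn_logging_sql | sudo -u postgres psql --no-psqlrc -v ON_ERROR_STOP=1
}
```

After that, new connection attempts should show up in the PostgreSQL log. If a check triggered over NRPE leaves no trace there, the plugin is probably failing before it ever reaches the database.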
Check in your sudoers file if you have a line called requiretty. Comment it and test if you do. I don't think it will do anything as you're not using sudo
Did you check /var/log/secure, /var/log/messages?
Test by doing this instead of using sudo:
sudo -i
/path/to/check_postgres args
I don't use sudo for NRPE execution. I used sudo only to check whether check_postgres would run properly with the nrpe user's permissions. The NRPE process runs as the nrpe user.
I've found something like this in syslog:
sie 10 14:07:05 dev-db nrpe[29748]: CONN_CHECK_PEER: checking if host is allowed: DEV-NAGIOS port 41682
sie 10 14:07:05 dev-db nrpe[29748]: Connection from DEV-NAGIOS port 41682
sie 10 14:07:05 dev-db nrpe[29748]: is_an_allowed_host (AF_INET): is host >DEV-NAGIOS< an allowed host >DEV-NAGIOS<
sie 10 14:07:05 dev-db nrpe[29748]: is_an_allowed_host (AF_INET): is host >DEV-NAGIOS< an allowed host >DEV-NAGIOS<
sie 10 14:07:05 dev-db nrpe[29748]: is_an_allowed_host (AF_INET): host is in allowed host list!
sie 10 14:07:05 dev-db nrpe[29748]: Host address is in allowed_hosts
sie 10 14:07:05 dev-db nrpe[29748]: Host DEV-NAGIOS is asking for command 'check_postgres_backends' to be run...
sie 10 14:07:05 dev-db nrpe[29748]: Running command: /usr/lib64/nagios/plugins/check_postgres_backends --user=nrpe -w=70 -c=100
sie 10 14:07:05 dev-db nrpe[29749]: WARNING: my_system() seteuid(0): Operation not permitted
sie 10 14:07:05 dev-db nrpe[29749]: Warning: Could not set effective GID=999
sie 10 14:07:05 dev-db nrpe[29748]: Command completed with return code 3 and output:
sie 10 14:07:05 dev-db nrpe[29748]: Return Code: 3, Output: NRPE: Unable to read output
sie 10 14:07:05 dev-db nrpe[29748]: Connection from DEV-NAGIOS closed.
Ok, you have hints now. What about the first point of my last comment?
About the first hint: I've enabled logging of connections and disconnections in PostgreSQL. I see a connection when running check_postgres.pl --action connection directly, but not when trying to run it over NRPE. In journalctl, a valid check looks like this:
sie 10 16:38:32 dev-db nrpe[4393]: CONN_CHECK_PEER: checking if host is allowed: DEV-NAGIOS port 8919
sie 10 16:38:32 dev-db nrpe[4393]: Connection from DEV-NAGIOS port 8919
sie 10 16:38:32 dev-db nrpe[4393]: is_an_allowed_host (AF_INET): is host >DEV-NAGIOS< an allowed host >DEV-NAGIOS<
sie 10 16:38:32 dev-db nrpe[4393]: is_an_allowed_host (AF_INET): is host >DEV-NAGIOS< an allowed host >DEV-NAGIOS<
sie 10 16:38:32 dev-db nrpe[4393]: is_an_allowed_host (AF_INET): host is in allowed host list!
sie 10 16:38:32 dev-db nrpe[4393]: Host address is in allowed_hosts
sie 10 16:38:32 dev-db nrpe[4393]: Host DEV-NAGIOS is asking for command 'check_total_procs' to be run...
sie 10 16:38:32 dev-db nrpe[4393]: Running command: /usr/lib64/nagios/plugins/check_procs -w 150 -c 200
sie 10 16:38:32 dev-db nrpe[4394]: WARNING: my_system() seteuid(0): Operation not permitted
sie 10 16:38:32 dev-db nrpe[4394]: Warning: Could not set effective GID=999
sie 10 16:38:32 dev-db nrpe[4393]: Command completed with return code 0 and output: PROCS OK: 80 processes | procs=80;150;200;0;
sie 10 16:38:32 dev-db nrpe[4393]: Return Code: 0, Output: PROCS OK: 80 processes | procs=80;150;200;0;
sie 10 16:38:32 dev-db nrpe[4393]: Connection from DEV-NAGIOS closed.
For check_postgres.pl I've got:
sie 10 16:33:50 dev-db nrpe[4185]: CONN_CHECK_PEER: checking if host is allowed: DEV-NAGIOS port 64982
sie 10 16:33:50 dev-db nrpe[4185]: Connection from DEV-NAGIOS port 64982
sie 10 16:33:50 dev-db nrpe[4185]: is_an_allowed_host (AF_INET): is host >DEV-NAGIOS< an allowed host >DEV-NAGIOS<
sie 10 16:33:50 dev-db nrpe[4185]: is_an_allowed_host (AF_INET): is host >DEV-NAGIOS< an allowed host >DEV-NAGIOS<
sie 10 16:33:50 dev-db nrpe[4185]: is_an_allowed_host (AF_INET): host is in allowed host list!
sie 10 16:33:50 dev-db nrpe[4185]: Host address is in allowed_hosts
sie 10 16:33:50 dev-db nrpe[4185]: Host DEV-NAGIOS is asking for command 'check_postgres_locks' to be run...
sie 10 16:33:50 dev-db nrpe[4185]: Running command: /usr/lib64/nagios/plugins/check_postgres.pl --action locks --user=nrpe -w 2 -c 3
sie 10 16:33:50 dev-db nrpe[4186]: WARNING: my_system() seteuid(0): Operation not permitted
sie 10 16:33:50 dev-db nrpe[4186]: Warning: Could not set effective GID=999
sie 10 16:33:50 dev-db nrpe[4185]: Command completed with return code 2 and output:
sie 10 16:33:50 dev-db nrpe[4185]: Return Code: 3, Output: NRPE: Unable to read output
sie 10 16:33:50 dev-db nrpe[4185]: Connection from DEV-NAGIOS closed.
Apparently, when run by the nrpe process, check_postgres.pl ends with status 2, outputs nothing, and doesn't even connect to the local Postgres DB.
When using check_postgres there is in fact a problem with SELinux. When installed to /usr/lib64/nagios/plugins it gets the context unconfined_u:object_r:lib_t or unconfined_u:object_r:unconfined_t. It needs the file context system_u:object_r:nagios_unconfined_plugin_exec_t to be run over NRPE. Somehow symlinks to check_postgres.pl can't get the proper system_u setting, so I've switched from using symlinks to using the --action option. This way there is no denial in audit, yet it still does not work over NRPE.
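For reference, checking and persistently setting the type on the plugin could look roughly like the sketch below. semanage, restorecon and ls -Z are the standard SELinux tools; the function names are invented here, and the snippet assumes the loaded policy already defines nagios_unconfined_plugin_exec_t, as the yum-installed plugins in this thread suggest.

```shell
# Extract the type (third field) from an SELinux label such as
# system_u:object_r:nagios_unconfined_plugin_exec_t:s0
label_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper, to be run as root and not invoked here: record a
# persistent file-context rule for the plugin, apply it, show the result.
relabel_plugin() {
    plugin=$1
    semanage fcontext -a -t nagios_unconfined_plugin_exec_t "$plugin" &&
        restorecon -v "$plugin" &&
        ls -Z "$plugin"
}

label_type 'unconfined_u:object_r:lib_t:s0'   # prints lib_t
```

A call would be relabel_plugin /usr/lib64/nagios/plugins/check_postgres.pl. Recording the rule with semanage, rather than only changing the label with chcon, means a later restorecon or full relabel keeps the type.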
I suggest you try in permissive mode temporarily. I remember having denials without logs on RHEL 7.
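Denials that leave no trace in the audit log are often covered by dontaudit rules. A sketch of how to surface them temporarily, assuming root on the DB machine and the stock semodule and ausearch tools (the function names are made up for illustration):

```shell
# Rebuild the policy with dontaudit rules disabled so that every denial
# is logged. Hypothetical helper; run as root.
show_hidden_denials() {
    semodule -DB
}

# Rebuild the policy normally again once testing is done.
hide_denials_again() {
    semodule -B
}

# List recent AVC denials from the audit log.
recent_avcs() {
    ausearch -m avc -ts recent
}
```

A typical order would be show_hidden_denials, then trigger the check over NRPE, then recent_avcs, and finally hide_denials_again.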
I'm not keen to do so, but I tried switching SELinux on the DB machine into permissive mode:
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
# setenforce 0
# getenforce
Permissive
-bash-4.2# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 28
Then I ran check_nrpe on the Nagios machine, without success. Both the output on the Nagios machine and the log on the DB machine are the same in enforcing and permissive mode.
I'm out of ideas, sorry
Long story short, I'm kind of an idiot.
So somewhere along the way SELinux was a problem due to an invalid file context. Whatever I did, I couldn't keep the proper file context for the symlinks, so I had to use check_postgres.pl directly with the --action option.
And the reason for the error was an invalid option for check_postgres.pl. When I switched from symlinks to running check_postgres.pl directly with the --action option, I also switched from the short option -u to the long one, but made a mistake: I wrote --user instead of --dbuser. Apparently the NRPE daemon doesn't log stderr from the invoked command. I was able to catch it after a suggestion to add 2>&1 to the end of the command definition in the nrpe.cfg file on the monitored remote system.
So the conclusion is: while debugging, try adding 2>&1 to the end of the command definition in nrpe.cfg.
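The effect is easy to reproduce outside NRPE. The sketch below uses a made-up stand-in for a plugin that rejects an option (its message and the function name are illustrative, not actual check_postgres output) and compares what a caller sees when it captures stdout only with what it sees once stderr is merged in.

```shell
# Hypothetical stand-in for a plugin that rejects an option: it writes
# its complaint to stderr only and exits with status 2.
fake_plugin() {
    echo "unknown option given" >&2
    return 2
}

# Capture stdout only, discarding stderr.
status_plain=0
stdout_only=$(fake_plugin 2>/dev/null) || status_plain=$?

# Same call with stderr merged into stdout, as a trailing 2>&1 does.
status_merged=0
merged=$(fake_plugin 2>&1) || status_merged=$?

printf 'stdout only: [%s] (exit %d)\n' "$stdout_only" "$status_plain"
printf 'with 2>&1:   [%s] (exit %d)\n' "$merged" "$status_merged"
```

In nrpe.cfg the redirection goes at the very end of the command line, for example for the locks check from the log above with the corrected option: command[check_postgres_locks]=/usr/lib64/nagios/plugins/check_postgres.pl --action locks --dbuser=nrpe -w 2 -c 3 2>&1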
Glad it all worked out! That was a tricky one to debug.
I've got 2 machines: one running PostgreSQL 10 (DB), and one with other services including Nagios Core (let's just call it Nagios). I've installed NRPE on the DB machine alongside some plugins. The commands configured in nrpe.cfg are like so:
From the Nagios machine I'm able to run any check from the misc section, but none from the PostgreSQL section. Those fail with the error:
I've installed nagios-plugin-nrpe on the DB machine to rule out a network problem, and even on localhost I am unable to get a valid response. Example:
Locally it's fine:
and other checks work just fine over NRPE:
I had trouble with connection timeouts when port 5666 was not allowed in iptables, but that is not the case this time.
NRPE is configured to run as user nrpe and group nagios, and permissions on all files (symlinks included) in /usr/lib64/nagios/plugins are set to 755 with ownership root:nagios. I've installed check_postgres version 2.23.0. Any ideas how to approach this problem? I've stumbled across this plugin because of this tutorial.