colebrooke / kubernetes-nagios

Basic health checks for a Kubernetes cluster
MIT License

NRPE: Unable to read output #10

Closed: rosri1992 closed this issue 4 years ago

rosri1992 commented 4 years ago

Hello Team,

I'm trying to set up monitoring of a Kubernetes cluster using this repository: https://github.com/colebrooke/kubernetes-nagios

I've run into a weird issue. I have the Kubernetes cluster running on a remote RHEL server. When I run the script locally on the remote server, it works:

/usr/local/nagios/libexec/check_pods.sh -k -n -w 500 -C 800
OK - pods are all OK, found 2 in state.

The same check fails when I run it through the NRPE plugin on the remote server:

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_pod_cjoc
NRPE: Unable to read output

I have defined the command in nrpe.cfg and restarted the NRPE agent on the remote server.

When I invoke this check from the Nagios server, I get the same "NRPE: Unable to read output" error.

From the Nagios server:

/usr/local/nagios/libexec/check_nrpe -H -c check_pod_cjoc
NRPE: Unable to read output

I have tested two versions of the NRPE agent, 3.2.1 and 4.0.3, and get the same error message with both. I haven't tried other versions.

Note: the nagios user has admin (sudo) rights to run these scripts on the remote server.

Nagios v4.4.5 is running on a RHEL server.

Let me know if you need more information. Could you please take a look? @ericloyd @sawolf

Stay home, stay safe.

Thanks, Srikanth

rosri1992 commented 4 years ago

The issue I mentioned above is seen only from the Nagios server.

The command works fine on the remote server and gives the expected output from the Kubernetes cluster.

When the Nagios server tries to run this script on the remote server using the check_nrpe plugin, it fails and gives me the "NRPE: Unable to read output" error.

colebrooke commented 4 years ago

Hi, it sounds like the issue is probably permissions-related. Keep in mind that check_nrpe should be run as the nagios user, so try that first. For testing, you could first edit the script on the remote server to run "echo 'OK'; exit 0" near the top. Then call it via check_nrpe, both from the Nagios server and locally, to see if that works. Next, edit the script and add this near the top:

sudo -v && echo 'OK' || echo 'sudo not working'

This should help you understand whether you can run sudo commands remotely from Nagios. Hope that helps!
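The two temporary tests suggested above can be sketched as a pair of bash functions pasted near the top of check_pods.sh. This is a minimal sketch, not code from the repository. The function names debug_echo_ok and debug_sudo are hypothetical. The sketch also uses "sudo -n -v" instead of plain "sudo -v", on the assumption that under NRPE there is no terminal where sudo could ask for a password, so it should fail at once rather than wait.

```shell
#!/usr/bin/env bash
# Temporary debug helpers for the top of check_pods.sh (remove after testing).
# The function names are hypothetical and not part of kubernetes-nagios.

# Test 1: prove that check_nrpe can read any output from this script.
# Nagios plugins report state via exit code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
debug_echo_ok() {
  echo 'OK'
  return 0
}

# Test 2: report whether "sudo -v" succeeds without a password prompt.
# -n (non-interactive) makes sudo fail immediately instead of prompting;
# -v validates (and refreshes) the user's cached sudo credentials.
debug_sudo() {
  if sudo -n -v 2>/dev/null; then
    echo 'OK'
  else
    echo 'sudo not working'
  fi
}
```

For the first test, call debug_echo_ok followed by exit 0 right after the definitions. For the second, call debug_sudo instead. An if/else is used rather than "a && b || c" so that the fallback message only appears when sudo itself fails.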

rosri1992 commented 4 years ago

Hi,

Thanks for getting back to me so quickly. I tried the recommended changes to the script, and I get an OK response both when I execute the script locally on the remote server and when I invoke it from the Nagios server.

So I think the nagios user on both servers (the remote server and the Nagios server) has sudo permission to run your scripts.

That rules out a permissions issue, but I'm still getting the error.

Let me know if you need any additional information.

justin-ce commented 4 years ago

Try adding this near the top of the script:

kubectl config current-context

You may find, for example, that the nagios user doesn't have access to the kubeconfig file.
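Here is a slightly expanded version of that check, offered as a minimal sketch. The function debug_kube_context and its single argument (the kubeconfig path passed to the script with -k) are hypothetical. The function also prints the user and working directory. A relative path such as .kube/config is resolved against the current working directory, which is not necessarily the nagios user's home directory when NRPE starts the script.

```shell
#!/usr/bin/env bash
# Hypothetical debug helper: show what the check script sees when it runs.
# Usage: debug_kube_context <kubeconfig-path>
debug_kube_context() {
  local kubeconfig="$1"
  echo "user: $(id -un)"
  # Relative kubeconfig paths are resolved against this directory.
  echo "working directory: $PWD"
  if [ ! -r "$kubeconfig" ]; then
    echo "cannot read kubeconfig: $kubeconfig"
    return 1
  fi
  # Print the current context, or an error such as
  # "error: current-context is not set". stderr is merged into stdout
  # so the message is visible even if only stdout is captured.
  kubectl --kubeconfig "$kubeconfig" config current-context 2>&1
}
```

Passing an absolute path (for example the one shown in the nrpe.cfg definitions later in this thread) removes any dependence on the working directory.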

rosri1992 commented 4 years ago

Hi Justin,

The nagios user has permission to execute kubectl.

[nagios@host libexec]$ /usr/local/nagios/libexec/check_test.sh -k .kube/config -n cloudbees-core
error: current-context is not set
OK - pods are all OK, found 3 in ready state.

I got that error, but the script still returned the correct output for the cluster.

Let me know if you need more information.

rosri1992 commented 4 years ago

Hi,

Until yesterday, I was passing the parameters directly in the command definition in nrpe.cfg on the remote server:

"command[check_pod_cjoc]=/usr/local/nagios/libexec/check_pods.sh -k /.kube/config -n cloudbees-core "

Running it from the Nagios server, I was getting the "NRPE: Unable to read output" error:

[nagios@nagioserver nagios]$ /usr/local/nagios/libexec/check_nrpe -H Remote-Server -c check_pod_cjoc
NRPE: Unable to read output

Today I tried something different: passing the parameters as arguments, with this definition in nrpe.cfg on the remote server:

"command[check_pod]=/usr/local/nagios/libexec/check_test.sh -k $ARG1$ -n $ARG2$"

I restarted the NRPE service on the remote server.

Running it from the Nagios server now gives a different error:

[nagios@nagios-host ~]$ /usr/local/nagios/libexec/check_nrpe -H remote-host -c check_pod -a /home/ec2-user/.kube/config cloudbees-core
NRPE: Command 'check_pod!/home/ec2-user/.kube/config!cloudbees-core' not defined

Note: check_test.sh and check_pods.sh are the same script, copied from this repository (https://github.com/colebrooke/kubernetes-nagios); both are identical to its check_kube_pods.sh script.

I have tried every workaround to get that command working, but I don't know what is wrong. I couldn't figure out the root cause of why Nagios is unable to capture the output whenever check_pods.sh runs on the remote server.

Here is the expected output of the script when I execute it locally on the remote server:

[nagios@remote-server ~]$ /usr/local/nagios/libexec/check_test.sh -k /h/.kube/config -n cloudbees-core -v
OK - pods are all OK, found 3 in ready state.
OK: Pod: ny-master-0 PodScheduled: True
OK: Pod: ny-master-0 ContainersReady: True
OK: Pod: ny-master-0 Ready: True
OK: Pod: ny-master-0 Initialized: True
OK: Pod: heist-master-2-0 PodScheduled: True
OK: Pod: heist-master-2-0 ContainersReady: True
OK: Pod: heist-master-2-0 Ready: True
OK: Pod: heist-master-2-0 Initialized: True
OK: Pod: cjoc-0 PodScheduled: True
OK: Pod: cjoc-0 ContainersReady: True
OK: Pod: cjoc-0 Ready: True
OK: Pod: cjoc-0 Initialized: True
[nagios@remote-server ~]$

Let me know if anyone needs more information.

Thanks, Srikanth

colebrooke commented 4 years ago

Hi, I'm afraid I don't have the time to support you further on this issue. If you can run the script locally, you should be able to make it work with Nagios. It's just a matter of permissions or other small differences in your setup, which are hard for me to diagnose. I'm not using Nagios anymore, so I don't even have a setup to test this with. I hope you get it sorted!