adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Ansible request for Nagios_Master_Config_Tool to run as Nagios User #1876

Open Willsparker opened 3 years ago

Willsparker commented 3 years ago

ref: https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1862#issuecomment-770674426 , #1670

Details: Currently the tool runs as the root user on the Nagios Server, which is not ideal. This issue may become irrelevant if #1670 is looked at first, but if it can be fixed easily, it's probably better to get it done in the meantime :-)

sxa commented 3 years ago

Not quite sure why at the moment but a run I've just done has hung on the nagios master config at this stage:

TASK [Nagios_Master_Config : SSH into the Nagios Master and excute the Nagios_Ansible_Config_tool.sh script] ***

sxa commented 3 years ago

If I run the full playbook excluding nagios_master_config I'm getting this:

TASK [Nagios_Tunnel : Place Adopt_Tunnel_User.key in nagios users ssh folder] ***
fatal: [test-marist-ubuntu1604-s390x-4]: FAILED! => {"msg": "The conditional check 'ansible_port != \"22\"' failed. The error was: error while evaluating conditional (ansible_port != \"22\"): 'ansible_port' is undefined\n\nThe error appears to be in '/tmp/awx_197_ob92l7o6/project/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/Nagios_Tunnel/tasks/main.yml': line 12, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Place Adopt_Tunnel_User.key in nagios users ssh folder\n  ^ here\n"}

sxa commented 3 years ago

OK, a full run works in AWX if I exclude both nagios_master_config and nagios_tunnel, so this is good :-)

Willsparker commented 3 years ago

With Nagios_Tunnel, the variable ansible_port is defined in Nagios_Master_Config, so it is undefined when that role is excluded. As for it hanging, I've never encountered that. Looking in the tool's folder (/usr/local/nagios/Nagios_Ansible_Config_Tool/) on the Nagios Server, it never got to the point of creating the config file from all the templates, so the hang may have been the machine attempting to ssh to the Nagios server?

Willsparker commented 3 years ago

My bad: that task is delegated to localhost. Can AWX ssh to the root user on the Nagios Server?

Willsparker commented 3 years ago

I've added the AWX ssh key to the Nagios server, and I was able to re-add build-digitalocean-centos69-x64-2 to Nagios on this AWX run :-)

Now I can get round to trying to make it run as the Nagios user

Willsparker commented 3 years ago

This works:

will@will-XPS-13-9360:~$ ssh root@nagios "su nagios -c whoami"
nagios

So I may just tack this onto the front of the command.

I'll remove build-digitalocean-centos69-x64-2 from Nagios and attempt to add it on, in this fashion, again and list any problems here.

Willsparker commented 3 years ago

First issue:

"bash: usr/local/nagios/Nagios_Ansible_Config_tool/Nagios_Ansible_Config_tool.sh: Permission denied"

EDIT: The issue was that I was running su nagios, not su - nagios, so the nagios user was trying to execute from a place it didn't have permissions in. :facepalm:
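A minimal sketch of the corrected invocation, assuming the same `root@nagios` SSH target as in the test above. `build_remote_cmd` is a hypothetical helper, not part of the tool, and it only prints the command rather than running it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ssh command that runs a script as the
# nagios user with a login shell. `su - nagios` starts in the nagios user's
# home directory with its own environment, whereas plain `su nagios` keeps
# the caller's working directory (root's, in this case).
build_remote_cmd() {
    local script="$1"
    printf 'ssh root@nagios "su - nagios -c %s"\n' "'$script'"
}

build_remote_cmd /usr/local/nagios/Nagios_Ansible_Config_tool/Nagios_Ansible_Config_tool.sh
```

Printing the command first makes it easy to check the quoting before pointing it at the real server.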

Willsparker commented 3 years ago

Having made that change:

        "Operating System patches yes /usr/local/nagios/Nagios_Ansible_Config_tool//templates/yum.cfg", 
        "Timesync Check: /usr/local/nagios/Nagios_Ansible_Config_tool//templates/check_ntp_timesync.cfg", 
        "##################################################", 
        "", 
        "ERROR: Unable to connect to client 167.71.130.191 as the nagios user", 
        "Please ensure that the nagios user is able to ssh into the client machine using keys"

This is happening because the ssh key of the user who runs the Nagios_Ansible_Config_Tool needs to be in the target machine's authorized_keys file. Currently, the ssh key of the root user on the Nagios Server is put into the target machine's authorized_keys file via a task in the Nagios_Plugins role: https://github.com/AdoptOpenJDK/openjdk-infrastructure/blob/7bb3de4a31a58a0074163a8bcdad0423975ef648/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/Nagios_Plugins/tasks/main.yml#L22

So, that will need to change to the ssh key of the Nagios user on the Nagios Server (which means the ssh key will need to be changed in the secrets repo too). Thankfully, due to some excellent work on #1870, I can test this locally :grin:
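For local testing, something like the following could stand in for the playbook task. This is only a sketch: `add_authorized_key` and both of its path arguments are hypothetical, and the real change would still belong in the Nagios_Plugins role and the secrets repo.

```shell
#!/usr/bin/env bash
# Append a public key to an authorized_keys file only if it is not already
# present. Both paths are illustrative arguments, not paths from the playbook.
add_authorized_key() {
    local pubkey_file="$1" auth_file="$2"
    local key
    key="$(cat "$pubkey_file")" || return 1
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}

# Example (hypothetical paths):
#   add_authorized_key ./nagios_id.pub /home/nagios/.ssh/authorized_keys
```

Checking for the key before appending keeps repeated runs from filling the file with duplicates.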

Willsparker commented 3 years ago

Doing that, I was still getting the error message. However:

# Test: Ensure an ssh connecttion can be made to Nagios client system by the Nagios user and automatically add fingerprint key
Nagios_Login=`su nagios -c "ssh -o StrictHostKeyChecking=no $Sys_IPAddress uptime"`

This line was causing an issue. I believe this was due to the su nagios command being run when the script is already being executed by the nagios user. So, I removed that, and made the ssh command explicitly connect as the nagios user (i.e. nagios@$Sys_IPAddress), and the following happened:

nagios@nagios:/usr/local/nagios/Nagios_Ansible_Config_tool$ ./Nagios_Ansible_Config_tool.sh CentOS x86_64 build-digitalocean-centos69-x64-2 167.71.130.191 digitalocean 22
167.71.130.191

##################################################
Hostname: build-digitalocean-centos69-x64-2
IP Address: 167.71.130.191
SSH Port Number: 22
Enable check_mem: yes
Enable Nagios Graphs: yes
Enable Icons: yes CentOS
Enable Notifications: yes
Add Description Info: yes Add by Ansible
Operating System patches yes /usr/local/nagios/Nagios_Ansible_Config_tool//templates/yum.cfg
Timesync Check: /usr/local/nagios/Nagios_Ansible_Config_tool//templates/check_ntp_timesync.cfg
##################################################

cp: cannot create regular file 'build-digitalocean-centos69-x64-2.cfg': Permission denied
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
cp: cannot create regular file '/usr/local/nagios/Nagios_Ansible_Config_tool//templates/check_mem_tmp.file.7791': Permission denied
sed: can't read /usr/local/nagios/Nagios_Ansible_Config_tool//templates/check_mem_tmp.file.7791: No such file or directory
sed: can't read /usr/local/nagios/Nagios_Ansible_Config_tool//templates/check_mem_tmp.file.7791: No such file or directory
./Nagios_Ansible_Config_tool.sh: line 212: build-digitalocean-centos69-x64-2.cfg: Permission denied
rm: cannot remove '/usr/local/nagios/Nagios_Ansible_Config_tool//templates/check_mem_tmp.file.7791': No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
./Nagios_Ansible_Config_tool.sh: line 225: build-digitalocean-centos69-x64-2.cfg: Permission denied
./Nagios_Ansible_Config_tool.sh: line 228: build-digitalocean-centos69-x64-2.cfg: Permission denied
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
sed: can't read build-digitalocean-centos69-x64-2.cfg: No such file or directory
build-digitalocean-centos69-x64-2 Already exists in the hostgroup, skipping
Conducting Pre-flight checks...
mv: cannot stat 'build-digitalocean-centos69-x64-2.cfg': No such file or directory
./Nagios_Ansible_Config_tool.sh: line 291: flight_check_20210329-120913.log: Permission denied
cat: flight_check_20210329-120913.log: No such file or directory
cat: flight_check_20210329-120913.log: No such file or directory
ERROR: Something when wrong...
./Nagios_Ansible_Config_tool.sh: line 296: /usr/local/nagios/Nagios_Ansible_Config_tool//build-digitalocean-centos69-x64-2.cfg.error_20210329-120913.log: Permission denied
mv: cannot stat '/usr/local/nagios/etc/servers//build-digitalocean-centos69-x64-2.cfg': No such file or directory
Please see error log: /usr/local/nagios/Nagios_Ansible_Config_tool//build-digitalocean-centos69-x64-2.cfg.error_20210329-120913.log
Host configure: /usr/local/nagios/Nagios_Ansible_Config_tool//build-digitalocean-centos69-x64-2.cfg
^C
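For reference, the changed connectivity test could be sketched as below, so that it works whether the script is run by root or by the nagios user. `ssh_test_cmd` is a hypothetical helper that only prints the command, `BatchMode=yes` is my addition (it makes ssh fail instead of waiting at a password prompt), and 192.0.2.10 is a placeholder address:

```shell
#!/usr/bin/env bash
# Sketch of the connectivity test. Sys_IPAddress mirrors the variable used
# in the script; ssh_test_cmd is hypothetical and only prints the command.
ssh_test_cmd() {
    local current_user="$1" ip="$2"
    local ssh_cmd="ssh -o StrictHostKeyChecking=no -o BatchMode=yes nagios@${ip} uptime"
    if [ "$current_user" = "nagios" ]; then
        # Already the nagios user: no su wrapper needed.
        printf '%s\n' "$ssh_cmd"
    else
        # Running as another user (e.g. root): switch to nagios first.
        printf 'su - nagios -c "%s"\n' "$ssh_cmd"
    fi
}

ssh_test_cmd "$(id -un)" "${Sys_IPAddress:-192.0.2.10}"
```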

So, a bunch of permissions errors, which shouldn't be too hard to fix. However, the whole setup is confusing enough, in terms of which machine is ssh-ing to which, that some added documentation or a diagram may help. It'd certainly help my understanding. It's fairly clear why #1670 is required, and it may be preferable to just start on that, as opposed to getting this 100% working.
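As a starting point for the permissions side, a quick check like this, run as the nagios user, would show which directories need their ownership or mode changed. The directory list is taken from the error output above; `check_writable` is a hypothetical helper, not something the tool provides:

```shell
#!/usr/bin/env bash
# Report which of the given directories the current user cannot write to.
# Returns non-zero if any directory is missing or not writable.
check_writable() {
    local dir status=0
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            echo "OK: $dir"
        else
            echo "NOT WRITABLE: $dir"
            status=1
        fi
    done
    return "$status"
}

# Directories that appear in the permission errors.
check_writable \
    /usr/local/nagios/Nagios_Ansible_Config_tool \
    /usr/local/nagios/Nagios_Ansible_Config_tool/templates \
    /usr/local/nagios/etc/servers || true
```

Note that the script also writes build-digitalocean-centos69-x64-2.cfg and the flight_check log via relative paths, so the directory it is run from needs to be writable too.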

sxa commented 3 years ago

Iceboxing for now until someone is able to pick this up and start progressing it.