Rackspace-DOT / nova-agent

Hostname is corrupted during Server Restart #71

Closed domibay-hugo closed 5 years ago

domibay-hugo commented 5 years ago

I found that our CentOS 7 server, which runs nova-agent, was unable to communicate; outgoing mail was rejected:

admin@admin-domain.com (expanded from ): host mail.company.com[162.13.142.151] said: 553 5.1.8 root@agent02.company.com.localdomain... Domain of sender address root@agent02.company.com.localdomain does not exist (in reply to MAIL FROM command)

After investigating, I found that on reboot the hostname was changed from the correct canonical name agent02.company.com to the invalid name agent02.company.com.localdomain: a .localdomain suffix was appended to the valid server name.

[2019-09-23 09:46:00 - root@agent02 log]# vi nova-agent.log
2019-09-23 09:41:54,214 [INFO ] Agent is starting up
2019-09-23 09:41:54,229 [INFO ] Sending notification startup is complete

[2019-09-23 09:56:37 - root@agent02 log]# vi messages
Sep 23 09:42:18 agent02 journal: [CLOUDINIT] util.py[WARNING]: Failed forking and calling callback NoneType
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed static host name to 'agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed static host name to agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 NetworkManager[507]: [1569228138.4040] settings: hostname changed from "agent02.company.com" to "agent02.company.com.localdomain"
Sep 23 09:42:18 agent02 NetworkManager[507]: [1569228138.4040] settings: hostname changed from "agent02.company.com" to "agent02.company.com.localdomain"
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed host name to 'agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed host name to 'agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 nm-dispatcher: req:1 'hostname': new request (3 scripts)
Sep 23 09:42:18 agent02 nm-dispatcher: req:1 'hostname': start running ordered scripts...
Sep 23 09:42:18 agent02 nm-dispatcher: req:1 'hostname': new request (3 scripts)
Sep 23 09:42:18 agent02 nm-dispatcher: req:1 'hostname': start running ordered scripts...
Sep 23 09:42:18 agent02 journal: [CLOUDINIT] stages.py[INFO]: Skipping modules ['ssh-import-id', 'byobu'] because they are not verified on distro 'rhel'. To run anyway, add them to 'unverified_modules' in config.
Sep 23 09:42:19 agent02 journal: [CLOUDINIT] cc_final_message.py[WARNING]: Used fallback datasource

[2019-09-23 09:58:04 - root@agent02 log]# hostnamectl
Static hostname: agent02.company.com.localdomain
Icon name: computer-vm
Chassis: vm
Machine ID: e88df8690ebf4c9496b99ec8f51f8e4f
Boot ID: 1e8ff758b4064a7f9f1e0a496e4fb900
Virtualization: xen
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-514.26.2.el7.x86_64
Architecture: x86-64

[2019-09-23 10:03:32 - root@agent02 log]# rpm -qi nova-agent
Name : nova-agent
Version : 2.1.20
Release : 1.el7
Architecture: noarch
Install Date: lun 23 sep 2019 09:04:42 WEST
Group : Unspecified
Size : 109019
License : ASL 2.0
Signature : RSA/SHA256, lun 17 dic 2018 21:59:52 WET, Key ID 6a2faea2352c64e5
Source RPM : nova-agent-2.1.20-1.el7.src.rpm
Build Date : lun 17 dic 2018 21:48:26 WET
Build Host : buildvm-ppc64le-07.ppc.fedoraproject.org
Relocations : (not relocatable)
Packager : Fedora Project
Vendor : Fedora Project
URL : https://github.com/Rackspace-DOT/nova-agent
Bug URL : https://bugz.fedoraproject.org/nova-agent
Summary : Agent for setting up clean servers on Xen
Description :
Python agent for setting up clean servers on Xen using xenstore data.

This is a worrying issue because it happened in version 1.39.1 and still happens in version 2.1.20. It leaves the server configuration corrupt and the server unable to communicate.

carlwgeorge commented 5 years ago

I don't think this has anything to do with nova-agent. Look at your logged output.

Sep 23 09:42:18 agent02 systemd-hostnamed: Changed static host name to 'agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed static host name to agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 NetworkManager[507]: [1569228138.4040] settings: hostname changed from "agent02.company.com" to "agent02.company.com.localdomain"
Sep 23 09:42:18 agent02 NetworkManager[507]: [1569228138.4040] settings: hostname changed from "agent02.company.com" to "agent02.company.com.localdomain"
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed host name to 'agent02.company.com.localdomain'
Sep 23 09:42:18 agent02 systemd-hostnamed: Changed host name to 'agent02.company.com.localdomain'

systemd-hostnamed and NetworkManager seem to be involved. In the Rackspace CentOS 7 server image, systemd-hostnamed is disabled by default, and NetworkManager is disabled and masked to prevent it from running. If you need to run those for some reason, I suggest opening a ticket with Rackspace to look into this.
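For anyone triaging a similar log, the reading above can be sketched as a small script that lists which services logged a hostname change. This is only an illustration: the function name `hostname_changers` and both regular expressions are hypothetical and were written against the syslog lines quoted in this thread. Note that the service that logs the change is not necessarily the one that requested it.

```python
import re

# Matches syslog-style lines such as:
#   Sep 23 09:42:18 agent02 systemd-hostnamed: Changed static host name to '...'
# The optional "[507]" after the service name is a PID.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<service>[^\s:\[]+)(?:\[\d+\])?:\s+(?P<message>.*)$"
)

# Message fragments that indicate a hostname change. These are taken from the
# log lines in this thread and are an assumption, not an exhaustive list.
CHANGE_RE = re.compile(
    r"host ?name changed|Changed (?:static )?host name", re.IGNORECASE
)


def hostname_changers(lines):
    """Return the sorted names of services that logged a hostname change."""
    services = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and CHANGE_RE.search(match.group("message")):
            services.add(match.group("service"))
    return sorted(services)
```

It could be fed the lines of a log file, for example `hostname_changers(open("/var/log/messages"))`, assuming that is where the system logs on the server in question.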

domibay-hugo commented 5 years ago

It turned out that the cloud-init scripts were overwriting the system configuration, as documented at https://stackoverflow.com/questions/49249375/how-to-disable-cloud-init-networking. The necessary settings in /etc/cloud/cloud.cfg.d/10_rackspace.cfg were missing from our old version of that file, which is copied manually during installation.

So I was able to fix this issue by manually adding the correct settings to this file.

I'm sorry for the inconvenience I caused you.
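The thread does not say which settings were added to 10_rackspace.cfg. As a rough sketch only, assuming the relevant one is cloud-init's `preserve_hostname` option (which tells cloud-init to leave the hostname alone), a check of a config file's text might look like this. The function name `preserves_hostname` is hypothetical, and the line-based matching is a simplification that does not parse YAML properly.

```python
import re

# Assumption: `preserve_hostname` is the setting of interest. The thread does
# not state which settings were actually added to 10_rackspace.cfg.
# Commented-out lines do not match because "#" precedes the key.
PRESERVE_RE = re.compile(r"^\s*preserve_hostname\s*:\s*(\S+)", re.MULTILINE)

# Values treated as true here; a simplification of YAML boolean handling.
TRUE_VALUES = {"true", "yes", "on", "1"}


def preserves_hostname(cfg_text):
    """Return True if the config text sets preserve_hostname to a true value."""
    values = PRESERVE_RE.findall(cfg_text)
    if not values:
        return False
    # Simplification: if the key appears more than once, use the last value.
    return values[-1].strip("'\"").lower() in TRUE_VALUES
```

For example, `preserves_hostname(open("/etc/cloud/cloud.cfg.d/10_rackspace.cfg").read())` would report whether that one file enables the option; it does not account for other files under /etc/cloud that may override it.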

carlwgeorge commented 5 years ago

Always happy to help, glad you got it sorted out.