Closed dorileo closed 1 year ago
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: bkatyl, dorileo
The full list of commands accepted by this bot can be found here.
The pull request process is described here
This seems like it generated a hostname change on all of our infrastructure, which is breaking systems that depend on it for various consensus algorithms and other things that have some kind of dependency on a stable hostname.
In the version of google-compute-engine prior to 20230327.00, hostname
returned (at least in our environment) the short hostname (my-host
) but now returns the full FQDN my-host.c.project-name.internal
Is there some unintended behaviour change that has come down as part of this?
Hi @chrisboulton
I can't reproduce the issue you described. This PR doesn't change how hostnames are applied, see the lines hostname "${new_host_name%%.*}"
and nmcli general hostname "${new_host_name%%.*}"
, it's cutting out the fully qualified name and applying only the hostname portion - these lines weren't touched by this PR.
Thanks, Leo
@drebes Understood, but I believe there's some side effect this causes.
Testing this again (Debian Bullseye, though the version is irrelevant here - this happened on a Buster instance in my first report):
$ hostname
empty-soe-cr2r
$ apt-cache policy google-compute-engine
google-compute-engine:
Installed: 1:20220211.00-g1
Candidate: 1:20230202.00-g1
Version table:
1:20230327.00-g1 500
500 http://packages.cloud.google.com/apt google-compute-engine-bullseye-stable/main amd64 Packages
1:20230202.00-g1 1000
700 http://(internal)/bullseye custom/main amd64 Packages
*** 1:20220211.00-g1 100
100 /var/lib/dpkg/status
$ apt-get install google-compute-engine=1:20230327.00-g1
...
The following packages will be upgraded:
google-compute-engine
1 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
Need to get 0 B/10.4 kB of archives.
After this operation, 1,024 B of additional disk space will be used.
(Reading database ... 110246 files and directories currently installed.)
Preparing to unpack .../google-compute-engine_1%3a20230327.00-g1_all.deb ...
Unpacking google-compute-engine (1:20230327.00-g1) over (1:20220211.00-g1) ...
Setting up google-compute-engine (1:20230327.00-g1) ...
$ hostname
empty-soe-cr2r
$ hostname -f
empty-soe-cr2r.c.(project).internal
$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.133.0.29 empty-soe-cr2r.c.(project).internal empty-soe-cr2r # Added by Google
169.254.169.254 metadata.google.internal # Added by Google
$ hostnamectl
Static hostname: empty-soe-cr2r.c.(project).internal
Transient hostname: empty-soe-cr2r
Icon name: computer-vm
Chassis: vm
Machine ID: 01d24473958f707ffd08df74da95164b
Boot ID: d7d9d5c0512e4f4784d165afc75ef0b8
Virtualization: kvm
Operating System: Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-20-cloud-amd64
Architecture: x86-64
Post reboot:
$ hostname
empty-soe-cr2r.c.(project).internal
$ hostnamectl
Static hostname: empty-soe-cr2r.c.(project).internal
Icon name: computer-vm
Chassis: vm
Machine ID: 01d24473958f707ffd08df74da95164b
Boot ID: b2218482a973416a85243b69e1878954
Virtualization: kvm
Operating System: Debian GNU/Linux 11 (bullseye)
Kernel: Linux 5.10.0-20-cloud-amd64
Architecture: x86-64
I can revert the package to an older release (which we rebuilt), and the existing behaviour returns.
It seems like Transient hostname
is no longer being set (ie it was inexplicably being set before), possibly because the call to hostname
forces SystemD/hostnamectl to set a static hostname.
If I'm interpreting correctly, new_host_name = empty-soe-cr2r
. This means we'll always trigger that block because the short name is being compared to the fully qualified name, which will cause a call to hostname
every time the hook is fired - the call to hostname
with an FQDN wipes out the transient hostname, possibly because you're asking the system to set it.
Hi @chrisboulton
What's the content of /etc/hostname
?
The fully qualified name (no change here)
Hey @dorileo --thanks, I supper appreciate you jumping on this and getting a fix down.
I wonder (I'm not anywhere I can verify this personally right now) if it would be safest to call hostname -s
for use in the comparison? My assumption is that new_host_name
is always the short version of the hostname, and I don't know under what conditions (I guess if someone has a static hostname configured) hostname
(no args) will return an FQDN. Is that something we need to possibly care about?
Hi @chrisboulton
new_host_name will always be fqdn (when that's the case/available), my change accounts to that, we are truncating it in the check statement because that's what we are interested on. If the static hostname is fqdn then it will be =! to new_host_name and falling under the expected condition for setting the transient hostname.
I made sure to test all these use cases where the static hostname is fqdn or not, as well as across other systems I.e rhel.
Thanks, Leo
It prevents restarting network and rsyslog when it's not actually required.