GoogleCloudPlatform / guest-configs

Apache License 2.0
31 stars 40 forks source link

dhcp hostname: don't reset hostname if the hostname hasn't changed #44

Closed dorileo closed 1 year ago

dorileo commented 1 year ago

It prevents restarting network and rsyslog when it's not actually required.

google-oss-prow[bot] commented 1 year ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bkatyl, dorileo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/GoogleCloudPlatform/guest-configs/blob/master/OWNERS)~~ [bkatyl,dorileo] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
chrisboulton commented 1 year ago

This seems like it generated a hostname change on all of our infrastructure, which is breaking systems that depend on it for various consensus algorithms and other things that have some kind of dependency on a stable hostname.

In the version of google-compute-engine prior to 20230327.00, hostname returned (at least in our environment) the short hostname (my-host) but now returns the full FQDN my-host.c.project-name.internal

Is there some unintended behaviour change that has come down as part of this?

dorileo commented 1 year ago

Hi @chrisboulton

I can't reproduce the issue you described. This PR doesn't change how hostnames are applied, see the lines hostname "${new_host_name%%.*}" and nmcli general hostname "${new_host_name%%.*}", it's cutting out the fully qualified name and applying only the hostname portion - these lines weren't touched by this PR.

Thanks, Leo

chrisboulton commented 1 year ago

@drebes Understood, but I believe there's some side effect this causes.

Testing this again (Debian Bullseye, though the version is irrelevant here - this happened on a Buster instance in my first report):

$ hostname
empty-soe-cr2r
$ apt-cache policy google-compute-engine
google-compute-engine:
  Installed: 1:20220211.00-g1
  Candidate: 1:20230202.00-g1
  Version table:
     1:20230327.00-g1 500
        500 http://packages.cloud.google.com/apt google-compute-engine-bullseye-stable/main amd64 Packages
     1:20230202.00-g1 1000
        700 http://(internal)/bullseye custom/main amd64 Packages
 *** 1:20220211.00-g1 100
        100 /var/lib/dpkg/status
$ apt-get install google-compute-engine=1:20230327.00-g1
...
The following packages will be upgraded:
  google-compute-engine
1 upgraded, 0 newly installed, 0 to remove and 5 not upgraded.
Need to get 0 B/10.4 kB of archives.
After this operation, 1,024 B of additional disk space will be used.
(Reading database ... 110246 files and directories currently installed.)
Preparing to unpack .../google-compute-engine_1%3a20230327.00-g1_all.deb ...
Unpacking google-compute-engine (1:20230327.00-g1) over (1:20220211.00-g1) ...
Setting up google-compute-engine (1:20230327.00-g1) ...

$ hostname
empty-soe-cr2r
$ hostname -f
empty-soe-cr2r.c.(project).internal
$ cat /etc/hosts
127.0.0.1   localhost
::1     localhost ip6-localhost ip6-loopback
ff02::1     ip6-allnodes
ff02::2     ip6-allrouters

10.133.0.29 empty-soe-cr2r.c.(project).internal empty-soe-cr2r  # Added by Google
169.254.169.254 metadata.google.internal  # Added by Google

$ hostnamectl 
   Static hostname: empty-soe-cr2r.c.(project).internal
Transient hostname: empty-soe-cr2r
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 01d24473958f707ffd08df74da95164b
           Boot ID: d7d9d5c0512e4f4784d165afc75ef0b8
    Virtualization: kvm
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 5.10.0-20-cloud-amd64
      Architecture: x86-64

Post reboot:

$ hostname
empty-soe-cr2r.c.(project).internal
$ hostnamectl
   Static hostname: empty-soe-cr2r.c.(project).internal
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 01d24473958f707ffd08df74da95164b
           Boot ID: b2218482a973416a85243b69e1878954
    Virtualization: kvm
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 5.10.0-20-cloud-amd64
      Architecture: x86-64

I can revert the package to an older release (which we rebuilt), and the existing behaviour returns.

It seems like Transient hostname is no longer being set (ie it was inexplicably being set before), possibly because the call to hostname forces SystemD/hostnamectl to set a static hostname.

If I'm interpreting correctly, new_host_name = empty-soe-cr2r. This means we'll always trigger that block because the short name is being compared to the fully qualified name, which will cause a call to hostname every time the hook is fired - the call to hostname with an FQDN wipes out the transient hostname, possibly because you're asking the system to set it.

dorileo commented 1 year ago

Hi @chrisboulton

What's the content of /etc/hostname?

chrisboulton commented 1 year ago

The fully qualified name (no change here)

dorileo commented 1 year ago

Hi @chrisboulton

I just sent a PR fixing it, I appreciate the feedback.

Thanks, Leo

chrisboulton commented 1 year ago

Hey @dorileo --thanks, I supper appreciate you jumping on this and getting a fix down.

I wonder (I'm not anywhere I can verify this personally right now) if it would be safest to call hostname -s for use in the comparison? My assumption is that new_host_name is always the short version of the hostname, and I don't know under what conditions (I guess if someone has a static hostname configured) hostname (no args) will return an FQDN. Is that something we need to possibly care about?

dorileo commented 1 year ago

Hi @chrisboulton

new_host_name will always be fqdn (when that's the case/available), my change accounts to that, we are truncating it in the check statement because that's what we are interested on. If the static hostname is fqdn then it will be =! to new_host_name and falling under the expected condition for setting the transient hostname.

I made sure to test all these use cases where the static hostname is fqdn or not, as well as across other systems I.e rhel.

Thanks, Leo