Azure / azhpc-images

Azure HPC/AI VM Images
MIT License

AlmaLinux HPC image hostname change broken #307

Closed: themorey closed this issue 6 months ago

themorey commented 6 months ago

Image URN: almalinux:almalinux-hpc:8_7-hpc-gen2:8.7.2023111401

ISSUE:
Some VMs detect the hostname change and others do not, even though both are running the same image and version. This happens regularly.

Working VM:

[jerry@jm-sbal-hn ~]$ ssh 10.0.0.6 "cat /var/log/waagent.log |grep hostname"

2024-02-02T14:46:13.222368Z INFO EnvHandler ExtHandler Retrieving hostname from /var/lib/cloud/data/set-hostname
2024-02-02T14:46:13.223398Z INFO EnvHandler ExtHandler Published hostname record does not exist, creating [/var/lib/waagent/published_hostname] with hostname [5uch8000002]
2024-02-02T14:48:40.955838Z INFO EnvHandler ExtHandler Retrieving hostname from /var/lib/cloud/data/set-hostname
2024-02-02T14:48:40.956809Z INFO EnvHandler ExtHandler Published hostname record does not exist, creating [/var/lib/waagent/published_hostname] with hostname [5uch8000002]
2024-02-02T14:48:41.008931Z INFO EnvHandler ExtHandler EnvMonitor: Detected hostname change: 5uch8000002 -> jm-sbal-hpc-2
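What separates this VM from the non-working one is the `EnvMonitor: Detected hostname change` entry. Below is a minimal sketch for checking a log for that entry. It assumes the default log path `/var/log/waagent.log` used above. `hostname_change_logged` is a hypothetical helper and is not part of waagent.

```shell
# Hypothetical helper, not part of waagent: succeed (exit 0) if the given
# agent log contains the EnvMonitor line seen on the working VM, else exit 1.
hostname_change_logged() {
  local log="${1:-/var/log/waagent.log}"
  # -F: match the text literally; -q: report only via the exit status.
  grep -qF 'EnvMonitor: Detected hostname change' "$log"
}
```

The same check can be run remotely from the head node without copying the function, for example `ssh 10.0.0.11 "grep -qF 'EnvMonitor: Detected hostname change' /var/log/waagent.log" && echo detected || echo "not detected"`.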

Non-working VM:

[jerry@jm-sbal-hn ~]$ ssh 10.0.0.11 "hostname; cat /var/log/waagent.log |grep hostname"
jm-sbal-hpc-1
2024-02-02T14:46:12.339275Z INFO EnvHandler ExtHandler Retrieving hostname from /var/lib/cloud/data/set-hostname
2024-02-02T14:46:12.341152Z INFO EnvHandler ExtHandler Published hostname record does not exist, creating [/var/lib/waagent/published_hostname] with hostname [5uch8000000]
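On this VM, `hostname` reports `jm-sbal-hpc-1`, but the only record waagent created holds `5uch8000000`. Below is a minimal sketch that compares the two directly. It assumes that `/var/lib/waagent/published_hostname` contains just the bare hostname. The log shows only that the file is created, not its format. `check_published_hostname` is a hypothetical helper.

```shell
# Hypothetical helper, not part of waagent: compare the current hostname with
# the record waagent keeps, ASSUMING the file holds only the hostname.
# Prints "match", "mismatch: <published> != <current>", or "missing: <path>"
# and returns 0, 1 or 2 respectively.
check_published_hostname() {
  local current="${1:-$(hostname)}"
  local record="${2:-/var/lib/waagent/published_hostname}"
  local published

  if [ ! -r "$record" ]; then
    echo "missing: $record"
    return 2
  fi

  # Strip whitespace, including the trailing newline, before comparing.
  published="$(tr -d '[:space:]' < "$record")"

  if [ "$published" = "$current" ]; then
    echo "match"
    return 0
  fi
  echo "mismatch: $published != $current"
  return 1
}
```

To run the helper on a remote node from a bash head node, you can pass the function over ssh: `ssh 10.0.0.11 "$(declare -f check_published_hostname); check_published_hostname"`. Reading under `/var/lib/waagent` may require root.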

Hostname monitoring is enabled:

[jerry@jm-sbal-hn scripts]$ ssh 10.0.0.11 "grep -i hostname /etc/waagent.conf"
Provisioning.MonitorHostName=y
Provisioning.MonitorHostNamePeriod=60
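These settings show hostname monitoring switched on with a period of 60. That suggests the non-working VM is not simply misconfigured. To confirm the same settings across many nodes, the sketch below parses `waagent.conf` rather than relying on `grep -i` output. Both helper names are hypothetical. The sketch assumes waagent's `y`/`n` boolean convention and the plain `key=value` lines shown above.

```shell
# Hypothetical helper: succeed if Provisioning.MonitorHostName is set to a
# value starting with y/Y. Lines beginning with '#' never match the anchor,
# so commented-out settings are ignored.
monitor_hostname_enabled() {
  local conf="${1:-/etc/waagent.conf}"
  grep -Eq '^[[:space:]]*Provisioning\.MonitorHostName[[:space:]]*=[[:space:]]*[yY]' "$conf"
}

# Hypothetical helper: print the numeric MonitorHostNamePeriod value
# (the last one if set more than once), or nothing if it is unset.
monitor_hostname_period() {
  local conf="${1:-/etc/waagent.conf}"
  sed -n 's/^[[:space:]]*Provisioning\.MonitorHostNamePeriod[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1
}
```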

jithinjosepkl commented 6 months ago

@themorey - can you flag this against the waagent repo?