Azure / WALinuxAgent

Microsoft Azure Linux Guest Agent
http://azure.microsoft.com/
Apache License 2.0

[BUG] Incomplete regex in get_mount_point() #2636

Closed: rwagnergit closed this issue 1 year ago

rwagnergit commented 2 years ago

Describe the bug:

On at least one of our Azure Linux VMs, we are seeing waagent incorrectly identify the mount point of the ephemeral disk. In our particular case, the ephemeral disk is present at /dev/sda, the OS disk is present at /dev/sdac, and the boot partition is /dev/sdac1, mounted at /boot. When waagent starts, get_mount_point() in /usr/lib/python3.6/site-packages/azurelinuxagent/common/osutil/default.py returns /boot, which causes waagent to (among other things) try to create the swapfile under /boot, where it runs out of space (/boot is only 500MB in size and we are attempting to create a 16GB swapfile). I dug into the code, and I believe the problem is that the regex in get_mount_point() is not sufficiently specific. Instead of:

    def get_mount_point(self, mountlist, device):
        """
        Example of mountlist:
            /dev/sda1 on / type ext4 (rw)
            proc on /proc type proc (rw)
            sysfs on /sys type sysfs (rw)
            devpts on /dev/pts type devpts (rw,gid=5,mode=620)
            tmpfs on /dev/shm type tmpfs
            (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
            none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
            /dev/sdb1 on /mnt/resource type ext4 (rw)
        """
        if (mountlist and device):
            for entry in mountlist.split('\n'):
                if(re.search(device, entry)):
                    tokens = entry.split()
                    #Return the 3rd column of this line
                    return tokens[2] if len(tokens) > 2 else None
        return None

we should have:

    def get_mount_point(self, mountlist, device):
        """
        Example of mountlist:
            /dev/sda1 on / type ext4 (rw)
            proc on /proc type proc (rw)
            sysfs on /sys type sysfs (rw)
            devpts on /dev/pts type devpts (rw,gid=5,mode=620)
            tmpfs on /dev/shm type tmpfs
            (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
            none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
            /dev/sdb1 on /mnt/resource type ext4 (rw)
        """
        if (mountlist and device):
            for entry in mountlist.split('\n'):
                if(re.search(device + '[0-9 ]', entry)):  # Note change here
                    tokens = entry.split()
                    #Return the 3rd column of this line
                    return tokens[2] if len(tokens) > 2 else None
        return None

After making that change, waagent functions correctly: get_mount_point() returns None, and the ephemeral disk is properly partitioned and mounted, with the swapfile created on it. I know there's a lot here, so I'll try to explain below; note that this was discovered as part of Azure case #2206030040004173.
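To see the false match in isolation, here is a minimal sketch using only the standard library (the sample mount line is shortened from the real mount output traced further below):

    import re

    # One line of "mount" output from the affected VM (shortened).
    entry = '/dev/sdac1 on /boot type xfs (rw,relatime)'
    device = '/dev/sda'  # the ephemeral disk, which is not mounted at all

    # Old pattern: '/dev/sda' matches as a bare prefix of '/dev/sdac1'.
    print(bool(re.search(device, entry)))             # True  -- false positive

    # Proposed pattern: the device name must be followed by a digit
    # (a partition such as sda1) or a space (the bare disk itself).
    print(bool(re.search(device + '[0-9 ]', entry)))  # False -- correct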

When the problem occurs, here's what we see in waagent.log:

2022-02-14T15:20:03.190695Z INFO Daemon Daemon Azure Linux Agent Version:2.2.49.2
2022-02-14T15:20:03.206265Z INFO Daemon Daemon OS: redhat 8.3
2022-02-14T15:20:03.211102Z INFO Daemon Daemon Python: 3.6.8
2022-02-14T15:20:03.217851Z INFO Daemon Daemon Run daemon
2022-02-14T15:20:03.223552Z INFO Daemon Daemon No RDMA handler exists for distro='Red Hat Enterprise Linux' version='8.3'
2022-02-14T15:20:03.244831Z INFO Daemon Daemon Error getting cloud-init enabled status from systemctl: Command '['systemctl', 'is-enabled', 'cloud-init-local.service']' returned non-zero exit status 1.
2022-02-14T15:20:06.560272Z INFO Daemon Daemon Error getting cloud-init enabled status from service: Command '['service', 'cloud-init', 'status']' returned non-zero exit status 3.
2022-02-14T15:20:06.570553Z INFO Daemon Daemon cloud-init is enabled: False
2022-02-14T15:20:06.574833Z INFO Daemon Daemon Using waagent for provisioning
2022-02-14T15:20:06.579962Z INFO Daemon Daemon Activate resource disk
2022-02-14T15:20:06.584232Z INFO Daemon Daemon Searching gen1 prefix 00000000-0001 or gen2 f8b3781a-1e82-4818-a1c3-63d806ec15bb
2022-02-14T15:20:06.595730Z INFO Daemon Daemon Found device: sda
2022-02-14T15:20:06.606048Z INFO Daemon Daemon Resource disk [/dev/sda1] is already mounted [/boot]
2022-02-14T15:20:06.612663Z INFO Daemon Daemon Enable swap
2022-02-14T15:20:06.625758Z INFO Daemon Daemon Create swap file
2022-02-14T15:20:06.633544Z ERROR Daemon Daemon Command: [umask 0077 && fallocate -l 17301504000 '/boot/swapfile'], return code: [1], result: [fallocate: fallocate failed: No space left on device
]
2022-02-14T15:20:06.645125Z INFO Daemon Daemon fallocate unsuccessful, falling back to dd
2022-02-14T15:20:09.242501Z ERROR Daemon Daemon Command: [umask 0077 && dd if=/dev/zero bs=67108864 count=257 conv=notrunc of='/boot/swapfile'], return code: [1], result: [dd: error writing '/boot/swapfile': No space left on device
6+0 records in
5+0 records out
351076352 bytes (351 MB, 335 MiB) copied, 2.58573 s, 136 MB/s
]
2022-02-14T15:20:09.326793Z ERROR Daemon Daemon dd unsuccessful
2022-02-14T15:20:10.687323Z INFO Daemon Daemon Enabled 16896000KB of swap at /boot/swapfile
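(The numbers are consistent: the configured swap size of 16896000KB is 16896000 × 1024 = 17,301,504,000 bytes, exactly the fallocate length above, and far more than the 500MB /boot partition can hold.)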

And here is lsblk:

rowagn@kmb14au:~#> lsblk
NAME                                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                                 8:0    0  112G  0 disk
└─sda1                                              8:1    0  112G  0 part
sdb                                                 8:16   0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sdc                                                 8:32   0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sdd                                                 8:48   0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sde                                                 8:64   0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sdf                                                 8:80   0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sdg                                                 8:96   0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sdh                                                 8:112  0  256G  0 disk
└─vg_sso_data_oracle_backup-ssodataoraclebackup   253:18   0  1.2T  0 lvm  /sso/data/oracle/backup
sdi                                                 8:128  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdj                                                 8:144  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdk                                                 8:160  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdl                                                 8:176  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdm                                                 8:192  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdn                                                 8:208  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdo                                                 8:224  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdp                                                 8:240  0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdq                                                65:0    0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdr                                                65:16   0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sds                                                65:32   0   64G  0 disk
└─vg_sso_data_oracle_data01-ssodataoracledata01   253:19   0  623G  0 lvm  /sso/data/oracle/data01
sdt                                                65:48   0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdu                                                65:64   0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdv                                                65:80   0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdw                                                65:96   0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdx                                                65:112  0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdy                                                65:128  0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdz                                                65:144  0  128G  0 disk
└─vg_sso_data_oracle_flash01-ssodataoracleflash01 253:17   0  748G  0 lvm  /sso/data/oracle/flash01
sdaa                                               65:160  0   64G  0 disk
└─vg_sso_sfw_oracle-ssosfworacle                  253:16   0   50G  0 lvm  /sso/sfw/oracle
sdab                                               65:176  0  256G  0 disk
├─vg_standard-opt                                 253:5    0   10G  0 lvm  /opt
├─vg_standard-tmp                                 253:6    0   10G  0 lvm  /tmp
├─vg_standard-var                                 253:7    0   20G  0 lvm  /var
├─vg_standard-sso                                 253:8    0   50G  0 lvm  /sso
├─vg_standard-home                                253:9    0   10G  0 lvm  /home
├─vg_standard-vartmp                              253:10   0    2G  0 lvm  /var/tmp
├─vg_standard-varlog                              253:11   0   50G  0 lvm  /var/log
├─vg_standard-varcache                            253:12   0   10G  0 lvm  /var/cache
├─vg_standard-varlogaudit                         253:13   0   10G  0 lvm  /var/log/audit
├─vg_standard-ssomonitoring                       253:14   0   20G  0 lvm  /sso/monitoring
└─vg_standard-varlogjournal                       253:15   0   10G  0 lvm  /var/log/journal
sdac                                               65:192  0   64G  0 disk
├─sdac1                                            65:193  0  500M  0 part /boot
├─sdac2                                            65:194  0   63G  0 part
│ ├─rootvg-tmplv                                  253:0    0    2G  0 lvm
│ ├─rootvg-usrlv                                  253:1    0   10G  0 lvm  /usr
│ ├─rootvg-homelv                                 253:2    0    1G  0 lvm
│ ├─rootvg-varlv                                  253:3    0    8G  0 lvm
│ └─rootvg-rootlv                                 253:4    0   42G  0 lvm  /
├─sdac14                                           65:206  0    4M  0 part
└─sdac15                                           65:207  0  495M  0 part /boot/efi

If we look at the relevant portion of azurelinuxagent/daemon/resourcedisk/default.py:

    95      def mount_resource_disk(self, mount_point):
    96          device = self.osutil.device_for_ide_port(1)
    97          if device is None:
    98              raise ResourceDiskError("unable to detect disk topology")
    99
   100          device = "/dev/{0}".format(device)
   101          partition = device + "1"
   102          mount_list = shellutil.run_get_output("mount")[1]
   103          existing = self.osutil.get_mount_point(mount_list, device)
   104
   105          if existing:
   106              logger.info("Resource disk [{0}] is already mounted [{1}]",
   107                          partition,
   108                          existing)
   109              return existing
   110

And the relevant portion of azurelinuxagent/common/osutil/default.py:

  1111      def get_mount_point(self, mountlist, device):
  1112          """
  1113          Example of mountlist:
  1114              /dev/sda1 on / type ext4 (rw)
  1115              proc on /proc type proc (rw)
  1116              sysfs on /sys type sysfs (rw)
  1117              devpts on /dev/pts type devpts (rw,gid=5,mode=620)
  1118              tmpfs on /dev/shm type tmpfs
  1119              (rw,rootcontext="system_u:object_r:tmpfs_t:s0")
  1120              none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
  1121              /dev/sdb1 on /mnt/resource type ext4 (rw)
  1122          """
  1123          if (mountlist and device):
  1124              for entry in mountlist.split('\n'):
  1125                  if(re.search(device, entry)):
  1126                      tokens = entry.split()
  1127                      #Return the 3rd column of this line
  1128                      return tokens[2] if len(tokens) > 2 else None
  1129          return None
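For comparison, a regex-free variant that compares the device field of each mount line exactly would sidestep the prefix problem altogether. This is only a sketch, assuming mount output of the form "<source> on <mountpoint> type <fstype> ...", not the agent's actual code:

    def get_mount_point(mountlist, device):
        # Sketch: tokens[0] is the device, tokens[2] is the mount point.
        if mountlist and device:
            for entry in mountlist.split('\n'):
                tokens = entry.split()
                if len(tokens) > 2 and tokens[0].startswith(device):
                    suffix = tokens[0][len(device):]
                    # Accept the bare device ('') or device + partition number.
                    if suffix == '' or suffix.isdigit():
                        return tokens[2]
        return None

Exact comparison also avoids treating the device string as a regex pattern, which re.search() otherwise does.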

We can trace the problem:

rowagn@kmb14au:/usr/lib/python3.6/site-packages#> python
Python 3.6.8 (default, Mar 18 2021, 08:58:41)
[GCC 8.4.1 20200928 (Red Hat 8.4.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import re
>>> import stat
>>> import sys
>>> import threading
>>> from time import sleep
>>> import azurelinuxagent.common.logger as logger
>>> from azurelinuxagent.common.future import ustr
>>> import azurelinuxagent.common.conf as conf
>>> from azurelinuxagent.common.event import add_event, WALAEventOperation
>>> import azurelinuxagent.common.utils.fileutil as fileutil
>>> import azurelinuxagent.common.utils.shellutil as shellutil
>>> from azurelinuxagent.common.exception import ResourceDiskError
>>> from azurelinuxagent.common.osutil import get_osutil
>>> from azurelinuxagent.common.version import AGENT_NAME
>>>
# step through lines 96-102 of mount_resource_disk() just to see the results of the calls
>>> osutil = get_osutil()
>>> device = osutil.device_for_ide_port(1)
>>> device
'sda'
>>> device = "/dev/{0}".format(device)
>>> device
'/dev/sda'
>>> partition = device + "1"
>>> partition
'/dev/sda1'
>>> mount_list = shellutil.run_get_output("mount")[1]
>>> mount_list
'sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)\nproc on /proc type proc (rw,nosuid,nodev,noexec,relatime)\ndevtmpfs on /dev type devtmpfs (rw,nosuid,size=28731092k,nr_inodes=7182773,mode=755)\nsecurityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)\ntmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec)\ndevpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)\ntmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)\ntmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)\ncgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)\npstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)\nbpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)\ncgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)\ncgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)\ncgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)\ncgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)\ncgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)\ncgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)\ncgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)\ncgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)\ncgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)\ncgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)\ncgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)\nnone on /sys/kernel/tracing type tracefs (rw,relatime)\nconfigfs on /sys/kernel/config type configfs (rw,relatime)\n/dev/mapper/rootvg-rootlv on / type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/rootvg-usrlv on /usr type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\nmqueue on /dev/mqueue type mqueue (rw,relatime)\nsystemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=38,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=26069)\nhugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)\ndebugfs on /sys/kernel/debug type debugfs (rw,relatime)\nbinfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)\n/dev/sdac1 on /boot type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/sdac15 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)\n/dev/mapper/vg_standard-home on /home type xfs (rw,nodev,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-opt on /opt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-sso on /sso type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-tmp on /tmp type xfs (rw,nosuid,nodev,noexec,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_sso_sfw_oracle-ssosfworacle on /sso/sfw/oracle type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_sso_data_oracle_backup-ssodataoraclebackup on /sso/data/oracle/backup type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=256,swidth=1792,noquota)\n/dev/mapper/vg_sso_data_oracle_flash01-ssodataoracleflash01 on /sso/data/oracle/flash01 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=256,swidth=1792,noquota)\n/dev/mapper/vg_standard-var on /var type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_sso_data_oracle_data01-ssodataoracledata01 on /sso/data/oracle/data01 type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=256,swidth=2816,noquota)\n/dev/mapper/vg_standard-varcache on /var/cache type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-vartmp on /var/tmp type xfs (rw,nosuid,nodev,noexec,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-varlog on /var/log type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-ssomonitoring on /sso/monitoring type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-varlogaudit on /var/log/audit type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\n/dev/mapper/vg_standard-varlogjournal on /var/log/journal type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)\nsunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)\ntmpfs on /run/user/18448 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700,uid=18448,gid=10029)\ntracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)\ntmpfs on /run/user/83249 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700,uid=83249,gid=18251)\ntmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700)\ntmpfs on /run/user/47006 type tmpfs (rw,nosuid,nodev,relatime,size=5750048k,mode=700,uid=47006,gid=100)\n'
# now define get_mount_point() so we can trace line 103:
>>> def get_mount_point(mountlist, device):
...     if (mountlist and device):
...         for entry in mountlist.split('\n'):
...             if(re.search(device, entry)):
...                 tokens = entry.split()
...                 return tokens[2] if len(tokens) > 2 else None
...     return None
...
# Now, call line 103 and note that we get back /boot
>>> existing = get_mount_point(mount_list, device)
>>> existing
'/boot'
# Now, redefine get_mount_point() with a more specific regex:
>>> def get_mount_point(mountlist, device):
...     if (mountlist and device):
...         for entry in mountlist.split('\n'):
...             if(re.search(device + '[0-9 ]', entry)):
...                 tokens = entry.split()
...                 return tokens[2] if len(tokens) > 2 else None
...     return None
...
# And note that None was returned with this new regex:
>>> existing = get_mount_point(mount_list, device)
>>> existing
>>> type(existing)
<class 'NoneType'>
>>>
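The trace above translates naturally into a regression test. A hypothetical sketch (the helper re-implements the method's logic, parameterized on the regex suffix so the old and proposed behaviors can both be exercised):

    import re

    def get_mount_point(mountlist, device, pattern_suffix=''):
        # Same logic as the agent's method; pattern_suffix is '' for the
        # old behavior and '[0-9 ]' for the proposed fix.
        if mountlist and device:
            for entry in mountlist.split('\n'):
                if re.search(device + pattern_suffix, entry):
                    tokens = entry.split()
                    return tokens[2] if len(tokens) > 2 else None
        return None

    MOUNTS = ('/dev/sdac1 on /boot type xfs (rw)\n'
              '/dev/sdb1 on /mnt/resource type ext4 (rw)')

    def test_old_regex_has_false_positive():
        # The unanchored pattern wrongly matches /dev/sdac1 and reports /boot.
        assert get_mount_point(MOUNTS, '/dev/sda') == '/boot'

    def test_new_regex_requires_digit_or_space():
        assert get_mount_point(MOUNTS, '/dev/sda', '[0-9 ]') is None
        assert get_mount_point(MOUNTS, '/dev/sdb', '[0-9 ]') == '/mnt/resource'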

After making the above change, restarting waagent yields a better waagent.log:

2022-06-28T13:15:06.767313Z INFO Daemon Daemon Azure Linux Agent Version:2.2.49.2
2022-06-28T13:15:06.767847Z INFO Daemon Daemon OS: redhat 8.3
2022-06-28T13:15:06.770140Z INFO Daemon Daemon Python: 3.6.8
2022-06-28T13:15:06.770433Z INFO Daemon Daemon Run daemon
2022-06-28T13:15:06.771324Z INFO Daemon Daemon No RDMA handler exists for distro='Red Hat Enterprise Linux' version='8.3'
2022-06-28T13:15:06.795482Z INFO Daemon Daemon Error getting cloud-init enabled status from systemctl: Command '['systemctl', 'is-enabled', 'cloud-init-local.service']' returned non-zero exit status 1.
2022-06-28T13:15:06.847602Z INFO Daemon Daemon Error getting cloud-init enabled status from service: Command '['service', 'cloud-init', 'status']' returned non-zero exit status 3.
2022-06-28T13:15:06.848149Z INFO Daemon Daemon cloud-init is enabled: False
2022-06-28T13:15:06.850281Z INFO Daemon Daemon Using waagent for provisioning
2022-06-28T13:15:06.851542Z INFO Daemon Daemon Activate resource disk
2022-06-28T13:15:06.852096Z INFO Daemon Daemon Searching gen1 prefix 00000000-0001 or gen2 f8b3781a-1e82-4818-a1c3-63d806ec15bb
2022-06-28T13:15:06.854842Z INFO Daemon Daemon Found device: sda
2022-06-28T13:15:07.218629Z INFO Daemon Daemon Examining partition table
2022-06-28T13:15:07.234279Z INFO Daemon Daemon GPT not detected, determining filesystem
2022-06-28T13:15:07.241598Z INFO Daemon Daemon sfdisk --part-type -f /dev/sda 1 -n succeeded
2022-06-28T13:15:07.243245Z INFO Daemon Daemon The partition type is 83
2022-06-28T13:15:07.244988Z INFO Daemon Daemon Mount resource disk [mount -t ext4 /dev/sda1 /mnt/resource]
2022-06-28T13:15:07.357079Z INFO Daemon Daemon Resource disk /dev/sda is mounted at /mnt/resource with ext4
2022-06-28T13:15:07.358579Z INFO Daemon Daemon Enable swap
2022-06-28T13:15:07.925162Z INFO Daemon Daemon Enabled 16896000KB of swap at /mnt/resource/swapfile

Distro and WALinuxAgent details:

rowagn@kmb14au:~#> uname -a
Linux kmb14au.vsp.sas.com 4.18.0-240.22.1.el8_3.x86_64 #1 SMP Thu Mar 25 14:36:04 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
rowagn@kmb14au:~#> cat /etc/redhat-release
Red Hat Enterprise Linux release 8.3 (Ootpa)
rowagn@kmb14au:~#> waagent --version
WALinuxAgent-2.2.49.2 running on redhat 8.3
Python: 3.6.8
Goal state agent: 2.7.3.0
rowagn@kmb14au:~#>

Additional context: I believe a necessary prerequisite for the problem to occur is that the Azure VM has more than 26 disks attached, so that the /dev/sd* device names roll over to /dev/sdaX (where X is a letter). Until that rollover occurs, a regex looking for /dev/sda cannot match anything inappropriately. In sum, the existing regex matches any device name with a PREFIX of /dev/sda, which can only be incorrect once more than 26 drives are attached; see the sketch below.
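The rollover can be checked with a few lines (a sketch of the kernel's sd suffix sequence, not waagent code):

    import itertools
    from string import ascii_lowercase

    # Kernel-style sd names: sda..sdz, then sdaa, sdab, ...
    def sd_names():
        for n in itertools.count(1):
            for letters in itertools.product(ascii_lowercase, repeat=n):
                yield 'sd' + ''.join(letters)

    names = list(itertools.islice(sd_names(), 30))
    print(names[25:29])  # ['sdz', 'sdaa', 'sdab', 'sdac']
    # 'sda' is a prefix of every name from disk 27 onward, so an unanchored
    # re.search('/dev/sda', ...) can match the wrong device.
    print([n for n in names if n != 'sda' and n.startswith('sda')])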

Log file attached: I provided the relevant portions of the log above. That said, if having the entire log is helpful, I'm happy to provide it.

narrieta commented 2 years ago

@anhvoms could you take a look?

anhvoms commented 1 year ago

@rwagnergit RHEL 8.1+ and RHEL 7.7+ on Azure should be using cloud-init for provisioning and the resource disk formatting/partitioning should be handled by cloud-init. Cloud-init does a better job at discovering the resource disk by looking at the alias /dev/disk/cloud/resource instead.
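For reference, resolving such a udev alias from Python is a one-liner; a sketch using the alias path exactly as quoted above (the symlink exists only on images whose udev rules create it):

    import os

    # Follow the udev symlink to the underlying block device, e.g. '/dev/sdb'.
    print(os.path.realpath('/dev/disk/cloud/resource'))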

rwagnergit commented 1 year ago

Is waagent being deprecated? If not, you should fix it. I gift-wrapped the solution.

Rob


anhvoms commented 1 year ago

@rwagnergit walinuxagent provisioning is not deprecated. It is, however, considered to be in maintenance mode (we will only release patches for security-related bugs).