canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

default swap disk setup lead to OS inconsistent disk order #5528

Open jichenjc opened 1 month ago

jichenjc commented 1 month ago

Bug report

With a default cloud-init installation and default cloud settings, if I create a VM through OpenStack with a swap disk (e.g. 1G) using the image here, I get one root disk and one swap disk; they can be /dev/vda and /dev/vdb.

Now, with default cloud-init, /etc/fstab is created like this:

[root@rh90-kvm000 instance]# cat /etc/fstab
UUID=4a6b9e92-c860-464a-bb7b-446ce819f5cf       /boot   xfs     defaults        0       0
UUID=2bb1d696-b949-43c2-adab-a6fc8c4cfd9b       /       xfs     defaults        0       0
/dev/vdb        none    swap    sw,comment=cloudconfig  0       0                           <===========here it's using /dev/vdb 
[root@rh90-kvm000 instance]# blkid
/dev/sr0: BLOCK_SIZE="2048" UUID="2024-07-20-11-29-30-00" LABEL="config-2" TYPE="iso9660"
/dev/vdb: UUID="367fa101-f577-4b9c-961e-40d34a4ea56b" TYPE="swap"
/dev/vda2: UUID="2bb1d696-b949-43c2-adab-a6fc8c4cfd9b" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="14fc63d2-02"
/dev/vda1: LABEL="boot" UUID="4a6b9e92-c860-464a-bb7b-446ce819f5cf" BLOCK_SIZE="512" TYPE="xfs" PARTUUID="14fc63d2-01"

My expectation is that /etc/fstab should look like the following (replace /dev/vdb with UUID=), so that across reboots, even if /dev/vdb changes to /dev/vdc because an additional disk was added, the swap disk keeps working. Otherwise, since libvirt does not guarantee the /dev/vdX disk order, the swap entry might point to another data disk.

[root@rh90-kvm000 ~]# cat /etc/fstab
UUID=4a6b9e92-c860-464a-bb7b-446ce819f5cf       /boot   xfs     defaults        0       0
UUID=2bb1d696-b949-43c2-adab-a6fc8c4cfd9b       /       xfs     defaults        0       0
UUID=367fa101-f577-4b9c-961e-40d34a4ea56b       none    swap    sw,comment=cloudconfig  0       0
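
To make the expected rewrite concrete, here is a minimal Python sketch of the transformation, not cloud-init's actual code. The function names `parse_blkid` and `swap_entries_to_uuid` are hypothetical, and the sketch assumes blkid's default `KEY="value"` output format as pasted above. It only touches swap entries that are given as a /dev/ path and for which blkid reports a UUID.

```python
import re


def parse_blkid(output):
    """Map each device path to its blkid tags, e.g. {"/dev/vdb": {"UUID": ..., "TYPE": "swap"}}."""
    devices = {}
    for line in output.splitlines():
        dev, sep, rest = line.partition(": ")
        if not sep:
            continue  # not a "device: TAG=..." line
        devices[dev.strip()] = dict(re.findall(r'(\w+)="([^"]*)"', rest))
    return devices


def swap_entries_to_uuid(fstab_text, devices):
    """Rewrite swap entries given as /dev/... paths to UUID=... when a UUID is known."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 3 and not line.lstrip().startswith("#")
        if is_entry and fields[2] == "swap" and fields[0].startswith("/dev/"):
            uuid = devices.get(fields[0], {}).get("UUID")
            if uuid:
                fields[0] = "UUID=" + uuid
                line = "\t".join(fields)
        out.append(line)  # all other lines are kept unchanged
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    blkid_out = '/dev/vdb: UUID="367fa101-f577-4b9c-961e-40d34a4ea56b" TYPE="swap"\n'
    fstab = "/dev/vdb\tnone\tswap\tsw,comment=cloudconfig\t0\t0\n"
    print(swap_entries_to_uuid(fstab, parse_blkid(blkid_out)), end="")
```

Entries that already use UUID= (such as / and /boot above) pass through untouched, so the rewrite is safe to run more than once.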

some info from libvirt

"target dev=vdc" will try to give the device that has the serial and ccw path mentioned in the XML the logical name vdc when the system boots up. However, as mentioned, this is not a guarantee, this just tries to give an ordering hint to the guest OS, but again it will depend on the order in which the guest OS actually detects the devices and that is unreliable as explained in the previous update I made. Here is the documentation I'm referencing for your reference:

https://libvirt.org/formatdomain.html#virtio-related-options

some detailed info

# cloud-init -v
/usr/bin/cloud-init 21.1-19.el9
# cat /etc/cloud/cloud.cfg
users:
 - default

disable_root: false
ssh_pwauth:   1

mount_default_fields: [~, ~, 'auto', 'defaults,nofail,x-systemd.requires=cloud-init.service,_netdev', '0', '2']
resize_rootfs_tmp: /dev
ssh_deletekeys:   1
ssh_genkeytypes:  ['rsa', 'ecdsa', 'ed25519']
syslog_fix_perms: ~
disable_vmware_customization: false

cloud_init_modules:
 - disk_setup
 - migrator
 - bootcmd
 - write-files
 - growpart
 - resizefs
 - set_hostname
 - update_hostname
 - update_etc_hosts
 - rsyslog
 - users-groups
 - ssh

cloud_config_modules:
 - mounts
 - locale
 - set-passwords
 - rh_subscription
 - yum-add-repo
 - package-update-upgrade-install
 - timezone
 - puppet
 - chef
 - salt-minion
 - mcollective
 - disable-ec2-metadata
 - runcmd

cloud_final_modules:
 - rightscale_userdata
 - scripts-per-once
 - scripts-per-boot
 - scripts-per-instance
 - scripts-user
 - ssh-authkey-fingerprints
 - keys-to-console
 - phone-home
 - final-message
 - power-state-change

system_info:
  default_user:
    name: cloud-user
    lock_passwd: true
    gecos: Cloud User
    groups: [adm, systemd-journal]
    sudo: ["ALL=(ALL) NOPASSWD:ALL"]
    shell: /bin/bash
  distro: rhel
  paths:
    cloud_dir: /var/lib/cloud
    templates_dir: /etc/cloud/templates
  ssh_svcname: sshd

Steps to reproduce the problem

Environment details

cloud-init logs

holmanb commented 1 month ago

@jichenjc Thanks for reporting!

Can you please provide the logs and user-data that came from openstack?

jichenjc commented 1 month ago

Would OpenStack logs be helpful here? Once the VM is created, OpenStack is not involved anymore. I will request the user data from my colleagues.

Here is an example of the issue in more detail. Originally, during cloud-init setup, /dev/vdb is selected as the swap disk and written to /etc/fstab, so everything is fine:

/dev/vdb        none    swap    sw,comment=cloudconfig  0       0      <----THIS IS THE MAPPING AT DEPLOY TIME
/dev/datavg/datalv /data xfs defaults 0 0

Then, after a hard reboot from OpenStack, which undefines the domain XML and recreates the virsh XML from the OpenStack DB, the original disk becomes vdc for the reason I pasted above, which makes the /etc/fstab definition incorrect:

#lsblk
...
└─vda2                     252:2    0  512M  0 part /boot
vdb                        252:16   0  100G  0 disk
└─vdb1                     252:17   0  100G  0 part
  └─datavg-datalv          253:0    0  100G  0 lvm  /data
vdc                        252:32   0   32G  0 disk  <------------ THIS SHOULD BE THE SWAP DISK
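
The mismatch shown here can be detected mechanically. The following is a minimal Python sketch under stated assumptions: `stale_swap_entries` is a hypothetical helper, and `device_types` is an assumed mapping from device path to its current signature type, which could be gathered per device with something like `blkid -s TYPE -o value <device>`.

```python
def stale_swap_entries(fstab_text, device_types):
    """Return /dev/... paths listed as swap in fstab whose current type is not swap.

    device_types is an assumed mapping such as {"/dev/vdc": "swap"}; devices
    without a recognised signature are simply absent from it.
    """
    stale = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # blank line, comment, or malformed entry
        if fields[2] == "swap" and fields[0].startswith("/dev/"):
            if device_types.get(fields[0]) != "swap":
                stale.append(fields[0])
    return stale


if __name__ == "__main__":
    fstab = (
        "/dev/vdb        none    swap    sw,comment=cloudconfig  0       0\n"
        "/dev/datavg/datalv /data xfs defaults 0 0\n"
    )
    # After the hard reboot the swap signature lives on vdc, not vdb.
    print(stale_swap_entries(fstab, {"/dev/vdc": "swap"}))
```

Entries that use UUID= or LABEL= are skipped on purpose, because they do not depend on the device enumeration order.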

jichenjc commented 1 month ago

[root@rh90-kvm000 404dde05-7b5c-4df3-84b4-7cbffd45e42a]# pwd
/var/lib/cloud/instances/404dde05-7b5c-4df3-84b4-7cbffd45e42a
[root@rh90-kvm000 404dde05-7b5c-4df3-84b4-7cbffd45e42a]# ls -lh
total 36K
-rw-r--r--. 1 root root   57 Jul 20 00:34 boot-finished
-rw-------. 1 root root    0 Jul 20 00:34 cloud-config.txt
-rw-r--r--. 1 root root   74 Jul 20 00:34 datasource
drwxr-xr-x. 2 root root    6 Jul 20 00:28 handlers
-r--------. 1 root root 8.7K Jul 20 00:34 obj.pkl
drwxr-xr-x. 2 root root    6 Jul 20 00:28 scripts
drwxr-xr-x. 2 root root 4.0K Jul 20 00:28 sem
-rw-------. 1 root root    0 Jul 20 00:34 user-data.txt
-rw-------. 1 root root  308 Jul 20 00:34 user-data.txt.i
-rw-------. 1 root root    0 Jul 20 00:34 vendor-data.txt
-rw-------. 1 root root  308 Jul 20 00:34 vendor-data.txt.i
-rw-------. 1 root root    0 Jul 20 00:34 vendor-data2.txt
-rw-------. 1 root root  308 Jul 20 00:34 vendor-data2.txt.i
[root@rh90-kvm000 404dde05-7b5c-4df3-84b4-7cbffd45e42a]#

Is this the user data you are looking for? It's 0 bytes, so there is no user input from the OpenStack side.

Dr-Shadow commented 1 month ago

@jichenjc Maybe you could use another path to identify the device as a workaround? (For example, I use /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part1.) Of course, this would be a temporary fix until cloud-init has proper support for UUIDs on disks.
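
To see which persistent names exist for a given disk, one option is to scan the udev symlink directories. This is a minimal Python sketch, assuming the usual /dev/disk/by-id and /dev/disk/by-uuid layout; `stable_aliases` is a hypothetical helper name. Whether any alias exists depends on the guest setup: for instance, a by-uuid link only appears once the device carries a filesystem or swap signature.

```python
import os


def stable_aliases(device, by_dirs=("/dev/disk/by-id", "/dev/disk/by-uuid")):
    """Return the symlinks under by_dirs that resolve to the same node as device."""
    target = os.path.realpath(device)
    aliases = []
    for directory in by_dirs:
        if not os.path.isdir(directory):
            continue  # directory is missing on this system
        for name in sorted(os.listdir(directory)):
            link = os.path.join(directory, name)
            if os.path.realpath(link) == target:
                aliases.append(link)
    return aliases


if __name__ == "__main__":
    # Prints an empty list if the device or the alias directories do not exist.
    print(stable_aliases("/dev/vdb"))
```

Any path this returns could be used in place of /dev/vdb, since it follows the disk rather than the enumeration order.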

jichenjc commented 1 month ago

OK, thanks for the valuable feedback. I will check what I can do here to work around it.

This might work for SCSI disks, but for swap I doubt it's doable. I will do more checks here, thanks.