apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Data Disks get unavailable when VM is shut down #7490

Open VincentHermes opened 1 year ago

VincentHermes commented 1 year ago
ISSUE TYPE
Bug Report
COMPONENT NAME
Disk Controller
CLOUDSTACK VERSION
4.16
OS / ENVIRONMENT
KVM, Windows, SCSI rootDiskController

SUMMARY

Attaching more than 6 disks to a VM causes a second SCSI controller to be created. The type of that controller depends on whether the disk is attached while the VM is running or the VM is started with more than 6 disks already attached. If disks are added on the fly, everything works fine. If the VM is stopped and started while it already has more than 6 disks, the second controller is of a type that breaks Windows 2022 (and others, I think; still testing).

STEPS TO REPRODUCE

Normal Disk Setting in XML
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/storpool-byid/nbmn.b.xxxx' index='0'/>
      <backingStore/>
      <target dev='sda' bus='scsi'/>
      <serial>abcdefghijklmnop42</serial>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>

! Note the alias name in this config. Every other disk up to the 6th is configured the same way, with the alias name counting up to "scsi0-0-0-5".

7th Disk Setting in XML if attached live, VM not being stopped
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/storpool-byid/nbmn.b.xxxx' index='7'/>
      <backingStore/>
      <target dev='sdg' bus='scsi'/>
      <serial>abcdefghijklmnop42</serial>
      <alias name='scsi1-0-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>

! Note the alias name in this config: it is now "scsi1-0-0-0", which is okay, as it still has three zeroes for some reason, and in the OS the controller is recognized as a "RedHat Virtio SCSI controller". All disks work correctly in the OS this way.

7th Disk Setting in XML if the VM has been stopped and then started again (XML gets recreated)
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/storpool-byid/nbmn.b.xxxx' index='7'/>
      <backingStore/>
      <target dev='sdg' bus='scsi'/>
      <serial>abcdefghijklmnop42</serial>
      <alias name='scsi1-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>

! Note the alias name in this config: it is now "scsi1-0-0", so it is missing a zero, and the controller also becomes a different type. The RedHat driver no longer works. The only driver that can be installed for this device is the VMware PVSCSI driver, which still leaves the attached disks unavailable and breaks the Windows boot (BSOD), even though the root disk is on the other controller. In this case you need to remove disks until only 6 are left and then start the VM. If you attach a disk again, it shows up as a new "unknown device" again.
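A quick way to spot which disks received the shorter alias is to count the dash-separated fields in each alias of the dumped domain XML. The following is a minimal Python sketch, not part of CloudStack: the function names and the sample XML are made up for illustration (the sample only mirrors the aliases quoted above), and it assumes the full output of `virsh dumpxml` is available as a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML that mirrors the aliases quoted above.
SAMPLE = """
<domain type='kvm'>
  <devices>
    <disk type='block' device='disk'>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <target dev='sdg' bus='scsi'/>
      <alias name='scsi1-0-0'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>
    </disk>
  </devices>
</domain>
"""


def scsi_disk_aliases(domain_xml):
    """Return (dev, alias, controller, field_count) for every SCSI disk."""
    root = ET.fromstring(domain_xml.strip())
    rows = []
    for disk in root.iter("disk"):
        target = disk.find("target")
        if target is None or target.get("bus") != "scsi":
            continue  # skip non-SCSI disks
        alias = disk.find("alias")
        address = disk.find("address")
        name = alias.get("name") if alias is not None else None
        controller = address.get("controller") if address is not None else None
        # "scsi1-0-0-0" has 4 dash-separated fields, "scsi1-0-0" has 3.
        fields = len(name.split("-")) if name else 0
        rows.append((target.get("dev"), name, controller, fields))
    return rows


def short_aliases(domain_xml, expected_fields=4):
    """Return the disks whose alias has fewer fields than expected."""
    return [row for row in scsi_disk_aliases(domain_xml) if row[3] < expected_fields]


if __name__ == "__main__":
    for dev, name, controller, fields in scsi_disk_aliases(SAMPLE):
        print(dev, name, "controller", controller, "fields", fields)
```

Running `short_aliases` on a dump taken after the stop/start should list exactly the disks that ended up on the problematic controller, if the alias difference described above holds.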

I wonder what happens if the VM has virtio instead of SCSI as rootDiskController. Checking that out.

EXPECTED RESULTS
At least keep the controller type the same
ACTUAL RESULTS
Customers brick their VMs after stopping them once, because disks are missing.

weizhouapache commented 10 months ago

moved to 4.18.2.0

weizhouapache commented 3 weeks ago

I was not able to reproduce the issue.

When 6 data disks are added (7 in total, including the root disk), the XML contains the following:

# virsh dumpxml i-2-9-VM |grep scsi
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <target dev='sdb' bus='scsi'/>
      <alias name='scsi0-0-0-1'/>
      <target dev='sdc' bus='scsi'/>
      <alias name='scsi0-0-0-2'/>
      <target dev='sde' bus='scsi'/>
      <alias name='scsi0-0-0-4'/>
      <target dev='sdf' bus='scsi'/>
      <alias name='scsi0-0-0-5'/>
      <target dev='sdg' bus='scsi'/>
      <alias name='scsi0-0-0-6'/>
      <target dev='sdh' bus='scsi'/>
      <alias name='scsi1-0-0-0'/>

After a stop/start of the VM, the output is the same (only the indexes of the data disks differ).
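Since `grep scsi` on the dump only shows targets and aliases, it may also be worth comparing the SCSI `<controller>` elements before and after the stop/start. The sketch below is only an illustration of that comparison in Python: the helper names and both sample documents are hypothetical (they are not taken from any host in this thread), and it assumes the two `virsh dumpxml` outputs have been saved as strings. The `model` attribute of `<controller>` is part of the libvirt domain XML, but it can be absent, in which case the helper reports `None`.

```python
import xml.etree.ElementTree as ET

# Hypothetical dumps: the values are illustrative, not captured from a real host.
BEFORE = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <controller type='scsi' index='1' model='virtio-scsi'/>
    <controller type='usb' index='0'/>
  </devices>
</domain>
"""

AFTER = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'/>
    <controller type='scsi' index='1'/>
    <controller type='usb' index='0'/>
  </devices>
</domain>
"""


def scsi_controllers(domain_xml):
    """Map each SCSI controller index to its model (None if not set)."""
    root = ET.fromstring(domain_xml.strip())
    return {
        ctrl.get("index"): ctrl.get("model")
        for ctrl in root.iter("controller")
        if ctrl.get("type") == "scsi"
    }


def changed_controllers(before_xml, after_xml):
    """Return {index: (model_before, model_after)} for controllers that differ."""
    before = scsi_controllers(before_xml)
    after = scsi_controllers(after_xml)
    return {
        index: (before.get(index), after.get(index))
        for index in sorted(set(before) | set(after))
        if before.get(index) != after.get(index)
    }


if __name__ == "__main__":
    print(changed_controllers(BEFORE, AFTER))
```

An empty result would support the observation that the two dumps are equivalent; a non-empty one would point at the controller whose model changed.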

I tested on Rocky 8; the host information is below.

# virsh version
Compiled against library: libvirt 8.0.0
Using library: libvirt 8.0.0
Using API: QEMU 8.0.0
Running hypervisor: QEMU 6.2.0

# cat /etc/os-release 
NAME="Rocky Linux"
VERSION="8.4 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel fedora"
VERSION_ID="8.4"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.4 (Green Obsidian)"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:rocky:rocky:8.4:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
ROCKY_SUPPORT_PRODUCT="Rocky Linux"
ROCKY_SUPPORT_PRODUCT_VERSION="8"

@VincentHermes, can you share the details of your KVM host?