apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Wrong mountpoint generated on secondary storage for ISO on live migration #8901

Closed: loth closed this 5 months ago

loth commented 7 months ago
ISSUE TYPE

Bug Report

COMPONENT NAME
Secondary Storage
CLOUDSTACK VERSION
4.18.1
CONFIGURATION

KVM using 2x secondary storage and many primary storage mounts

OS / ENVIRONMENT

Ubuntu 22.04

SUMMARY

When attempting to live migrate a VM, CloudStack throws an error from libvirt: 'Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso': No such file or directory'

This file exists on the secondary storage, and the agent successfully creates a new storage pool for the ISO; however, it mounts the pool under a new directory, not the one the domain XML references.

STEPS TO REPRODUCE

This seems to happen only with long-standing VMs; new VMs seem to migrate fine. It's possible this only occurs when a storage pool needs to be created on the destination host.

Here is the excerpt from the migration job on the management server:

2024-04-11 19:46:29,547 DEBUG [o.a.c.e.o.VolumeOrchestrator] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Preparing to migrate [1] volumes for VM [VM instance {"id":25995,"instanceName":"i-2-25995-VM","type":"User","uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568"}].
2024-04-11 19:46:29,629 DEBUG [o.a.c.s.i.TemplateDataFactoryImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) template 2695 is already in store:6, type:Image
2024-04-11 19:46:29,682 DEBUG [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Seq 343-7617275819743874648: Forwarding Seq 343-7617275819743874648:  { Cmd , MgmtId: 95544004472, via: 343(node181), Ver: v1, Flags: 100111, [{"com.cloud.agent.api.PrepareForMigrationCommand":{"vm":{"id":"25995","name":"i-2-25995-VM","state":"Running","type":"User","cpus":"2","minSpeed":"2000","maxSpeed":"2000","minRam":"(4.00 GB) 4294967296","maxRam":"(4.00 GB) 4294967296","arch":"x86_64","os":"Other PV Virtio-SCSI (64-bit)","platformEmulator":"Other PV Virtio-SCSI (64-bit)","bootArgs":"","enableHA":"true","limitCpuUse":"false","enableDynamicallyScaleVm":"false","params":{"kvm.vnc.port":"5903","rootdisksize":"5","kvm.vnc.address":"172.30.34.101","Message.ReservedCapacityFreed.Flag":"false"},"uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568","enterHardwareSetup":"false","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"02de7a0f-bd65-4d7c-a56f-d70b233c7c1b","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a6d82da8-fdc3-38d0-8a3f-2798f270bbf8","name":"Primary2-Netapp-18","id":"59","poolType":"NetworkFilesystem","host":"172.70.1.32","path":"/csrp_primary2_ssd18","port":"2049","url":"NetworkFilesystem://172.70.1.32/csrp_primary2_ssd18/?ROLE=Primary&STOREUUID=a6d82da8-fdc3-38d0-8a3f-2798f270bbf8","isManaged":"false"}},"name":"ROOT-25995","size":"(5.00 GB) 5368709120","path":"02de7a0f-bd65-4d7c-a56f-d70b233c7c1b","volumeId":"41742","vmName":"i-2-25995-VM","accountId":"2","format":"QCOW2","provisioningType":"THIN","poolId":"59","id":"41742","deviceId":"0","bytesReadRate":"(0 bytes) 0","bytesWriteRate":"(0 bytes) 0","iopsReadRate":"(0 bytes) 0","iopsWriteRate":"(0 bytes) 0","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"}},"diskSeq":"0","path":"02de7a0f-bd65-4d7c-a56f-d70b233c7c1b","type":"ROOT","_details":{"storageHost":"172.70.1.32","managed":"false","storagePort":"2049","storage.pool.disk.wait":"60","volumeSize":"(5.00 GB) 5368709120"}},{"data":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/2/2695/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso","origUrl":"http://url/grml64-full_2018.12.iso","uuid":"9159f256-d399-4b0f-8a0f-5cacff1c39f7","id":"2695","format":"ISO","accountId":"2","checksum":"237972093d31e89b35d92ab1dabe06ed","hvm":"true","displayText":"GRML 2018.12 rescue CD","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://172.70.1.20/csrp_secondary_ssd2","_role":"Image"}},"name":"2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb","guestOsType":"Linux/BSD 64-bit","size":"(672.25 MB) 
704905216","hypervisorType":"None","bootable":"false","uniqueName":"2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb","directDownload":"false","deployAsIs":"false"}},"diskSeq":"3","type":"ISO"}],"nics":[{"deviceId":"0","networkRateMbps":"1000","defaultNic":"true","pxeDisable":"false","nicUuid":"96572d76-6eb8-47ae-a98c-ef3d9920fefc","details":{"MacAddressChanges":"true","ForgedTransmits":"true","PromiscuousMode":"false","MacLearning":"false"},"dpdkEnabled":"false","uuid":"5942a519-a626-4f49-8e58-d6cbf9052131","ip":"10.1.1.81","netmask":"255.255.255.0","gateway":"10.1.1.1","mac":"02:00:45:06:00:18","dns1":"23.19.52.52","dns2":"23.19.53.53","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1966","isolationUri":"vlan://1966","isSecurityGroupEnabled":"false","name":"cloudbr0","ip6Dns1":"","ip6Dns2":""}],"vcpuMaxLimit":"2","configDriveLabel":"config","configDriveLocation":"SECONDARY","guestOsDetails":{},"extraConfig":{}},"rollback":"false","wait":"0","bypassHostMaintenance":"false"}}] } to 95537038070
2024-04-11 19:46:30,164 DEBUG [c.c.a.t.Request] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Seq 343-7617275819743874648: Received:  { Ans: , MgmtId: 95544004472, via: 343(node181), Ver: v1, Flags: 110, { PrepareForMigrationAnswer } }
2024-04-11 19:46:30,185 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) VM instance {"id":25995,"instanceName":"i-2-25995-VM","type":"User","uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568"} state transited from [Running] to [Migrating] with event [MigrationRequested]. VM's original host: Host {"id":488,"name":"node108","type":"Routing","uuid":"01132381-6797-4bb1-8aab-e9e990029c64"}, new host: Host {"id":343,"name":"node181","type":"Routing","uuid":"9def292d-3a1d-4fe2-b9ac-d4730944c59c"}, host before state transition: Host {"id":488,"name":"node108","type":"Routing","uuid":"01132381-6797-4bb1-8aab-e9e990029c64"}
2024-04-11 19:46:30,200 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Hosts's actual total CPU: 105600 and CPU after applying overprovisioning: 105600
2024-04-11 19:46:30,200 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) We are allocating VM, increasing the used capacity of this host:343
2024-04-11 19:46:30,200 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Current Used CPU: 70000 , Free CPU:35600 ,Requested CPU: 4000
2024-04-11 19:46:30,200 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Current Used RAM: (102.00 GB) 109521666048 , Free RAM:(13.78 GB) 14799896576 ,Requested RAM: (4.00 GB) 4294967296
2024-04-11 19:46:30,200 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) CPU STATS after allocation: for host: 343, old used: 70000, old reserved: 0, actual total: 105600, total with overprovisioning: 105600; new used:74000, reserved:0; requested cpu:4000,alloc_from_last:false
2024-04-11 19:46:30,201 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) RAM STATS after allocation: for host: 343, old used: (102.00 GB) 109521666048, old reserved: (0 bytes) 0, total: (115.78 GB) 124321562624; new used: (106.00 GB) 113816633344, reserved: (0 bytes) 0; requested mem: (4.00 GB) 4294967296,alloc_from_last:false
2024-04-11 19:46:30,204 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Host: 343 has cpu capability (cpu:48, speed:2200) to support requested CPU: 2 and requested speed: 2000
2024-04-11 19:46:30,204 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Checking if host: 343 has enough capacity for requested CPU: 4000 and requested RAM: (4.00 GB) 4294967296 , cpuOverprovisioningFactor: 1.0
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Hosts's actual total CPU: 105600 and CPU after applying overprovisioning: 105600
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) We need to allocate to the last host again, so checking if there is enough reserved capacity
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Reserved CPU: 0 , Requested CPU: 4000
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Reserved RAM: (0 bytes) 0 , Requested RAM: (4.00 GB) 4294967296
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) STATS: Failed to alloc resource from host: 343 reservedCpu: 0, requested cpu: 4000, reservedMem: (0 bytes) 0, requested mem: (4.00 GB) 4294967296
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Host does not have enough reserved CPU available, cannot allocate to this host.
2024-04-11 19:46:30,207 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Checking if host: 343 has enough capacity for requested CPU: 4000 and requested RAM: (4.00 GB) 4294967296 , cpuOverprovisioningFactor: 1.0
2024-04-11 19:46:30,209 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Hosts's actual total CPU: 105600 and CPU after applying overprovisioning: 105600
2024-04-11 19:46:30,209 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Free CPU: 35600 , Requested CPU: 4000
2024-04-11 19:46:30,209 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Free RAM: (13.78 GB) 14799896576 , Requested RAM: (4.00 GB) 4294967296
2024-04-11 19:46:30,209 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Host has enough CPU and RAM available
2024-04-11 19:46:30,209 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) STATS: Can alloc CPU from host: 343, used: 70000, reserved: 0, actual total: 105600, total with overprovisioning: 105600; requested cpu:4000,alloc_from_last_host?:false ,considerReservedCapacity?: true
2024-04-11 19:46:30,209 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) STATS: Can alloc MEM from host: 343, used: (102.00 GB) 109521666048, reserved: (0 bytes) 0, total: (115.78 GB) 124321562624; requested mem: (4.00 GB) 4294967296, alloc_from_last_host?: false , considerReservedCapacity?: true
2024-04-11 19:46:30,232 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Setting VLAN persistence to [{"1966":true}] as part of migrate command for VM [VM {id: "25995", name: "i-2-25995-VM", uuid: "ed56cab5-6657-407c-8def-0a7b0a50b568", type: "User"}].
2024-04-11 19:46:30,234 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Setting CPU shares to [379] as part of migrate command for VM [VM {id: "25995", name: "i-2-25995-VM", uuid: "ed56cab5-6657-407c-8def-0a7b0a50b568", type: "User"}].
2024-04-11 19:46:30,248 DEBUG [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Seq 488-2345812455906704593: Forwarding Seq 488-2345812455906704593:  { Cmd , MgmtId: 95544004472, via: 488(node108), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.MigrateCommand":{"vmName":"i-2-25995-VM","destIp":"172.30.36.6","migrateStorageManaged":"false","migrateNonSharedInc":"false","autoConvergence":"true","hostGuid":"46a313b7-5b71-3530-b361-f9717f16e75a-LibvirtComputingResource","isWindows":"false","vmTO":{"id":"25995","name":"i-2-25995-VM","state":"Running","type":"User","cpus":"2","minSpeed":"2000","maxSpeed":"2000","minRam":"(4.00 GB) 4294967296","maxRam":"(4.00 GB) 4294967296","arch":"x86_64","os":"Other PV Virtio-SCSI (64-bit)","platformEmulator":"Other PV Virtio-SCSI (64-bit)","bootArgs":"","enableHA":"true","limitCpuUse":"false","enableDynamicallyScaleVm":"false","params":{"kvm.vnc.port":"5903","rootdisksize":"5","kvm.vnc.address":"172.30.34.101","Message.ReservedCapacityFreed.Flag":"false"},"uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568","enterHardwareSetup":"false","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"02de7a0f-bd65-4d7c-a56f-d70b233c7c1b","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"a6d82da8-fdc3-38d0-8a3f-2798f270bbf8","name":"Primary2-Netapp-18","id":"59","poolType":"NetworkFilesystem","host":"172.70.1.32","path":"/csrp_primary2_ssd18","port":"2049","url":"NetworkFilesystem://172.70.1.32/csrp_primary2_ssd18/?ROLE=Primary&STOREUUID=a6d82da8-fdc3-38d0-8a3f-2798f270bbf8","isManaged":"false"}},"name":"ROOT-25995","size":"(5.00 GB) 5368709120","path":"02de7a0f-bd65-4d7c-a56f-d70b233c7c1b","volumeId":"41742","vmName":"i-2-25995-VM","accountId":"2","format":"QCOW2","provisioningType":"THIN","poolId":"59","id":"41742","deviceId":"0","bytesReadRate":"(0 bytes) 0","bytesWriteRate":"(0 bytes) 0","iopsReadRate":"(0 bytes) 0","iopsWriteRate":"(0 bytes) 0","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"}},"diskSeq":"0","path":"02de7a0f-bd65-4d7c-a56f-d70b233c7c1b","type":"ROOT","_details":{"storageHost":"172.70.1.32","managed":"false","storagePort":"2049","storage.pool.disk.wait":"60","volumeSize":"(5.00 GB) 5368709120"}},{"data":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"path":"template/tmpl/2/2695/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso","origUrl":"http://url/grml64-full_2018.12.iso","uuid":"9159f256-d399-4b0f-8a0f-5cacff1c39f7","id":"2695","format":"ISO","accountId":"2","checksum":"237972093d31e89b35d92ab1dabe06ed","hvm":"true","displayText":"GRML 2018.12 rescue CD","imageDataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://172.70.1.20/csrp_secondary_ssd2","_role":"Image"}},"name":"2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb","guestOsType":"Linux/BSD 64-bit","size":"(672.25 MB) 
704905216","hypervisorType":"None","bootable":"false","uniqueName":"2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb","directDownload":"false","deployAsIs":"false"}},"diskSeq":"3","type":"ISO"}],"nics":[{"deviceId":"0","networkRateMbps":"1000","defaultNic":"true","pxeDisable":"false","nicUuid":"96572d76-6eb8-47ae-a98c-ef3d9920fefc","details":{"MacAddressChanges":"true","ForgedTransmits":"true","PromiscuousMode":"false","MacLearning":"false"},"dpdkEnabled":"false","uuid":"5942a519-a626-4f49-8e58-d6cbf9052131","ip":"10.1.1.81","netmask":"255.255.255.0","gateway":"10.1.1.1","mac":"02:00:45:06:00:18","dns1":"23.19.52.52","dns2":"23.19.53.53","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://1966","isolationUri":"vlan://1966","isSecurityGroupEnabled":"false","name":"cloudbr0","ip6Dns1":"","ip6Dns2":""}],"vcpuMaxLimit":"2","configDriveLabel":"config","configDriveLocation":"SECONDARY","guestOsDetails":{},"extraConfig":{}},"executeInSequence":"false","migrateDiskInfoList":[],"dpdkInterfaceMapping":{},"newVmCpuShares":"379","vlanToPersistenceMap":{"1966":"true"},"wait":"0","bypassHostMaintenance":"false"}}] } to 95536130474
2024-04-11 19:46:30,658 DEBUG [c.c.a.t.Request] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Seq 488-2345812455906704593: Received:  { Ans: , MgmtId: 95544004472, via: 488(node108), Ver: v1, Flags: 10, { MigrateAnswer } }
2024-04-11 19:46:30,658 INFO  [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Migration was unsuccessful.  Cleaning up: VM instance {"id":25995,"instanceName":"i-2-25995-VM","type":"User","uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568"}
2024-04-11 19:46:30,680 DEBUG [o.a.c.e.o.VolumeOrchestrator] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Releasing [1] volumes for VM [VM instance {"id":25995,"instanceName":"i-2-25995-VM","type":"User","uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568"}] from host [Host {"id":343,"name":"node181","type":"Routing","uuid":"9def292d-3a1d-4fe2-b9ac-d4730944c59c"}].
2024-04-11 19:46:30,689 WARN  [c.c.a.AlertManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) alertType=[17] dataCenterId=[1] podId=[1] clusterId=[null] message=[Unable to migrate vm i-2-25995-VM from host node108 in zone CSRP01 and pod CSRP01-1].
2024-04-11 19:46:30,710 DEBUG [c.c.a.AlertManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Have already sent: 1 emails for alert type '17' -- skipping send email
2024-04-11 19:46:30,711 DEBUG [c.c.v.VirtualMachineManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) VM instance {"id":25995,"instanceName":"i-2-25995-VM","type":"User","uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568"} is a [User], returning null for control Nic IP.
2024-04-11 19:46:30,730 DEBUG [c.c.a.m.ClusteredAgentAttache] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Seq 343-7617275819743874649: Forwarding Seq 343-7617275819743874649:  { Cmd , MgmtId: 95544004472, via: 343(node181), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":"false","checkBeforeCleanup":"false","forceStop":"false","vlanToPersistenceMap":{"1966":"true"},"volumesToDisconnect":[],"vmName":"i-2-25995-VM","executeInSequence":"false","wait":"0","bypassHostMaintenance":"false"}}] } to 95537038070
2024-04-11 19:46:30,745 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) VM instance {"id":25995,"instanceName":"i-2-25995-VM","type":"User","uuid":"ed56cab5-6657-407c-8def-0a7b0a50b568"} state transited from [Migrating] to [Running] with event [OperationFailed]. VM's original host: Host {"id":488,"name":"node108","type":"Routing","uuid":"01132381-6797-4bb1-8aab-e9e990029c64"}, new host: Host {"id":488,"name":"node108","type":"Routing","uuid":"01132381-6797-4bb1-8aab-e9e990029c64"}, host before state transition: Host {"id":343,"name":"node181","type":"Routing","uuid":"9def292d-3a1d-4fe2-b9ac-d4730944c59c"}
2024-04-11 19:46:30,758 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Hosts's actual total CPU: 105600 and CPU after applying overprovisioning: 105600
2024-04-11 19:46:30,759 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Hosts's actual total RAM: (115.78 GB) 124321566720 and RAM after applying overprovisioning: (115.78 GB) 124321562624
2024-04-11 19:46:30,759 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) release cpu from host: 343, old used: 74000,reserved: 0, actual total: 105600, total with overprovisioning: 105600; new used: 70000,reserved:0; movedfromreserved: false,moveToReserveredfalse
2024-04-11 19:46:30,759 DEBUG [c.c.c.CapacityManagerImpl] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) release mem from host: 343, old used: (106.00 GB) 113816633344,reserved: (0 bytes) 0, total: (115.78 GB) 124321562624; new used: (102.00 GB) 109521666048,reserved:(0 bytes) 0; movedfromreserved: false,moveToReserveredfalse
2024-04-11 19:46:30,766 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso': No such file or directory
2024-04-11 19:46:30,766 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-151:ctx-6a3ae7c2 job-3041976/job-3041979 ctx-5442e019) (logid:c631290f) Rethrow exception com.cloud.utils.exception.CloudRuntimeException: Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso': No such file or directory

On the agent I can see the pool being searched for, then created (since it cannot be found), and finally a stop command arriving from the management server:

2024-04-11 19:46:29,732 DEBUG [resource.wrapper.LibvirtPrepareForMigrationCommandWrapper] (agentRequest-Handler-2:null) Preparing host for migrating VM {id: "25995", name: "i-2-25995-VM", uuid: "ed56cab5-6657-407c-8def-0a7b0a50b568", type: "User"}
2024-04-11 19:46:29,732 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Looking for libvirtd connection at: qemu:///system
2024-04-11 19:46:29,751 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Can not find KVM connection for Instance: i-2-25995-VM, continuing.
2024-04-11 19:46:29,751 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Looking for libvirtd connection at: lxc:///
2024-04-11 19:46:29,751 INFO  [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) No existing libvirtd connection found. Opening a new one
2024-04-11 19:46:29,752 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Can not find LXC connection for Instance: i-2-25995-VM, continuing.
2024-04-11 19:46:29,752 WARN  [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Can not find a connection for Instance i-2-25995-VM. Assuming the default connection.
2024-04-11 19:46:29,752 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Looking for libvirtd connection at: qemu:///system
2024-04-11 19:46:29,771 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-2:null) nic=[Nic:Guest-10.1.1.81-vlan://1966]
2024-04-11 19:46:29,771 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-2:null) creating a vNet dev and bridge for guest traffic per traffic label cloudbr0
2024-04-11 19:46:29,771 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-2:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/vnet/modifyvlan.sh -v 1966 -p bond0 -b brbond0-1966 -o add
2024-04-11 19:46:29,772 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-2:null) Executing while with timeout : 1800000
2024-04-11 19:46:29,808 DEBUG [kvm.resource.BridgeVifDriver] (agentRequest-Handler-2:null) Execution is successful.
2024-04-11 19:46:29,808 DEBUG [resource.wrapper.LibvirtPrepareForMigrationCommandWrapper] (agentRequest-Handler-2:null) disk volumeTO[uuid=02de7a0f-bd65-4d7c-a56f-d70b233c7c1b|path=02de7a0f-bd65-4d7c-a56f-d70b233c7c1b|datastore=PrimaryDataStoreTO[uuid=a6d82da8-fdc3-38d0-8a3f-2798f270bbf8|name=Primary2-Netapp-18|id=59|pooltype=NetworkFilesystem]] has no passphrase or encryption
2024-04-11 19:46:29,809 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Attempting to create storage pool 49d099eb-456c-3262-867a-3dd38b623a04 (NetworkFilesystem) in libvirt
2024-04-11 19:46:29,809 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Looking for libvirtd connection at: qemu:///system
2024-04-11 19:46:29,827 WARN  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Storage pool 49d099eb-456c-3262-867a-3dd38b623a04 was not found running in libvirt. Need to create it.
2024-04-11 19:46:29,827 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Didn't find an existing storage pool 49d099eb-456c-3262-867a-3dd38b623a04 by UUID, checking for pools with duplicate paths
...
2024-04-11 19:46:29,931 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Attempting to create storage pool 49d099eb-456c-3262-867a-3dd38b623a04
2024-04-11 19:46:29,931 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) <pool type='netfs'>
2024-04-11 19:46:29,931 DEBUG [utils.script.Script] (agentRequest-Handler-2:null) Executing: /bin/bash -c mountpoint -q /mnt/49d099eb-456c-3262-867a-3dd38b623a04
2024-04-11 19:46:29,932 DEBUG [utils.script.Script] (agentRequest-Handler-2:null) Executing while with timeout : 3600000
2024-04-11 19:46:29,937 DEBUG [utils.script.Script] (agentRequest-Handler-2:null) Exit value is 32
2024-04-11 19:46:29,937 DEBUG [utils.script.Script] (agentRequest-Handler-2:null) 32
2024-04-11 19:46:30,123 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Trying to fetch storage pool 49d099eb-456c-3262-867a-3dd38b623a04 from libvirt
2024-04-11 19:46:30,123 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Looking for libvirtd connection at: qemu:///system
2024-04-11 19:46:30,146 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Successfully refreshed pool 49d099eb-456c-3262-867a-3dd38b623a04 Capacity: (9.0931 TB) 9997967032320 Used: (7.2693 TB) 7992727437312 Available: (1.8238 TB) 2005239595008
2024-04-11 19:46:30,149 INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Trying to fetch storage pool a6d82da8-fdc3-38d0-8a3f-2798f270bbf8 from libvirt
2024-04-11 19:46:30,149 DEBUG [kvm.resource.LibvirtConnection] (agentRequest-Handler-2:null) Looking for libvirtd connection at: qemu:///system
2024-04-11 19:46:30,151 DEBUG [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) Successfully refreshed pool a6d82da8-fdc3-38d0-8a3f-2798f270bbf8 Capacity: (5.0000 TB) 5497558138880 Used: (2.0016 TB) 2200754585600 Available: (2.9984 TB) 3296803553280
2024-04-11 19:46:30,151 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) This host utilizes cgroupv2 (as the max shares value is [105600]), thus, the VM requested shares of [4000] will be converted to consider the host limits; the new CPU shares value is [379].
2024-04-11 19:46:30,151 DEBUG [resource.wrapper.LibvirtPrepareForMigrationCommandWrapper] (agentRequest-Handler-2:null) Setting CPU shares to [379] for the migration of VM [VM {id: "25995", name: "i-2-25995-VM", uuid: "ed56cab5-6657-407c-8def-0a7b0a50b568", type: "User"}].
2024-04-11 19:46:30,152 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Seq 343-7617275819743874648:  { Ans: , MgmtId: 95544004472, via: 343, Ver: v1, Flags: 110, [{"com.cloud.agent.api.PrepareForMigrationAnswer":{"dpdkInterfaceMapping":{},"newVmCpuShares":"379","result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2024-04-11 19:46:30,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) Request:Seq 343-7617275819743874649:  { Cmd , MgmtId: 95544004472, via: 343, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.StopCommand":{"isProxy":"false","checkBeforeCleanup":"false","forceStop":"false","vlanToPersistenceMap":{"1966":"true"},"volumesToDisconnect":[],"vmName":"i-2-25995-VM","executeInSequence":"false","wait":"0","bypassHostMaintenance":"false"}}] }
2024-04-11 19:46:30,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-1:null) Processing command: com.cloud.agent.api.StopCommand

It seems that CloudStack created '/mnt/49d099eb-456c-3262-867a-3dd38b623a04'; however, when libvirt migrated the XML from the other host, it still referenced the old mount point at '/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4', which doesn't exist on the destination host, so libvirt throws an error back to CloudStack.
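
A quick way to see the mismatch is to compare the cdrom path in the domain XML on the source host with what is actually mounted on the destination. This is a diagnostic sketch using standard virsh/mount commands; the VM name and paths are the ones from this report:

# On the source host: the path the domain XML references for the cdrom
virsh dumpxml i-2-25995-VM | grep -B1 -A3 "device='cdrom'"

# On the destination host: which pools are actually mounted
mount | grep '/mnt/'
ls /mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4 2>/dev/null || echo "old mountpoint absent on this host"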

EXPECTED RESULTS

Live migration of a VM with an attached ISO succeeds, with CloudStack creating the mount point that the libvirt XML references.

ACTUAL RESULTS

A libvirt 'file not found' error, and the migration fails.
loth commented 6 months ago

@weizhouapache any idea what further details I should include for this bug? I think a setup with 2x secondary storage is largely untested and the root cause of this.

weizhouapache commented 6 months ago

@weizhouapache any idea what further details I should include for this bug? I think a setup with 2x secondary storage is largely untested and the root cause of this.

@loth Is the ISO stored on both secondary storages? Maybe the DB says it is but actually it isn't.

loth commented 6 months ago

@weizhouapache any idea what further details I should include for this bug? I think a setup with 2x secondary storage is largely untested and the root cause of this.

@loth Is the ISO stored on both secondary storages? Maybe the DB says it is but actually it isn't.

I checked both storage mounts; the ISO exists on both:

root@s-27167-VM:/mnt/SecStorage/49e5c7d7-bef0-32c9-8be4-505b3b0b51be# find . -name 2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
./template/tmpl/2/2695/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
root@s-27167-VM:/mnt/SecStorage/49e5c7d7-bef0-32c9-8be4-505b3b0b51be# cd ..
root@s-27167-VM:/mnt/SecStorage# cd cadb1c27-afcb-314d-a866-e039feae8492/
root@s-27167-VM:/mnt/SecStorage/cadb1c27-afcb-314d-a866-e039feae8492# find . -name 2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
./template/tmpl/2/2695/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso

Furthermore, I think this is more of a bug within CloudStack and how it generates the location of the ISO. In the libvirtd error log we can see that "/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso" is in the domain XML; however, CloudStack created /mnt/49d099eb-456c-3262-867a-3dd38b623a04 for the VM to mount the ISO from (and the ISO does exist at this mount point) but never updated the XML to use this new mount point. The migration completes, but the VM fails to start because the old mount point does not exist.

So one of two things would need to happen for this not to error in CloudStack:

  1. Reuse the old mount point from the XML when spinning up a new NFS mount at the migration destination (see the sketch below)
  2. Update the XML to use the new mount point on the destination (not sure if this is possible)
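
As a rough manual approximation of option 1, the missing mount point can be recreated by hand on the destination host. This is a stopgap sketch, assuming the same NFS export and per-template subdirectory back both directories (paths taken from this report); it is not a fix:

# On the destination host, recreate the mount point the domain XML expects,
# backed by the same secondary-storage export
mkdir -p /mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4
mount -t nfs 172.70.1.20:/csrp_secondary_ssd2/template/tmpl/2/2695 \
      /mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4
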
DaanHoogland commented 6 months ago

@loth there are some PRs out for similar issues, #8952. Will you have the chance to test with that patch?

loth commented 6 months ago

@loth there are some PRs out for similar issues, #8952. Will you have the chance to test with that patch?

Hello,

I tried the mentioned patch and it didn't affect the issue. Here is a rundown of what I did to reproduce it:

  1. Set up a new 4.18 env with the patch
  2. Upload an ISO (grml-small in my case)
  3. Add a 2nd secondary storage mount
  4. Wait for templates to populate
  5. Re-download/upload the template if "secondary storage bypassed" shows
  6. Create a test VM
  7. Attach the uploaded ISO
  8. Attempt live migration
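
The last two steps can also be driven through the API; a sketch using CloudMonkey (cmk), with the IDs as placeholders:

# attach the registered ISO, then trigger a live migration
cmk attach iso id=<iso-uuid> virtualmachineid=<vm-uuid>
cmk migrate virtualmachine virtualmachineid=<vm-uuid> hostid=<destination-host-uuid>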

Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/7d79fb97-cdae-3a13-a0ac-29f3b1f37a1e/203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso': No such file or directory

The ISO exists on both secondary storage mounts:

root@mgt01-tyler-nfs-test:/data/secondary2# ls -laR  | grep 203-2-76328563-4e13-3db1-87be-e81f46e9394a
-rw-rw-rw- 1 root root 532938752 Apr 29 21:35 203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso
root@mgt01-tyler-nfs-test:/data/secondary2# cd ..
root@mgt01-tyler-nfs-test:/data# cd secondary1
root@mgt01-tyler-nfs-test:/data/secondary1# ls -laR  | grep 203-2-76328563-4e13-3db1-87be-e81f46e9394a
-rw-rw-rw- 1 root root 532938752 Apr 29 21:29 203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso
root@mgt01-tyler-nfs-test:/data/secondary1#

On node01 (the destination of the live migration) the mount for secondary storage was created as '/mnt/1919082e-71af-3ce0-b6d7-df2bf6249f3b':

root@node01-tyler-nfs-test:~# mount | grep secondary
10.1.2.2:/data/secondary1/template/tmpl/2/203 on /mnt/1919082e-71af-3ce0-b6d7-df2bf6249f3b type nfs (rw,nosuid,nodev,noexec,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.2.2,mountvers=3,mountport=37418,mountproto=udp,local_lock=none,addr=10.1.2.2)

However, libvirt is looking for the ISO at '/mnt/7d79fb97-cdae-3a13-a0ac-29f3b1f37a1e', so we get an error on migration.

I can confirm the ISO does exist at the new directory created on node01, just under the wrong mount UUID string:

root@mgt01-tyler-nfs-test:/data/secondary1# ls -la /data/secondary1/template/tmpl/2/203
total 520464
drwxrwxrwx 2 root root      4096 Apr 29 21:29 .
drwxrwxrwx 4 root root      4096 Apr 29 21:42 ..
-rw-rw-rw- 1 root root 532938752 Apr 29 21:29 203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso
-rw-rw-rw- 1 root root       344 Apr 29 21:29 template.properties
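
Those /mnt/<uuid> names look deterministic: my reading of the KVM agent (LibvirtStorageAdaptor and KVMStoragePoolManager) is that the pool UUID is a name-based UUID derived from the NFS host plus path, which would explain why mounting the per-template subdirectory versus another path of the same store yields a different /mnt/<uuid>. The exact string that gets hashed is an assumption on my part, so the sketch below is illustrative only:

# mirrors java.util.UUID.nameUUIDFromBytes (MD5-based v3 UUID) to show how
# two different paths into the same store map to two different /mnt/<uuid> dirs
python3 - <<'EOF'
import hashlib, uuid

def name_uuid_from_bytes(data):
    # MD5 digest with the version (3) and IETF-variant bits set,
    # as java.util.UUID.nameUUIDFromBytes does
    md5 = bytearray(hashlib.md5(data).digest())
    md5[6] = (md5[6] & 0x0F) | 0x30
    md5[8] = (md5[8] & 0x3F) | 0x80
    return uuid.UUID(bytes=bytes(md5))

print(name_uuid_from_bytes(b"10.1.2.2/data/secondary1"))
print(name_uuid_from_bytes(b"10.1.2.2/data/secondary1/template/tmpl/2/203"))
EOF
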
weizhouapache commented 6 months ago

Thanks @loth, I think it is a bug: ACS only stores which ISO is attached to a VM, but does not store the image store ID of the ISO.
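
That diagnosis can be illustrated against the cloud database. A sketch, assuming (to my knowledge of the schema) the attached ISO is recorded in user_vm.iso_id while per-store copies live in template_store_ref, so nothing ties the attachment to a particular store:

# run on the management server against the 'cloud' database (IDs from this report)
mysql -u cloud -p cloud -e "
  SELECT id, iso_id FROM user_vm WHERE id = 25995;
  SELECT store_id, install_path, state FROM template_store_ref WHERE template_id = 2695;"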

shwstppr commented 5 months ago

I investigated this with a 4.19 branch env and cannot reproduce the error during the live migration. I have the following setup to start with:

  • 2x Ubuntu 22.04 hosts in the same cluster
  • 1 secondary store

What I did for my 2 different attempts: 1st -

  • Deployed a VM
  • Registered an ISO
  • Attached the ISO to the VM
  • Added another secondary store
  • Live-migrated VM
  • Waited for the ISO to be available on both secondary stores

2nd -

  • Registered an ISO
  • Added second secondary store
  • Waited for the ISO to be available on both secondary stores
  • Deployed a VM
  • Attached the ISO to the VM
  • Live-migrated VM

In both cases, migration worked without an error. @loth @weizhouapache am I missing anything?

weizhouapache commented 5 months ago

I investigated this with a 4.19 branch env and cannot reproduce the error during the live migration. I have the following setup to start with:

  • 2x Ubuntu 22.04 hosts in the same cluster
  • 1 secondary store

What I did for my 2 different attempts: 1st -

  • Deployed a VM
  • Registered an ISO
  • Attached the ISO to the VM
  • Added another secondary store
  • Live-migrated VM
  • Waited for the ISO to be available on both secondary stores

2nd -

  • Registered an ISO
  • Added second secondary store
  • Waited for the ISO to be available on both secondary stores
  • Deployed a VM
  • Attached the ISO to the VM
  • Live-migrated VM

In both cases, migration worked without an error. @loth @weizhouapache am I missing anything?

Thanks @shwstppr, I will have a look.

weizhouapache commented 5 months ago

@loth @shwstppr I am able to reproduce the issue in 4.20. The migration works sometimes, but not all the time:

Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/ec8dfdd8-f341-3b0a-988f-cfbc93e46fc4/251-2-2b2071a4-21c7-340e-a861-1bd30fb5cbed.iso': No such file or directory
weizhouapache commented 5 months ago

Fixed by #9212.