apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

live migration of encrypted volume fails with nfs #8255

Open borisstoyanov opened 10 months ago

borisstoyanov commented 10 months ago
ISSUE TYPE
COMPONENT NAME
API
CLOUDSTACK VERSION
4.18.0, 4.18.1
CONFIGURATION

NFS shared storage

OS / ENVIRONMENT
SUMMARY

When migrating a VM via the 'migrateVMwithVolumes' API, I get an exception about a missing secret.

STEPS TO REPRODUCE
1. Deploy a VM
2. Add an encrypted data disk
3. Call migrateVirtualMachineWithVolumes to move the VM to another storage/host
4. Observe the error
EXPECTED RESULTS
The migration should succeed.
ACTUAL RESULTS
The migration fails with a missing-secret error in the agent log:
2023-11-21 06:35:03,397 INFO  [resource.wrapper.LibvirtMigrateCommandWrapper] (agentRequest-Handler-1:null) (logid:455553da) Migration thread of VM [i-2-3-VM] finished.
2023-11-21 06:35:03,397 DEBUG [agent.properties.AgentPropertiesFileHandler] (agentRequest-Handler-1:null) (logid:455553da) Property [vm.migrate.domain.retrieve.timeout] has empty or null value. Using default value [10].
2023-11-21 06:35:03,398 ERROR [resource.wrapper.LibvirtMigrateCommandWrapper] (agentRequest-Handler-1:null) (logid:455553da) Can't migrate domain [i-2-3-VM] due to: [org.libvirt.LibvirtException: Secret not found: no secret with matching uuid '06292cd0-349c-32d9-b0d4-bfaaf7844efa'].
java.util.concurrent.ExecutionException: org.libvirt.LibvirtException: Secret not found: no secret with matching uuid '06292cd0-349c-32d9-b0d4-bfaaf7844efa'
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtMigrateCommandWrapper.execute(LibvirtMigrateCommandWrapper.java:296)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtMigrateCommandWrapper.execute(LibvirtMigrateCommandWrapper.java:86)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1848)
    at com.cloud.agent.Agent.processRequest(Agent.java:662)
    at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1082)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.libvirt.LibvirtException: Secret not found: no secret with matching uuid '06292cd0-349c-32d9-b0d4-bfaaf7844efa'
    at org.libvirt.ErrorHandler.processError(Unknown Source)
    at org.libvirt.ErrorHandler.processError(Unknown Source)
    at org.libvirt.Domain.migrate(Unknown Source)
    at com.cloud.hypervisor.kvm.resource.MigrateKVMAsync.call(MigrateKVMAsync.java:124)
    at com.cloud.hypervisor.kvm.resource.MigrateKVMAsync.call(MigrateKVMAsync.java:27)
    ... 4 more
harikrishna-patnala commented 10 months ago

It looks like the same applies to PowerFlex storage as well: there, we do not allow the migration at the service layer itself.

if (srcStoragePoolVO.isManaged() && srcStoragePoolVO.getId() != destStoragePoolVO.getId()) {
    throw new CloudRuntimeException("Migrating a volume online with KVM from managed storage is not currently supported.");
}

We can treat this as an enhancement: allow the "migrateVMwithVolumes" API to handle volume migration as well, for both managed and NFS storage.

harikrishna-patnala commented 10 months ago

@borisstoyanov I think version 4.18.0 has the same issue as well. I updated the version in the description.

weizhouapache commented 10 months ago

@harikrishna-patnala do we need to set the milestone?

DaanHoogland commented 8 months ago

@harikrishna-patnala setting 4.18.2 for now, please update

abh1sar commented 3 months ago

This happens when a VM is live migrated and an encrypted data volume is migrated to a different pool at the same time. If the data volume is not explicitly moved to a different pool, the test case might pass.

StorageSystemDataMotionStrategy.copyAsync()

if (isNonManagedNfsToNfsOrSharedMountPointToNfs) {
    migrateDiskInfo = new MigrateCommand.MigrateDiskInfo(srcVolumeInfo.getPath(),
            MigrateCommand.MigrateDiskInfo.DiskType.FILE,
            MigrateCommand.MigrateDiskInfo.DriverType.QCOW2,
            MigrateCommand.MigrateDiskInfo.Source.FILE,
            connectHostToVolume(destHost, destVolumeInfo.getPoolId(), volumeIdentifier));
} else {
    migrateDiskInfo = configureMigrateDiskInfo(srcVolumeInfo, destPath);
    migrateDiskInfo.setSourceDiskOnStorageFileSystem(isStoragePoolTypeOfFile(sourceStoragePool));
    migrateDiskInfoList.add(migrateDiskInfo);
    prepareDiskWithSecretConsumerDetail(vmTO, srcVolumeInfo, destVolumeInfo.getPath());
}

prepareDiskWithSecretConsumerDetail(vmTO, srcVolumeInfo, destVolumeInfo.getPath()); also needs to be called in the non-managed NFS branch (isNonManagedNfsToNfsOrSharedMountPointToNfs). Otherwise the secret on the destination host will be configured with the source volume's path.

This code seems to have been present since volume encryption was first introduced in 4.18.0.
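
To illustrate the effect described above, here is a minimal, self-contained sketch. It is not CloudStack code: every class and method name in it is hypothetical. It assumes the libvirt secret UUID is derived from the volume path as a name-based UUID. That assumption is consistent with the UUID in the log being a version 3 UUID, which is what Java's UUID.nameUUIDFromBytes produces, but it has not been verified against the agent code here.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical model of the failure mode; none of these names exist in CloudStack.
class SecretPathSketch {

    // Assumption: the secret UUID is a name-based (version 3) UUID of the volume path.
    static UUID secretUuidFor(String volumePath) {
        return UUID.nameUUIDFromBytes(volumePath.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for the set of secrets defined on the destination host.
    static final class DestinationHost {
        private final Set<UUID> secrets = new HashSet<>();

        void defineSecretFor(String consumerPath) {
            secrets.add(secretUuidFor(consumerPath));
        }

        // Models the lookup that fails in the log: the disk is opened by its path
        // on the destination, so the secret must be keyed by that same path.
        boolean canOpenEncryptedDisk(String diskPath) {
            return secrets.contains(secretUuidFor(diskPath));
        }
    }

    // prepareWithDestPath = false models the NFS branch as quoted (secret keyed by source path);
    // prepareWithDestPath = true models the proposed extra call (secret keyed by destination path).
    static boolean migrationFindsSecret(String srcPath, String destPath, boolean prepareWithDestPath) {
        DestinationHost dest = new DestinationHost();
        dest.defineSecretFor(prepareWithDestPath ? destPath : srcPath);
        return dest.canOpenEncryptedDisk(destPath);
    }

    public static void main(String[] args) {
        String src = "src-volume-path";   // hypothetical paths
        String dest = "dest-volume-path";
        System.out.println("NFS branch as quoted: " + migrationFindsSecret(src, dest, false));
        System.out.println("with the proposed call: " + migrationFindsSecret(src, dest, true));
        System.out.println("volume not moved: " + migrationFindsSecret(src, src, false));
    }
}
```

In this model the lookup only fails when the source and destination paths differ, which matches the observation that the test may pass when the data volume is not moved to a different pool.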

rohityadavcloud commented 3 months ago

Isn't this considered a serious issue, @sureshanaparti?

sureshanaparti commented 3 months ago

@rohityadavcloud this has been there since the volume encryption feature was introduced (in 4.18.0). It seems to be an improvement on top of the current volume encryption functionality, and it needs proper testing. Moved to the next milestone for now. Any concerns?