Closed loth closed 5 months ago
@weizhouapache any idea of any further details I should include for this bug? I think 2x secondary storage VMs is largely untested and the root cause of this.
@loth Is the iso stored on both secondary storages ? Maybe db says it is but actually it isn't.
I checked both storage mounts, ISO exists:
root@s-27167-VM:/mnt/SecStorage/49e5c7d7-bef0-32c9-8be4-505b3b0b51be# find . -name 2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
./template/tmpl/2/2695/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
root@s-27167-VM:/mnt/SecStorage/49e5c7d7-bef0-32c9-8be4-505b3b0b51be# cd ..
root@s-27167-VM:/mnt/SecStorage# cd cadb1c27-afcb-314d-a866-e039feae8492/
root@s-27167-VM:/mnt/SecStorage/cadb1c27-afcb-314d-a866-e039feae8492# find . -name 2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
./template/tmpl/2/2695/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso
Furthermore, I think this is more of a bug within CloudStack and how it generates the location of the ISO. The libvirtd error log shows that the domain XML references "/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso". However, CloudStack created /mnt/49d099eb-456c-3262-867a-3dd38b623a04 for the VM to mount the ISO from (and the ISO does exist at this mount point) but never updated the XML to use the new mount point. The migration completes, but the VM fails to start because the mount point referenced in the XML does not exist.
So one of two things needed to happen for this not to error in CloudStack
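To make the mismatch easier to spot on a destination host, here is a minimal sketch that lists the CD-ROM source paths in a domain XML that do not exist locally. It is not CloudStack code; the function name `missing_cdrom_sources` and the sample XML fragment are hypothetical, and only the ISO path is taken from the error above.

```python
import os
import xml.etree.ElementTree as ET


def missing_cdrom_sources(domain_xml, exists=os.path.exists):
    """Return file-backed CD-ROM source paths from a libvirt domain XML
    that do not exist according to the `exists` check."""
    root = ET.fromstring(domain_xml)
    missing = []
    for disk in root.findall("./devices/disk"):
        if disk.get("device") != "cdrom":
            continue  # only ISO/CD-ROM disks are of interest here
        source = disk.find("source")
        path = source.get("file") if source is not None else None
        if path and not exists(path):
            missing.append(path)
    return missing


# Hypothetical, trimmed-down domain XML; only the ISO path comes from the log.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='cdrom'>
      <source file='/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso'/>
      <target dev='hdc' bus='ide'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    for path in missing_cdrom_sources(SAMPLE_XML):
        print("missing on this host:", path)
```

On a real host, the output of `virsh dumpxml` for the VM could be passed in place of `SAMPLE_XML`.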
@loth there are some PRs out for similar issues, #8952. Will you have the chance to test with that patch?
Hello,
I tried the mentioned patch and it didn't affect the issue. Here is a rundown of what I did to reproduce it:
- set up a new 4.18 env with the patch
- upload an ISO (grml-small in my case)
- add a 2nd secondary storage mount
- wait for templates to populate
- redownload/upload the template if "secondary storage bypassed" shows
- create a test VM
- attach the uploaded ISO
- attempt live migration
Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/7d79fb97-cdae-3a13-a0ac-29f3b1f37a1e/203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso': No such file or directory
ISO exists on both secondary storage mounts:
root@mgt01-tyler-nfs-test:/data/secondary2# ls -laR | grep 203-2-76328563-4e13-3db1-87be-e81f46e9394a
-rw-rw-rw- 1 root root 532938752 Apr 29 21:35 203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso
root@mgt01-tyler-nfs-test:/data/secondary2# cd ..
root@mgt01-tyler-nfs-test:/data# cd secondary1
root@mgt01-tyler-nfs-test:/data/secondary1# ls -laR | grep 203-2-76328563-4e13-3db1-87be-e81f46e9394a
-rw-rw-rw- 1 root root 532938752 Apr 29 21:29 203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso
root@mgt01-tyler-nfs-test:/data/secondary1#
On node01 (the destination of the live migration), the mount for secondary storage was created as '/mnt/1919082e-71af-3ce0-b6d7-df2bf6249f3b':
root@node01-tyler-nfs-test:~# mount | grep secondary
10.1.2.2:/data/secondary1/template/tmpl/2/203 on /mnt/1919082e-71af-3ce0-b6d7-df2bf6249f3b type nfs (rw,nosuid,nodev,noexec,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.1.2.2,mountvers=3,mountport=37418,mountproto=udp,local_lock=none,addr=10.1.2.2)
However, libvirt is looking for the ISO at '/mnt/7d79fb97-cdae-3a13-a0ac-29f3b1f37a1e', so we get an error on migration.
I can confirm the ISO does exist in the directory mounted on node01, just under a different mount UUID:
root@mgt01-tyler-nfs-test:/data/secondary1# ls -la /data/secondary1/template/tmpl/2/203
total 520464
drwxrwxrwx 2 root root 4096 Apr 29 21:29 .
drwxrwxrwx 4 root root 4096 Apr 29 21:42 ..
-rw-rw-rw- 1 root root 532938752 Apr 29 21:29 203-2-76328563-4e13-3db1-87be-e81f46e9394a.iso
-rw-rw-rw- 1 root root 344 Apr 29 21:29 template.properties
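All mount directory names in this thread are version-3 (name-based) UUIDs, which suggests they are derived deterministically from some property of the NFS source. The sketch below illustrates why two secondary stores holding the same ISO would then end up under different `/mnt/<uuid>` directories. The choice of host plus export path as the hash input is an assumption, not confirmed by this thread, and the `secondary2` path is inferred by analogy with the `secondary1` mount shown above.

```python
import hashlib
import uuid


def name_uuid_from_bytes(data):
    """MD5-based version-3 UUID without a namespace
    (the same construction as Java's UUID.nameUUIDFromBytes)."""
    return uuid.UUID(bytes=hashlib.md5(data).digest(), version=3)


def mount_dir(host, export_path):
    # ASSUMPTION: the hash input is host + export path; the real agent
    # may use a different string.
    key = (host + export_path).encode("utf-8")
    return "/mnt/" + str(name_uuid_from_bytes(key))


if __name__ == "__main__":
    # Same ISO directory on two different secondary stores.
    print(mount_dir("10.1.2.2", "/data/secondary1/template/tmpl/2/203"))
    print(mount_dir("10.1.2.2", "/data/secondary2/template/tmpl/2/203"))
```

If the source host picked one store and the destination host picked the other, the two directory names would differ even though the ISO is identical, which matches the symptom described here.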
Thanks @loth. I think it is a bug: ACS only stores which ISO is attached to a VM, but does not store the image store ID of the ISO.
I investigated this with a 4.19 branch env and cannot reproduce the error during live migration. I started with the following setup:
- 2x Ubuntu 22.04 hosts in the same cluster
- 1 secondary store
What I did for my 2 different attempts: 1st -
- Deployed a VM
- Registered an ISO
- Attached the ISO to the VM
- Added another secondary store
- Live-migrated VM
- Waited for the ISO to be available on both secondary stores
2nd -
- Registered an ISO
- Added second secondary store
- Waited for the ISO to be available on both secondary stores
- Deployed a VM
- Attached the ISO to the VM
- Live-migrated VM
In both cases, migration worked without an error. @loth @weizhouapache am I missing anything?
thanks @shwstppr I will have a look
@loth @shwstppr I am able to reproduce the issue in 4.20. The migration worked sometimes, but not all the time.
Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/ec8dfdd8-f341-3b0a-988f-cfbc93e46fc4/251-2-2b2071a4-21c7-340e-a861-1bd30fb5cbed.iso': No such file or directory
fixed by #9212
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
KVM using 2x secondary storage and many primary storage mounts
OS / ENVIRONMENT
Ubuntu 22.04
SUMMARY
When attempting to live migrate a VM, CloudStack throws an error from libvirt: 'Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Exception during migrate: org.libvirt.LibvirtException: Cannot access storage file '/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso': No such file or directory'
This file exists on the secondary storage, and the agent successfully creates a new storage pool for the ISO; however, it creates the pool under a new directory.
STEPS TO REPRODUCE
This seems to only happen with long-standing VMs; newly created VMs seem to migrate fine. It's possible this only occurs when a storage pool needs to be created.
Here is the excerpt from the migration job from the management server:
On the agent, I can see the pool being searched for, then created since it cannot be found, and finally a stop command from the management server.
It seems that CloudStack created '/mnt/49d099eb-456c-3262-867a-3dd38b623a04'. However, when libvirt migrated the XML from the other host, it still referenced the old mount point at '/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4', which doesn't exist on the destination host, so libvirt throws an error to CloudStack.
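One conceivable remedy is to rewrite the ISO source path in the domain XML so that it points at the mount directory that exists on the destination host before the XML is handed to libvirt. The sketch below only illustrates that idea; it is not CloudStack's implementation, and the helper name `retarget_cdrom_sources` is hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET


def retarget_cdrom_sources(domain_xml, new_mount_dir):
    """Return a copy of the domain XML in which every file-backed CD-ROM
    source keeps its file name but is moved under `new_mount_dir`."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        if disk.get("device") != "cdrom":
            continue  # leave data disks untouched
        source = disk.find("source")
        if source is None or not source.get("file"):
            continue  # empty drive or non-file source
        iso_name = posixpath.basename(source.get("file"))
        source.set("file", posixpath.join(new_mount_dir, iso_name))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical XML fragment; the paths are the ones quoted in this issue.
    xml_in = (
        "<domain><devices><disk type='file' device='cdrom'>"
        "<source file='/mnt/667dc116-bc57-3e1f-b21a-96babdbcebd4/"
        "2695-2-bec1fada-4469-3072-ab2c-799b0163f1eb.iso'/>"
        "</disk></devices></domain>"
    )
    print(retarget_cdrom_sources(xml_in, "/mnt/49d099eb-456c-3262-867a-3dd38b623a04"))
```

The alternative would be to have the destination host mount the ISO under the same directory name that the source host used, so that the XML needs no change.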
EXPECTED RESULTS
ACTUAL RESULTS