Closed canghai908 closed 1 year ago
@canghai908 are the system VMs running on the same Ceph storage pool?
Yes, the system VMs are running on the same Ceph storage pool.
@canghai908
@weizhouapache The cloudstack-agent and the management server were on the same version (4.17.0). I have now updated both to 4.17.1.0 and the virtual routers work fine, but some of my VMs hang like this. As of this morning I don't know whether this is a CloudStack problem or a Ceph problem.
@weizhouapache My problem is solved! It was caused by Ceph's locks. See this article:
https://www.cnblogs.com/zphj1987/p/14155644.html
I would like to know whether CloudStack needs the exclusive-lock feature of RBD and, if not, whether I can disable it to prevent this problem from happening again.
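Before deciding anything about the feature, it helps to check whether exclusive-lock is enabled on the affected image at all. A minimal sketch, assuming the output of `rbd info <pool>/<image> --format json` carries a `features` list (the sample JSON, pool and image names below are made up for illustration, not taken from this cluster):

```python
import json


def has_exclusive_lock(rbd_info_json: str) -> bool:
    """Return True if the image's feature list contains 'exclusive-lock'.

    Assumes the JSON shape printed by `rbd info <pool>/<image> --format json`,
    i.e. an object with a "features" array of feature names.
    """
    info = json.loads(rbd_info_json)
    return "exclusive-lock" in info.get("features", [])


# Hypothetical sample output; a real image will list its own features.
sample = '{"name": "vm-disk", "features": ["layering", "exclusive-lock", "object-map"]}'
print(has_exclusive_lock(sample))
```

This only inspects the image; it does not change any feature.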
@canghai908 Good! Thanks for the update and for sharing.
For your question, maybe @wido can help.
We do not want to disable exclusive locking, as it is an important Ceph feature that prevents data corruption.
Who is holding the lock, @canghai908? Why is another client locking this image on Ceph?
@wido The VM images are on Ceph. The image was locked by a CloudStack compute node. The lock was left behind because of network issues and an unexpected restart of the hypervisor host. After I removed the image lock, the virtual machine started normally.
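For readers who hit the same situation, the manual cleanup described above can be sketched as follows. This is only an illustration: it builds the `rbd lock ls` and `rbd lock rm` command lines without running them, and the pool name, image name, lock ID and locker are hypothetical placeholders. The real lock ID and locker have to be copied from the `rbd lock ls` output, and a lock should only be removed once the old host is confirmed down.

```python
import shlex


def rbd_lock_ls_cmd(pool: str, image: str) -> list[str]:
    """Command that lists the locks held on an RBD image."""
    return ["rbd", "lock", "ls", f"{pool}/{image}"]


def rbd_lock_rm_cmd(pool: str, image: str, lock_id: str, locker: str) -> list[str]:
    """Command that removes one lock; lock_id and locker come from `rbd lock ls`."""
    return ["rbd", "lock", "rm", f"{pool}/{image}", lock_id, locker]


# Hypothetical values for illustration only.
print(shlex.join(rbd_lock_ls_cmd("cloudstack", "vm-disk")))
print(shlex.join(rbd_lock_rm_cmd("cloudstack", "vm-disk", "auto 123", "client.4567")))
```

Building the argument list instead of a single string keeps a lock ID that contains a space (as in the placeholder above) as one argument when the list is passed to something like `subprocess.run`.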
Was the VM running on that compute node before? Did it crash?
The exclusive lock should time out after a couple of minutes, after which you can start the VM on a different host.
@wido
Was the VM running on that compute node before? Did it crash?
Yes. The VM was running on that compute node, the node crashed, and the Ceph cluster network also had a problem. In my case the lock did not time out. After I removed the lock manually, everything was fine. Where is the lock timeout configured?
How long did you wait? And was the other node really down?
Because the exclusive lock should be handed over to another node if the old one goes down. If that doesn't happen Ceph blocks because there is a potential data corruption risk.
@wido
How long did you wait? And was the other node really down?
I waited for 12 hours. The other node really was down; it went down unexpectedly.
That is odd; the exclusive lock should time out and be handed over to the other client.
If that doesn't work, something is wrong, but disabling exclusive locks is not the answer.
That said, this seems like a Ceph issue and not a CloudStack issue.
Do your Ceph clients have the proper cephx capabilities in Ceph? See: https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/
In order for blocklisting to work, the client must have the `osd blocklist` capability. This capability is included in the `profile rbd` capability profile, which should generally be set on all Ceph [client identities](https://docs.ceph.com/en/latest/rados/operations/user-management/#user-management) using RBD.
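The capability check quoted above can be sketched as a small helper. This is a rough illustration, not an official tool: it takes the `mon` capability string of a client (as shown by `ceph auth get client.<name>`) and reports whether it looks like it permits blocklisting. The matching rules are assumptions on my side: `profile rbd`, `allow *`, or an explicit `allow command` clause for `osd blocklist` (called `osd blacklist` on releases before Pacific) are treated as sufficient.

```python
def mon_cap_allows_blocklist(mon_cap: str) -> bool:
    """Heuristic check of a client's mon capability string.

    Returns True if any comma-separated clause is 'profile rbd', 'allow *',
    or an 'allow command' clause naming 'osd blocklist' / 'osd blacklist'.
    """
    for clause in (c.strip() for c in mon_cap.split(",")):
        if clause in ("profile rbd", "allow *"):
            return True
        if clause.startswith("allow command") and (
            "osd blocklist" in clause or "osd blacklist" in clause
        ):
            return True
    return False


# Hypothetical capability strings for illustration.
print(mon_cap_allows_blocklist("profile rbd"))
print(mon_cap_allows_blocklist("allow r"))
```

A client that only has `allow r` on the monitors would be reported as unable to blocklist, which matches the situation the Ceph documentation warns about: the lock of a dead client cannot be broken automatically.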
@wido Thank you! I will check the Ceph cluster.
ISSUE TYPE
COMPONENT NAME
Virtual router VM does not start
CLOUDSTACK VERSION
OS: CentOS 7.9, CloudStack: 4.17.0
CONFIGURATION
OS / ENVIRONMENT
Ceph 15.6, CloudStack 4.17.0, libvirt-4.5.0-36.el7_9.5.x86_64
SUMMARY
STEPS TO REPRODUCE
When I restart the KVM hypervisor, the virtual router system VM does not start. The other system VMs come up fine, but the virtual router VM hangs when I connect to it over VNC.