Closed swhitman0 closed 1 year ago
Thanks for opening your first issue here! Be sure to follow the issue template!
@swhitman0
When I ran `kubectl delete pod <pod>` and `kubectl delete pvc <pvc>`,
I was able to detach/delete the volume on CloudStack and delete the pods; there was no issue.
Could you please check if the CSI provider is running fine?
kubectl get pods -n kube-system |grep cloudstack
cloudstack-csi-controller-5cf89485f6-xfnbm 3/3 Running 0 19m
cloudstack-csi-node-4w4kt 2/2 Running 0 31m
cloudstack-csi-node-tl6nj 2/2 Running 0 31m
Also, I would request you to log the issue at the cloudstack-csi-driver GitHub repo.
@kiranchavala
Thanks for the response. The CSI provider is running fine: 3/3 Running, no restarts.
Instead of executing `kubectl delete -f`, I deleted the pod and PVC individually as you suggested.
I will file the issue in the apalia/cloudstack-csi-driver repo as you requested.
Would you happen to know how I could clean up these volumes in CloudStack? When I press the "Detach disk" button, I get an error.
@swhitman0
You can try to detach the volume from the CloudStack UI or via an API call:
https://cloudstack.apache.org/api/apidocs-4.17/apis/detachVolume.html
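If the UI button keeps failing, the detachVolume API can also be called directly. Below is a minimal sketch of signing the request with the CloudStack API's HMAC-SHA1 scheme; the keys, endpoint, and volume id are placeholders, and a real request also needs the parameter values (and the final signature) URL-encoded.

```shell
# Hedged sketch: sign a detachVolume request by hand. All credentials and
# the volume UUID below are placeholders, not real values.
API_KEY="placeholder-api-key"
SECRET_KEY="placeholder-secret-key"
VOLUME_ID="b5f4e8a0-0000-0000-0000-000000000000"   # hypothetical UUID from listVolumes

# CloudStack signs the alphabetically sorted, lowercased query string
# with HMAC-SHA1 and base64-encodes the digest.
QS="apikey=${API_KEY}&command=detachVolume&id=${VOLUME_ID}&response=json"
SIG=$(printf '%s' "$QS" | tr 'A-Z' 'a-z' \
      | openssl dgst -sha1 -hmac "$SECRET_KEY" -binary | base64)

# The signature still needs URL-encoding ('+', '/', '=' are not URL-safe):
echo "GET /client/api?${QS}&signature=${SIG}"
```

A CLI client such as CloudMonkey handles this signing for you; the sketch just shows what happens under the hood.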
From the logs, I see the detachment failed because the VM node was in the boot stage:
2023-03-05 14:18:39,144 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-92:ctx-3f7cf487 job-1657/job-1658 ctx-d6957fbc) (logid:fd682bcf) Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: Could not detach volume. Probably the VM is in boot state at the moment
@kiranchavala did you test with 4.17.2.0 or 4.18.0.0-RC? The issue might have been fixed by #6508.
@weizhouapache I tested on 4.17.2 and did not hit the volume-detachment issue.
@kiranchavala OK. thanks.
I tested with 4.18.0-RC2 just now. The volume can be detached and removed when the pod/PVC is removed. Even if I detach the volume without removing the pod/PVC, the operation also succeeds.
@kiranchavala The error is different when trying to detach the volume from the CloudStack UI or using the CLI. The error is:
"jobresult": {
"errorcode": 530,
"errortext": "Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: disk: /mnt/cb06f23d-d0cb-3b1d-b0b2-8b64898dde9f/5998e60f-f7b0-4f13-b4fb-13b74eb8aef7 is not attached before"
},
@swhitman0 it looks like the behavior is the same as below:
1. When removing a PVC, the volume was actually detached successfully, but an error message was logged:
com.cloud.utils.exception.CloudRuntimeException: Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: Could not detach volume. Probably the VM is in boot state at the moment
2. When detaching the volume afterwards, the error message below appears, because the volume was already detached in step 1:
Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: disk: /mnt/cb06f23d-d0cb-3b1d-b0b2-8b64898dde9f/5998e60f-f7b0-4f13-b4fb-13b74eb8aef7 is not attached before
Now I believe this has been fixed by #6508. Can you test with 4.18.0.0-RC2 if possible?
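Purely as an illustration of that two-step behavior (the fix in #6508 makes this unnecessary), a script could skip the second detach when CloudStack already reports the volume as unattached. A tiny sketch, assuming the caller passes in the `virtualmachineid` field from a listVolumes response (empty when the volume is not attached to any VM):

```shell
# Illustrative helper only: decide whether a detachVolume call is needed.
# $1 = 'virtualmachineid' from listVolumes; empty means already detached.
needs_detach() {
    [ -n "$1" ]
}

if needs_detach ""; then
    echo "volume attached, detaching"
else
    echo "already detached, skipping detachVolume"
fi
```

This avoids the misleading "is not attached before" error when the PVC removal already detached the volume.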
@kiranchavala Is there a Yum repo for release candidates? I have not yet built CloudStack from source code.
@swhitman0 yes, you can find the packages at: http://packages.shapeblue.com/cloudstack/upstream/testing/4.18.0.0-RC20230224T1301/
@swhitman0 I have reproduced the issue in a Rocky 8 environment; it works with CentOS 7.
2023-03-08 13:59:36,331 DEBUG [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-2:null) (logid:71e84de5) Detaching device: <disk device='disk' type='file'>
<driver name='qemu' type='qcow2' cache='none' />
<source file='/mnt/28140a6d-390e-3e75-9cd1-431f2a8a777b/e6e94028-9772-4367-b80b-f42f689f535a'/>
<target dev='vdb' bus='virtio'/>
</disk>
2023-03-08 13:59:36,488 DEBUG [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-2:null) (logid:71e84de5) Failed to detach volume: e6e94028-9772-4367-b80b-f42f689f535a, due to
com.cloud.exception.InternalErrorException: Could not detach volume. Probably the VM is in boot state at the moment
I will test with 4.18.0.0-RC2
@swhitman0 I did a bunch of testing on CloudStack 4.18 and got the following results:
Failed to detach volume DATA-14 from VM debian11; com.cloud.exception.InternalErrorException: Could not detach volume after sending the command and waiting for [10000] milliseconds. Probably the VM does not support the sent detach command or the device is busy at the moment. Try again in a couple of minutes.
I tried again and got a different error:
Failed to detach volume DATA-14 from VM debian11; org.libvirt.LibvirtException: internal error: unable to execute QEMU command 'device_del': Device virtio-disk1 is already in the process of unplug
I got the same error when the VM was booted to the login screen.
I have also tested the 4.17.2 systemvm template in a 4.17.2 environment; the results are almost the same, except for the different error message.
Considering everything works fine on CentOS 7, I think it is an issue with Rocky 8/libvirt/qemu, not CloudStack. cc @swhitman0
Everything works fine on Ubuntu 22 as well.
@weizhouapache
Thanks for reproducing the problem on Rocky 8. Although many of our systems are CentOS 7, we avoid it for new projects because it is near EOL. We are familiar with the Red Hat ecosystem (Kickstart, Identity Management, Ansible, etc.), which makes Ubuntu 22 a heavier lift, but it could be done.
Will this issue with Rocky 8 (and, I assume, other RHEL 8-based OSes) be resolved?
@swhitman0 I did some testing on Alma 8. The issue can be reproduced in 4.17.2; everything works fine on 4.18.0.0-RC2.
@weizhouapache That is good to know. I'll upgrade to 4.18.0.0-RC2 at the first opportunity. Can I use the upgrade documentation for 4.17.2, or do you have draft 4.18 upgrade docs somewhere?
The upgrade process is similar to 4.17.2: you just need to back up the database, install the packages, and restart the CloudStack services (cloudstack-management and cloudstack-agent). Registering a new systemvm template has not been needed since 4.17.
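For reference, the steps above might look roughly like this on a RHEL-family management server. This is only a sketch: repo setup, database credentials, and exact package names depend on your installation, so verify against the official upgrade docs before running anything.

```shell
# Rough sketch of the upgrade steps described above (assumed RHEL-family host,
# standard package names, MySQL database named 'cloud').
mysqldump -u root -p cloud > cloud-db-backup-$(date +%F).sql   # 1. back up the database
yum upgrade -y cloudstack-management cloudstack-usage          # 2. install the new packages
systemctl restart cloudstack-management cloudstack-usage       # 3. restart the services

# On each KVM host:
#   yum upgrade -y cloudstack-agent && systemctl restart cloudstack-agent
```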
@weizhouapache
The upgrade was successful, and persistent volume claims can be created and deleted without issue on Rocky 8.7. Thanks for helping.
@swhitman0 Great. Closing this issue then.
thanks @kiranchavala for the investigation and testing.
OS / ENVIRONMENT
Rocky Linux 8.7
SUMMARY
When a Kubernetes persistent volume claim is deleted, the CloudStack volume is not deleted, and an error message is written to the management log.
STEPS TO REPRODUCE
EXPECTED RESULTS
I expected the CloudStack storage volume backing the PVC would be deleted.
ACTUAL RESULTS
2023-03-05 14:18:39,144 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-92:ctx-3f7cf487 job-1657/job-1658 ctx-d6957fbc) (logid:fd682bcf) Invocation exception, caused by: com.cloud.utils.exception.CloudRuntimeException: Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: Could not detach volume. Probably the VM is in boot state at the moment
2023-03-05 14:18:39,144 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-92:ctx-3f7cf487 job-1657/job-1658 ctx-d6957fbc) (logid:fd682bcf) Rethrow exception com.cloud.utils.exception.CloudRuntimeException: Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: Could not detach volume. Probably the VM is in boot state at the moment
2023-03-05 14:18:39,144 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-92:ctx-3f7cf487 job-1657/job-1658) (logid:fd682bcf) Done with run of VM work job: com.cloud.vm.VmWorkDetachVolume for VM 188, job origin: 1657
2023-03-05 14:18:39,144 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-92:ctx-3f7cf487 job-1657/job-1658) (logid:fd682bcf) Unable to complete AsyncJobVO: {id:1658, userId: 4, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkDetachVolume, cmdInfo: rO0ABXNyAB9jb20uY2xvdWQudm0uVm1Xb3JrRGV0YWNoVm9sdW1lhvReA-s81AMCAAFMAAh2b2x1bWVJZHQAEExqYXZhL2xhbmcvTG9uZzt4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1ldAASTGphdmEvbGFuZy9TdHJpbmc7eHAAAAAAAAAAAgAAAAAAAAAEAAAAAAAAALx0ABRWb2x1bWVBcGlTZXJ2aWNlSW1wbHNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAAAvA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 128437426857854, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sun Mar 05 14:18:36 EST 2023, removed: null}, job origin:1657
com.cloud.utils.exception.CloudRuntimeException: Failed to detach volume pvc-0e82d1a8-9af2-40b3-b037-ebcf151febd3 from VM testk8s-node-186b0939378; com.cloud.exception.InternalErrorException: Could not detach volume. Probably the VM is in boot state at the moment
    at com.cloud.storage.VolumeApiServiceImpl.orchestrateDetachVolumeFromVM(VolumeApiServiceImpl.java:2793)
    at com.cloud.storage.VolumeApiServiceImpl.orchestrateDetachVolumeFromVM(VolumeApiServiceImpl.java:4404)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at com.cloud.storage.VolumeApiServiceImpl.handleVmWorkJob(VolumeApiServiceImpl.java:4435)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:215)
    at com.sun.proxy.$Proxy231.handleVmWorkJob(Unknown Source)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)