apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
2.05k stars 1.1k forks source link

Error when deleting former 4.16.1 volume snapshot from 4.17.1 #6804

Closed levindecaro closed 7 months ago

levindecaro commented 2 years ago
ISSUE TYPE
COMPONENT NAME
Volume Snapshot
CLOUDSTACK VERSION
4.17.1
CONFIGURATION

kvm SharedMountPoint (CephFS) snapshot.backup.to.secondary=true kvm.snapshot.enabled=true kvm.vmstoragesnapshot.enabled=true

OS / ENVIRONMENT

CentOS Linux release 7.9.2009 (Core) 3.10.0-1160.45.1.el7.x86_64 libvirt-4.5.0-36.el7_9.5.x86_64 qemu-kvm-ev-2.12.0-44.1.el7_8.1.x86_64 qemu-img-ev-2.12.0-44.1.el7_8.1.x86_64 qemu-kvm-common-ev-2.12.0-44.1.el7_8.1.x86_64 libvirt-daemon-driver-qemu-4.5.0-36.el7_9.5.x86_64

SUMMARY

Delete a existing snapshot taken in 4.16.1, ACS report failed, trying to delete a nested path of the volume ID at first attempt.

2022-10-04 15:24:20,927 ERROR [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Failed to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] on storage [primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}] due to [Failed to remove snapshot 7ccea8f2-f393-4db1-9c11-b60a337cb53c].
STEPS TO REPRODUCE
1. Have a volume snapshot created in 4.16.1
2. Delete the snapshot
EXPECTED RESULTS
Should able to delete the snapshot either stored in primary storage or secondary storage.
ACTUAL RESULTS
  1. first click delete, Error throw in the UI

    Delete snapshot
    (master-0_ROOT-817_20211122044926) Failed to delete snapshot

    master-0_ROOT-817_20211122044926 volume snapshot still persist in UI

  2. click delete second time, no error, master-0_ROOT-817_20211122044926 volume snapshot entry removed from UI.

Error log shown in ACS management.

2022-10-04 02:56:28,983 ERROR [o.a.c.s.e.DefaultEndPointSelector] (Work-Job-Executor-21:ctx-a593618e job-67898/job-67899 ctx-c3745aa0) (logid:54cab466) IR24 select BACKUPSNAPSHOT from primary to secondary 514 dest=514
2022-10-04 02:56:56,074 ERROR [o.a.c.s.s.DefaultSnapshotStrategy] (Work-Job-Executor-21:ctx-a593618e job-67898/job-67899 ctx-c3745aa0) (logid:54cab466) Failed to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=2ceda2f6-1b2a-4258-8867-9164d76d385a|path=2ceda2f6-1b2a-4258-8867-9164d76d385a|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/2ceda2f6-1b2a-4258-8867-9164d76d385a/ee83284b-96a7-47bd-93fa-d2114e49e8d2]] on storage [primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}] due to [Failed to remove snapshot ee83284b-96a7-47bd-93fa-d2114e49e8d2].
2022-10-04 04:12:00,565 ERROR [o.a.c.s.e.DefaultEndPointSelector] (Work-Job-Executor-22:ctx-5297b57b job-67900/job-67901 ctx-5e5707ea) (logid:6b1158c6) IR24 select BACKUPSNAPSHOT from primary to secondary 515 dest=515
2022-10-04 04:26:29,821 ERROR [o.a.c.s.s.DefaultSnapshotStrategy] (Work-Job-Executor-22:ctx-5297b57b job-67900/job-67901 ctx-5e5707ea) (logid:6b1158c6) Failed to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=2f926e86-12b1-41bb-bae4-9284a81cc6a3|path=2f926e86-12b1-41bb-bae4-9284a81cc6a3|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/2f926e86-12b1-41bb-bae4-9284a81cc6a3/801e8904-6942-430b-a9d9-686ffc2fa372]] on storage [primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}] due to [Failed to remove snapshot 801e8904-6942-430b-a9d9-686ffc2fa372].
2022-10-04 10:41:49,118 ERROR [o.a.c.u.m.SMTPMailSender] (Email-Alerts-Sender-4:null) (logid:a9d63332) Unable to send mail [Health checks failed: 1 failing checks on router r-3560-VM / 32800e52-60e6-4816-bb1c-ea0a63e2133c] to the recipcients [[{"mailAddress":"levin.ng@example.com"}]].
2022-10-04 10:42:58,542 ERROR [c.c.a.ApiServer] (qtp775386112-1363:ctx-8c524ecd ctx-f1c445a9) (logid:c13c1caa) unhandled exception executing api command: [Ljava.lang.String;@726fcb92
2022-10-04 11:11:28,665 ERROR [o.a.c.u.m.SMTPMailSender] (Email-Alerts-Sender-5:null) (logid:fb41eaa6) Unable to send mail [Health checks failed: 2 failing checks on router r-4228-VM / 52fd2ae8-e4b9-4960-a78d-03a86651997a] to the recipcients [[{"mailAddress":"levin.ng@example.com"}]].
2022-10-04 15:24:20,927 ERROR [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Failed to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] on storage [primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}] due to [Failed to remove snapshot 7ccea8f2-f393-4db1-9c11-b60a337cb53c].
2022-10-04 16:11:57,618 ERROR [c.c.a.q.QueryManagerImpl] (qtp775386112-1620:ctx-a8a18c39 ctx-5062b85c) (logid:8538f7b6) Incorrect format of the template id 252
[root@cloudstack-1 management]# grep ctx-8d8ba987 management-server.log
2022-10-04 15:24:20,063 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:aebd9e1c) Add job-67955 into job monitoring
2022-10-04 15:24:20,070 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Executing AsyncJobVO: {id:67955, userId: 2, accountId: 2, instanceType: Snapshot, instanceId: 82, cmd: org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"145968","id":"44d9b521-861c-4c62-be4a-c1523d4a9950","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"44d9b521-861c-4c62-be4a-c1523d4a9950\"}","projectid":"e61c599d-76fb-4cbc-9179-a47b381f49a2","ctxAccountId":"2","uuid":"44d9b521-861c-4c62-be4a-c1523d4a9950","cmdEventType":"SNAPSHOT.DELETE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 52235250599, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2022-10-04 15:24:20,076 WARN  [c.c.a.d.ParamGenericValidationWorker] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Received unknown parameters for command deleteSnapshot. Unknown parameters : projectid
2022-10-04 15:24:20,092 DEBUG [o.a.c.s.s.StorPoolSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) StorpoolSnapshotStrategy.canHandle: snapshot=master-0_ROOT-817_20211122044926, uuid=44d9b521-861c-4c62-be4a-c1523d4a9950, op=DELETE
2022-10-04 15:24:20,126 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Deleting SnapshotTO[datastore=com.cloud.agent.api.to.NfsTO@3f2084f8|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|pathsnapshots/57/866/7ccea8f2-f393-4db1-9c11-b60a337cb53c] chain of snapshots.
2022-10-04 15:24:20,127 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Snapshot [SnapshotTO[datastore=com.cloud.agent.api.to.NfsTO@3f2084f8|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|pathsnapshots/57/866/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] does not have children; therefore, we will delete it and its parents.
2022-10-04 15:24:20,147 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) getCommandHostDelegation: class org.apache.cloudstack.storage.command.DeleteCommand
2022-10-04 15:24:20,147 DEBUG [c.c.h.XenServerGuru] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) We are returning the default host to execute commands because the command is not of Copy type.
2022-10-04 15:24:20,154 DEBUG [c.c.a.t.Request] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Seq 63-6460413665462978095: Sending  { Cmd , MgmtId: 52235250599, via: 63(s-1855-VM), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/57/866/7ccea8f2-f393-4db1-9c11-b60a337cb53c","volume":{"uuid":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"name":"ROOT-817","size":"(100.00 GB) 107374182400","path":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeId":"866","vmName":"i-57-817-VM","accountId":"57","format":"QCOW2","provisioningType":"THIN","poolId":"3","id":"866","deviceId":"0","cacheMode":"NONE","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://staging-nfs.lab.example.com/StagingVol1/cloudstack","_role":"Image","nfsVersion":"4.1"}},"vmName":"i-57-817-VM","name":"master-0_ROOT-817_20211122044926","hypervisorType":"KVM","id":"82","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-04 15:24:20,849 DEBUG [c.c.a.t.Request] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Seq 63-6460413665462978095: Received:  { Ans: , MgmtId: 52235250599, via: 63(s-1855-VM), Ver: v1, Flags: 10, { Answer } }
2022-10-04 15:24:20,859 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Snapshot {"uuid":"44d9b521-861c-4c62-be4a-c1523d4a9950","name":"master-0_ROOT-817_20211122044926","volumeId":866} was deleted on secondary storage {uuid: "ec953d8c-7947-448d-9c3c-e3f6d0cf8ba4", name: "nfs://staging-nfs.lab.example.com/StagingVol1/cloudstack"}.
2022-10-04 15:24:20,880 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Deleting SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c] chain of snapshots.
2022-10-04 15:24:20,882 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] does not have children; therefore, we will delete it and its parents.
2022-10-04 15:24:20,907 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) getCommandHostDelegation: class org.apache.cloudstack.storage.command.DeleteCommand
2022-10-04 15:24:20,907 DEBUG [c.c.h.XenServerGuru] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) We are returning the default host to execute commands because the command is not of Copy type.
2022-10-04 15:24:20,910 DEBUG [c.c.a.t.Request] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Seq 49-3262576455053219576: Sending  { Cmd , MgmtId: 52235250599, via: 49(compute-15.lab.example.com), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c","volume":{"uuid":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"name":"ROOT-817","size":"(100.00 GB) 107374182400","path":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeId":"866","vmName":"i-57-817-VM","accountId":"57","format":"QCOW2","provisioningType":"THIN","poolId":"3","id":"866","deviceId":"0","cacheMode":"NONE","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"vmName":"i-57-817-VM","name":"master-0_ROOT-817_20211122044926","hypervisorType":"KVM","id":"82","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-04 15:24:20,920 DEBUG [c.c.a.t.Request] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Seq 49-3262576455053219576: Received:  { Ans: , MgmtId: 52235250599, via: 49(compute-15.lab.example.com), Ver: v1, Flags: 10, { Answer } }
2022-10-04 15:24:20,920 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) delete snapshot failedFailed to remove snapshot 7ccea8f2-f393-4db1-9c11-b60a337cb53c
2022-10-04 15:24:20,927 ERROR [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Failed to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] on storage [primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}] due to [Failed to remove snapshot 7ccea8f2-f393-4db1-9c11-b60a337cb53c].
2022-10-04 15:24:20,933 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Failed to delete Snapshot {"uuid":"44d9b521-861c-4c62-be4a-c1523d4a9950","name":"master-0_ROOT-817_20211122044926","volumeId":866} on primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}.
2022-10-04 15:24:20,944 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Complete async job-67955, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to delete snapshot"}
2022-10-04 15:24:20,945 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Publish async job-67955 complete on message bus
2022-10-04 15:24:20,945 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Wake up jobs related to job-67955
2022-10-04 15:24:20,945 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Update db status for job-67955
2022-10-04 15:24:20,946 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Wake up jobs joined with job-67955 and disjoin all subjobs created from job- 67955
2022-10-04 15:24:20,949 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Done executing org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd for job-67955
2022-10-04 15:24:20,949 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-59:ctx-8d8ba987 job-67955) (logid:7aeebf72) Remove job-67955 from job monitoring
levindecaro commented 2 years ago

Snapshot created with 4.17.1 do not have above trouble.

2022-10-04 18:15:08,239 DEBUG [o.a.c.s.s.StorPoolSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) StorpoolSnapshotStrategy.canHandle: snapshot=master-0_ROOT-817_20221004085422, uuid=805029e2-b8a3-4150-b1f9-83b1c4d28835, op=DELETE
2022-10-04 18:15:08,273 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Deleting SnapshotTO[datastore=com.cloud.agent.api.to.NfsTO@ed6480|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|pathsnapshots/57/866/39e58d90-d344-495c-898a-dcfeb78b4ab8] chain of snapshots.
2022-10-04 18:15:08,274 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Snapshot [SnapshotTO[datastore=com.cloud.agent.api.to.NfsTO@ed6480|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|pathsnapshots/57/866/39e58d90-d344-495c-898a-dcfeb78b4ab8]] does not have children; therefore, we will delete it and its parents.
2022-10-04 18:15:08,297 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) getCommandHostDelegation: class org.apache.cloudstack.storage.command.DeleteCommand
2022-10-04 18:15:08,297 DEBUG [c.c.h.XenServerGuru] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) We are returning the default host to execute commands because the command is not of Copy type.
2022-10-04 18:15:08,300 DEBUG [c.c.a.t.Request] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Seq 63-6460413665462978260: Sending  { Cmd , MgmtId: 52235250599, via: 63(s-1855-VM), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/57/866/39e58d90-d344-495c-898a-dcfeb78b4ab8","volume":{"uuid":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"name":"ROOT-817","size":"(100.00 GB) 107374182400","path":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeId":"866","vmName":"i-57-817-VM","accountId":"57","format":"QCOW2","provisioningType":"THIN","poolId":"3","id":"866","deviceId":"0","cacheMode":"NONE","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"nfs://staging-nfs.lab.example.com/StagingVol1/cloudstack","_role":"Image","nfsVersion":"4.1"}},"vmName":"i-57-817-VM","name":"master-0_ROOT-817_20221004085422","hypervisorType":"KVM","id":"516","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-04 18:15:08,385 DEBUG [c.c.a.t.Request] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Seq 63-6460413665462978260: Received:  { Ans: , MgmtId: 52235250599, via: 63(s-1855-VM), Ver: v1, Flags: 10, { Answer } }
2022-10-04 18:15:08,397 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Snapshot {"uuid":"805029e2-b8a3-4150-b1f9-83b1c4d28835","name":"master-0_ROOT-817_20221004085422","volumeId":866} was deleted on secondary storage {uuid: "ec953d8c-7947-448d-9c3c-e3f6d0cf8ba4", name: "nfs://staging-nfs.lab.example.com/StagingVol1/cloudstack"}.
2022-10-04 18:15:08,418 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Deleting SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/snapshots/39e58d90-d344-495c-898a-dcfeb78b4ab8] chain of snapshots.
2022-10-04 18:15:08,419 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/snapshots/39e58d90-d344-495c-898a-dcfeb78b4ab8]] does not have children; therefore, we will delete it and its parents.
2022-10-04 18:15:08,441 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) getCommandHostDelegation: class org.apache.cloudstack.storage.command.DeleteCommand
2022-10-04 18:15:08,441 DEBUG [c.c.h.XenServerGuru] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) We are returning the default host to execute commands because the command is not of Copy type.
2022-10-04 18:15:08,444 DEBUG [c.c.a.t.Request] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Seq 69-3415135892430393916: Sending  { Cmd , MgmtId: 52235250599, via: 69(compute-14.lab.example.com), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"/cloudstack/sg1/snapshots/39e58d90-d344-495c-898a-dcfeb78b4ab8","volume":{"uuid":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"name":"ROOT-817","size":"(100.00 GB) 107374182400","path":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeId":"866","vmName":"i-57-817-VM","accountId":"57","format":"QCOW2","provisioningType":"THIN","poolId":"3","id":"866","deviceId":"0","cacheMode":"NONE","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"vmName":"i-57-817-VM","name":"master-0_ROOT-817_20221004085422","hypervisorType":"KVM","id":"516","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-04 18:15:08,457 DEBUG [c.c.a.t.Request] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Seq 69-3415135892430393916: Received:  { Ans: , MgmtId: 52235250599, via: 69(compute-14.lab.example.com), Ver: v1, Flags: 10, { Answer } }
2022-10-04 18:15:08,466 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Snapshot {"uuid":"805029e2-b8a3-4150-b1f9-83b1c4d28835","name":"master-0_ROOT-817_20221004085422","volumeId":866} was deleted on primary storage {uuid: "5db4c6c5-cc09-4f99-af42-c2256f76748c", name: "sg1"}.
2022-10-04 18:15:08,474 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Updating resource Type = snapshot count for Account = 57 Operation = decreasing Amount = 1
2022-10-04 18:15:08,478 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Updating resource Type = secondary_storage count for Account = 57 Operation = decreasing Amount = (271.04 GB) 291027681366
2022-10-04 18:15:08,489 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Complete async job-67969, jobStatus: SUCCEEDED, resultCode: 0, result: org.apache.cloudstack.api.response.SuccessResponse/null/{"success":"true"}
2022-10-04 18:15:08,489 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Publish async job-67969 complete on message bus
2022-10-04 18:15:08,489 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Wake up jobs related to job-67969
2022-10-04 18:15:08,489 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Update db status for job-67969
2022-10-04 18:15:08,490 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-68:ctx-15c8d372 job-67969 ctx-3473b6bc) (logid:75e4d493) Wake up jobs joined with job-67969 and disjoin all subjobs created from job- 67969
slavkap commented 2 years ago

@levindecaro, can you share everything with logid:7aeebf72 from the agent.log. There should be more detailed information

levindecaro commented 2 years ago

@slavkap

Thanks for your investigation, the snapshot path apparently wrong with leading by the parent image name

2022-10-04 15:24:20,916 INFO  [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-3:null) (logid:7aeebf72) Deleting snapshot (id=82, name=master-0_ROOT-817_20211122044926, path=/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c, storage type=Filesystem) on primary storage
2022-10-04 15:24:20,918 ERROR [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-3:null) (logid:7aeebf72) Failed to remove snapshot 7ccea8f2-f393-4db1-9c11-b60a337cb53c, with exception: com.cloud.utils.exception.CloudRuntimeException: Unable to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] due to [/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c: Not a directory].
slavkap commented 2 years ago

@levindecaro it's strange that from the last log, the primary storage type on the first line is Filesystem ... then it's SharedMountPoint on the second one. Can you check if you have that snapshot on the CephFS pool? I think the snapshots created before the upgrade aren't on primary storage (because after backup to secondary for some of the pools, CS deletes the snapshot from the primary storage). In CS 4.17, some pools have new functionality, with which you can keep the snapshots on primary storage. In your case (if you confirm that the snapshot isn't on your primary storage), CS tries to delete it first on primary, but it doesn't exist.

levindecaro commented 2 years ago

@slavkap , yes it is the case, when "snapshot.backup.to.secondary=true" the snapshot will be removed right the way after copy complete. The snapshot file already removed by CS from since 4.16.1 (checked from cephfs pool snapshot), and CS tries to delete the snapshot in primary storage which already removed.

slavkap commented 2 years ago

@levindecaro, thanks for confirming! I think @GutoVeronezi could help more here

GutoVeronezi commented 2 years ago

Hello, @levindecaro, could you also share with us the logs of host compute-15.lab.example.com? This was the host that processed the command and returned the error:

2022-10-04 15:24:20,910 DEBUG [c.c.a.t.Request] (API-Job-Executor-59:ctx-8d8ba987 job-67955 ctx-1c0dc776) (logid:7aeebf72) Seq 49-3262576455053219576: Sending  { Cmd , MgmtId: 52235250599, via: 49(compute-15.lab.example.com), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c","volume":{"uuid":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"name":"ROOT-817","size":"(100.00 GB) 107374182400","path":"d0126a5a-62c9-4935-b6f8-bba159d6fbae","volumeId":"866","vmName":"i-57-817-VM","accountId":"57","format":"QCOW2","provisioningType":"THIN","poolId":"3","id":"866","deviceId":"0","cacheMode":"NONE","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5db4c6c5-cc09-4f99-af42-c2256f76748c","name":"sg1","id":"3","poolType":"SharedMountPoint","host":"localhost","path":"/cloudstack/sg1","port":"0","url":"SharedMountPoint://localhost/cloudstack/sg1/?ROLE=Primary&STOREUUID=5db4c6c5-cc09-4f99-af42-c2256f76748c","isManaged":"false"}},"vmName":"i-57-817-VM","name":"master-0_ROOT-817_20211122044926","hypervisorType":"KVM","id":"82","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
levindecaro commented 2 years ago

@GutoVeronezi

Which logs besides the agent.log? I only found these lines between 15:20 to 15:29.

2022-10-04 15:24:20,916 INFO  [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-3:null) (logid:7aeebf72) Deleting snapshot (id=82, name=master-0_ROOT-817_20211122044926, path=/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c, storage type=Filesystem) on primary storage
2022-10-04 15:24:20,918 ERROR [kvm.storage.KVMStorageProcessor] (agentRequest-Handler-3:null) (logid:7aeebf72) Failed to remove snapshot 7ccea8f2-f393-4db1-9c11-b60a337cb53c, with exception: com.cloud.utils.exception.CloudRuntimeException: Unable to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]|volume=volumeTO[uuid=d0126a5a-62c9-4935-b6f8-bba159d6fbae|path=d0126a5a-62c9-4935-b6f8-bba159d6fbae|datastore=PrimaryDataStoreTO[uuid=5db4c6c5-cc09-4f99-af42-c2256f76748c|name=sg1|id=3|pooltype=SharedMountPoint]]|path/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c]] due to [/cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c: Not a directory].

Other than the above log, it has the following lines in /var/log/messages and agent.log, however i think it is not essential.

Oct  5 15:20:24 compute-15 java: INFO  [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-4:) (logid:01bd8d55) Trying to fetch storage pool 5fa62cb4-d22a-4d06-a748-2fb7a73234a2 from libvirt

If it is not sufficient, i could turn on debug level to reproduce again, but please kindly advise specific area to capture.

Thank you

GutoVeronezi commented 2 years ago

@levindecaro, thanks. This process does not have logs in debug; therefore it will not be necessary to turn on debug. I'll take a look in the information you provided and in the code to try to find the root cause.

(@levindecaro, sorry, I scrolled fast and did not see https://github.com/apache/cloudstack/issues/6804#issuecomment-1267187101)

GutoVeronezi commented 2 years ago

@levindecaro, disk snapshots before 4.17 were stored inside the disk (they were internal snapshots). Could you run qemu-img snapshot -l <disk-path> to we list the disk internal snapshots?

levindecaro commented 2 years ago

The image snapshot list of the one reported error shown empty output, because all snapshot are deleted from UI already.

[root@compute-12 ~]# qemu-img snapshot -l /cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae
[root@compute-12 ~]#

Another volume snapshot taken in 4.16.1 output shown as below. These tag are referencing "VM Snapshot"

[root@compute-12 ~]# qemu-img snapshot -l /cloudstack/sg1/5f5bc454-f5e8-42b3-b3eb-141237d56966
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         i-57-819-VM_VS_20220317120620    14G 2022-03-17 20:06:21 1075:54:38.764
2         i-57-819-VM_VS_20220716151306    14G 2022-07-16 23:13:07 1464:36:20.105

image

image

The third sample has volume snapshot only, do not have snapshot meta data

[root@compute-15 ~]# qemu-img snapshot -l /cloudstack/sg1/e8584012-f806-4511-b892-f1cbadcd8b1d
[root@compute-15 ~]#

image

DaanHoogland commented 2 years ago

@levindecaro @GutoVeronezi is there a workaround for this? or do we have a way to fix?

levindecaro commented 2 years ago

@DaanHoogland

At this moment, the impact of this issue in my environment can be discarded, since it just delete something dosen't exist. The user can proceed the deletion by 2nd retry.

However, I will observe this problem will be more serious for same upgrade path when "snapshot.backup.to.secondary=false". ACS likely can't delete the invalid path volume snapshot and left the primarystorage inconsistent with database record, however I don't have another environment to verify this observation.

for my deletion fail case, /cloudstack/sg1/d0126a5a-62c9-4935-b6f8-bba159d6fbae/7ccea8f2-f393-4db1-9c11-b60a337cb53c

d0126a5a-62c9-4935-b6f8-bba159d6fbae actually a image file, not a directory. 7ccea8f2-f393-4db1-9c11-b60a337cb53c should be same level of d0126a5a-62c9-4935-b6f8-bba159d6fbae when snapshot.backup.to.secondary=false or 7ccea8f2-f393-4db1-9c11-b60a337cb53c copied to secondary storage when snapshot.backup.to.secondary=true

GutoVeronezi commented 2 years ago

@levindecaro @DaanHoogland

When taking a snapshot with ACS prior to 4.17, it would set the snapshot path as disk.getPath() + File.separator + snapshotName. Then, when deleting the snapshot, ACS would use the disk path and the snapshot name to call the script managesnapshot (which calls $qemu_img snapshot -l $disk |tail -n +3|awk '{print $1}'|xargs -I {} $qemu_img snapshot -d {} $disk >&2).

When taking a snapshot in 4.17, the snapshot path is created with the primary storage path + '/snapshot/' + snapshot name. When deleting the snapshot, ACS will try to remove the file in that path. As the old snapshot path format was not a real path, in 4.17, it throws the exception Not a directory, as seen in the logs.

One option is to mark the primary storage reference (in snapshot_store_ref) as removed for the snapshots taken before upgrading to 4.17, so ACS would not unnecessary try to delete them in the primary storage.

nvazquez commented 2 years ago

Hi guys, just 2 cents on this issue - I could also reproduce it using NFS storage and qemu-kvm packages instead of qemu-kvm-ev after upgrading from 4.16.1.1:

First attempt:

2022-10-11 17:15:43,266 DEBUG [c.c.a.t.Request] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Seq 3-368169269537538057: Sending  { Cmd , MgmtId: 167781181, via: 3(s-2-VM), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"snapshots/2/5/60e97a3c-d66f-44ee-bdef-c765d0895070","volume":{"uuid":"9f44a962-c4de-4298-ae0d-323a92556a02","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"db33bd91-fc29-3d98-886e-1ca265d5ad71","name":"ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1","id":"1","poolType":"NetworkFilesystem","host":"10.0.32.4","path":"/acs/primary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1","port":"2049","url":"NetworkFilesystem://10.0.32.4/acs/primary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1/?ROLE=Primary&STOREUUID=db33bd91-fc29-3d98-886e-1ca265d5ad71","isManaged":"false"}},"name":"ROOT-5","size":"(8.00 GB) 8589934592","path":"9f44a962-c4de-4298-ae0d-323a92556a02","volumeId":"5","vmName":"i-2-5-VM","accountId":"2","format":"QCOW2","provisioningType":"THIN","poolId":"1","id":"5","deviceId":"0","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"com.cloud.agent.api.to.NfsTO":{"_url":"NFS://10.0.32.4/acs/secondary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-sec1","_role":"Image"}},"vmName":"i-2-5-VM","name":"volsnap3","hypervisorType":"KVM","id":"4","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-11 17:15:43,363 DEBUG [c.c.a.ApiServlet] (qtp396883763-19:ctx-7c4c567b) (logid:5a61ae94) ===START===  10.0.3.251 -- GET  jobId=972cc1b3-cfab-48c9-b7a2-f317c8a117e4&command=queryAsyncJobResult&response=json
2022-10-11 17:15:43,375 DEBUG [c.c.a.ApiServer] (qtp396883763-19:ctx-7c4c567b ctx-2e032dfb) (logid:5a61ae94) CIDRs from which account 'Acct[a7b00689-497a-11ed-8541-1e00100003d0-admin] -- Account {"id": 2, "name": "admin", "uuid": "a7b00689-497a-11ed-8541-1e00100003d0"}' is allowed to perform API calls: 0.0.0.0/0,::/0
2022-10-11 17:15:43,400 DEBUG [c.c.a.ApiServlet] (qtp396883763-19:ctx-7c4c567b ctx-2e032dfb) (logid:5a61ae94) ===END===  10.0.3.251 -- GET  jobId=972cc1b3-cfab-48c9-b7a2-f317c8a117e4&command=queryAsyncJobResult&response=json
2022-10-11 17:15:44,117 DEBUG [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-6:ctx-0e27659d) (logid:c2272697) HA health check task is running...
2022-10-11 17:15:45,079 DEBUG [c.c.a.t.Request] (AgentManager-Handler-1:null) (logid:) Seq 3-368169269537538057: Processing:  { Ans: , MgmtId: 167781181, via: 3, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-11 17:15:45,080 DEBUG [c.c.a.t.Request] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Seq 3-368169269537538057: Received:  { Ans: , MgmtId: 167781181, via: 3(s-2-VM), Ver: v1, Flags: 10, { Answer } }
2022-10-11 17:15:45,097 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Snapshot {"uuid":"efe1e43e-1819-43ff-9fa6-268c571ce5ce","name":"volsnap3","volumeId":5} was deleted on secondary storage {uuid: "f618d895-4b01-4b26-bf08-f2e142074341", name: "NFS://10.0.32.4/acs/secondary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-sec1"}.
2022-10-11 17:15:45,137 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Deleting SnapshotTO[datastore=PrimaryDataStoreTO[uuid=db33bd91-fc29-3d98-886e-1ca265d5ad71|name=ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1|id=1|pooltype=NetworkFilesystem]|volume=volumeTO[uuid=9f44a962-c4de-4298-ae0d-323a92556a02|path=9f44a962-c4de-4298-ae0d-323a92556a02|datastore=PrimaryDataStoreTO[uuid=db33bd91-fc29-3d98-886e-1ca265d5ad71|name=ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1|id=1|pooltype=NetworkFilesystem]]|path/mnt/db33bd91-fc29-3d98-886e-1ca265d5ad71/9f44a962-c4de-4298-ae0d-323a92556a02/60e97a3c-d66f-44ee-bdef-c765d0895070] chain of snapshots.
2022-10-11 17:15:45,138 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=db33bd91-fc29-3d98-886e-1ca265d5ad71|name=ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1|id=1|pooltype=NetworkFilesystem]|volume=volumeTO[uuid=9f44a962-c4de-4298-ae0d-323a92556a02|path=9f44a962-c4de-4298-ae0d-323a92556a02|datastore=PrimaryDataStoreTO[uuid=db33bd91-fc29-3d98-886e-1ca265d5ad71|name=ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1|id=1|pooltype=NetworkFilesystem]]|path/mnt/db33bd91-fc29-3d98-886e-1ca265d5ad71/9f44a962-c4de-4298-ae0d-323a92556a02/60e97a3c-d66f-44ee-bdef-c765d0895070]] does not have children; therefore, we will delete it and its parents.
2022-10-11 17:15:45,186 DEBUG [c.c.h.o.r.Ovm3HypervisorGuru] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) getCommandHostDelegation: class org.apache.cloudstack.storage.command.DeleteCommand
2022-10-11 17:15:45,186 DEBUG [c.c.h.XenServerGuru] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) We are returning the default host to execute commands because the command is not of Copy type.
2022-10-11 17:15:45,191 DEBUG [c.c.a.t.Request] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Seq 1-7690177838711832591: Sending  { Cmd , MgmtId: 167781181, via: 1(ref-trl-3728-k-Mr8-nicolas-vazquez-kvm1), Ver: v1, Flags: 100011, [{"org.apache.cloudstack.storage.command.DeleteCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"path":"/mnt/db33bd91-fc29-3d98-886e-1ca265d5ad71/9f44a962-c4de-4298-ae0d-323a92556a02/60e97a3c-d66f-44ee-bdef-c765d0895070","volume":{"uuid":"9f44a962-c4de-4298-ae0d-323a92556a02","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"db33bd91-fc29-3d98-886e-1ca265d5ad71","name":"ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1","id":"1","poolType":"NetworkFilesystem","host":"10.0.32.4","path":"/acs/primary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1","port":"2049","url":"NetworkFilesystem://10.0.32.4/acs/primary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1/?ROLE=Primary&STOREUUID=db33bd91-fc29-3d98-886e-1ca265d5ad71","isManaged":"false"}},"name":"ROOT-5","size":"(8.00 GB) 8589934592","path":"9f44a962-c4de-4298-ae0d-323a92556a02","volumeId":"5","vmName":"i-2-5-VM","accountId":"2","format":"QCOW2","provisioningType":"THIN","poolId":"1","id":"5","deviceId":"0","hypervisorType":"KVM","directDownload":"false","deployAsIs":"false"},"dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"db33bd91-fc29-3d98-886e-1ca265d5ad71","name":"ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1","id":"1","poolType":"NetworkFilesystem","host":"10.0.32.4","path":"/acs/primary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1","port":"2049","url":"NetworkFilesystem://10.0.32.4/acs/primary/ref-trl-3728-k-Mr8-nicolas-vazquez/ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1/?ROLE=Primary&STOREUUID=db33bd91-fc29-3d98-886e-1ca265d5ad71","isManaged":"false"}},"vmName":"i-2-5-VM","name":"volsnap3","hypervisorType":"KVM","id":"4","quiescevm":"false","physicalSize":"0"}},"wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-11 17:15:45,218 DEBUG [c.c.a.t.Request] (AgentManager-Handler-2:null) (logid:) Seq 1-7690177838711832591: Processing:  { Ans: , MgmtId: 167781181, via: 1, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":"false","details":"Failed to remove snapshot 60e97a3c-d66f-44ee-bdef-c765d0895070","wait":"0","bypassHostMaintenance":"false"}}] }
2022-10-11 17:15:45,220 DEBUG [c.c.a.t.Request] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Seq 1-7690177838711832591: Received:  { Ans: , MgmtId: 167781181, via: 1(ref-trl-3728-k-Mr8-nicolas-vazquez-kvm1), Ver: v1, Flags: 10, { Answer } }
2022-10-11 17:15:45,221 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) delete snapshot failedFailed to remove snapshot 60e97a3c-d66f-44ee-bdef-c765d0895070
2022-10-11 17:15:45,233 ERROR [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-1:ctx-891d3141 job-53 ctx-e5d41064) (logid:972cc1b3) Failed to delete snapshot [SnapshotTO[datastore=PrimaryDataStoreTO[uuid=db33bd91-fc29-3d98-886e-1ca265d5ad71|name=ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1|id=1|pooltype=NetworkFilesystem]|volume=volumeTO[uuid=9f44a962-c4de-4298-ae0d-323a92556a02|path=9f44a962-c4de-4298-ae0d-323a92556a02|datastore=PrimaryDataStoreTO[uuid=db33bd91-fc29-3d98-886e-1ca265d5ad71|name=ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1|id=1|pooltype=NetworkFilesystem]]|path/mnt/db33bd91-fc29-3d98-886e-1ca265d5ad71/9f44a962-c4de-4298-ae0d-323a92556a02/60e97a3c-d66f-44ee-bdef-c765d0895070]] on storage [primary storage {uuid: "db33bd91-fc29-3d98-886e-1ca265d5ad71", name: "ref-trl-3728-k-Mr8-nicolas-vazquez-kvm-pri1"}] due to [Failed to remove snapshot 60e97a3c-d66f-44ee-bdef-c765d0895070].
com.cloud.utils.exception.CloudRuntimeException: Failed to remove snapshot 60e97a3c-d66f-44ee-bdef-c765d0895070
    at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.deleteSnapshot(SnapshotServiceImpl.java:419)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.deleteSnapshotChain(DefaultSnapshotStrategy.java:218)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.deleteSnapshotInfo(DefaultSnapshotStrategy.java:331)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.deleteSnapshotInfos(DefaultSnapshotStrategy.java:306)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.destroySnapshotEntriesAndFiles(DefaultSnapshotStrategy.java:285)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.deleteSnapshot(DefaultSnapshotStrategy.java:277)
    at com.cloud.storage.snapshot.SnapshotManagerImpl.deleteSnapshot(SnapshotManagerImpl.java:607)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Second attempt:

2022-10-11 17:15:56,162 DEBUG [c.c.a.ApiServlet] (qtp396883763-18:ctx-83fe3d1e ctx-5f0c22aa) (logid:9a0603b7) ===END===  10.0.3.251 -- GET  id=efe1e43e-1819-43ff-9fa6-268c571ce5ce&command=deleteSnapshot&response=json
2022-10-11 17:15:56,164 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-2:ctx-88542e84 job-56) (logid:a4c66950) Add job-56 into job monitoring
2022-10-11 17:15:56,169 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56) (logid:954427aa) Executing AsyncJobVO: {id:56, userId: 2, accountId: 2, instanceType: Snapshot, instanceId: 4, cmd: org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd, cmdInfo: {"response":"json","ctxUserId":"2","httpmethod":"GET","ctxStartEventId":"127","id":"efe1e43e-1819-43ff-9fa6-268c571ce5ce","ctxDetails":"{\"interface com.cloud.storage.Snapshot\":\"efe1e43e-1819-43ff-9fa6-268c571ce5ce\"}","ctxAccountId":"2","uuid":"efe1e43e-1819-43ff-9fa6-268c571ce5ce","cmdEventType":"SNAPSHOT.DELETE"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 167781181, completeMsid: null, lastUpdated: null, lastPolled: null, created: null, removed: null}
2022-10-11 17:15:56,194 DEBUG [o.a.c.s.s.StorPoolSnapshotStrategy] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) StorpoolSnapshotStrategy.canHandle: snapshot=volsnap3, uuid=efe1e43e-1819-43ff-9fa6-268c571ce5ce, op=DELETE
2022-10-11 17:15:56,202 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Could not find Snapshot {"uuid":"efe1e43e-1819-43ff-9fa6-268c571ce5ce","name":"volsnap3","volumeId":5} entry on a secondary storage. Skipping deletion on secondary storage.
2022-10-11 17:15:56,202 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Could not find Snapshot {"uuid":"efe1e43e-1819-43ff-9fa6-268c571ce5ce","name":"volsnap3","volumeId":5} entry on a primary storage. Skipping deletion on primary storage.
2022-10-11 17:15:56,218 DEBUG [c.c.r.ResourceLimitManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Updating resource Type = snapshot count for Account = 2 Operation = decreasing Amount = 1
2022-10-11 17:15:56,239 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Complete async job-56, jobStatus: SUCCEEDED, resultCode: 0, result: org.apache.cloudstack.api.response.SuccessResponse/null/{"success":"true"}
2022-10-11 17:15:56,240 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Publish async job-56 complete on message bus
2022-10-11 17:15:56,240 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Wake up jobs related to job-56
2022-10-11 17:15:56,240 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Update db status for job-56
2022-10-11 17:15:56,241 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56 ctx-b34fc88d) (logid:954427aa) Wake up jobs joined with job-56 and disjoin all subjobs created from job- 56
2022-10-11 17:15:56,245 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-2:ctx-88542e84 job-56) (logid:954427aa) Done executing org.apache.cloudstack.api.command.user.snapshot.DeleteSnapshotCmd for job-56
DaanHoogland commented 7 months ago

@levindecaro , as there is a workaround and this seems to be an old issue, I am closing it. Please reopen or create a new one if there still is a problem.