Closed: @nischalnischal2020 closed this issue 3 weeks ago.
@nischalnischal2020 Did you check the NFS share? Was the file there, and if so, what state was it in? Did you find any errors in the management server logs or in the agent logs for the SSVM and the host involved?
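When answering the log question, it helps to pull out only the lines that belong to the failing job. The management-server lines in this thread carry a `(logid:...)` correlation tag, while the forwarded agent answer has an empty one, so the minimal sketch below also matches on a UUID. `LogFilter` and its methods are hypothetical helper names, not part of CloudStack.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch only: narrows a management-server log down to the lines for one API job.
// Assumes lines carry a "(logid:xxxxxxxx)" tag like the excerpt in this thread.
class LogFilter {
    // Captures the hex id inside "(logid:...)"; the group may be empty, as in "(logid:)".
    private static final Pattern LOGID = Pattern.compile("\\(logid:([0-9a-f]*)\\)");

    // Returns the logid of a line, or "" when the line has no tag or an empty one.
    static String logIdOf(String line) {
        Matcher m = LOGID.matcher(line);
        return m.find() ? m.group(1) : "";
    }

    // Keeps lines tagged with logId (which should be non-empty), plus any line
    // that mentions token, for example the snapshot or volume UUID.
    static List<String> relevant(List<String> lines, String logId, String token) {
        return lines.stream()
                .filter(l -> logId.equals(logIdOf(l)) || l.contains(token))
                .collect(Collectors.toList());
    }
}
```

On a management server the input would typically come from `Files.readAllLines(Path.of("/var/log/cloudstack/management/management-server.log"))`; the same filter works on the KVM agent log (usually `/var/log/cloudstack/agent/agent.log`), where the logid tag may be absent and the UUID match does the work.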
Hi @DaanHoogland,
There was no file on the secondary storage. Besides this, we use the global setting "snapshot.backup.to.secondary = False", so snapshot files remain on primary storage.
The logs show the following error:
2024-03-22 15:47:49,643 DEBUG c.c.a.t.Request (logid:) Seq 19-8222165544694979585: Processing: { Ans: , MgmtId: 195808829246451, via: 19, Ver: v1, Flags: 10, [{"org.apache.cloudstack.storage.command.CopyCmdAnswer":{"result":"false","details":"org.apache.cloudstack.utils.qemu.QemuImgException: qemu-img: Could not open 'rbd:arch-int-vpc-prim/fe23b82c-e8c4-4a14-a4dc-6ea3d54a6c55@db6e00a4-882d-4ad5-b827-b1db5f1bb9e6:mon_host=172.20.202.10:auth_supported=cephx:id=stackusr:key=AQBGk7Fi8IrUDBAA2qvfs+QVVYJ0Ri8jAk7Hiw==:rbd_default_format=2:client_mount_timeout=30': error reading header from fe23b82c-e8c4-4a14-a4dc-6ea3d54a6c55: No such file or directory","wait":"0","bypassHostMaintenance":"false"}}] }
2024-03-22 15:47:49,643 DEBUG [c.c.a.t.Request] (API-Job-Executor-73:ctx-b37e560d job-63965 ctx-7185241a) (logid:606af370) Seq 19-8222165544694979585: Received: { Ans: , MgmtId: 195808829246451, via: 19(SBARCLD-INT-VPC5), Ver: v1, Flags: 10, { CopyCmdAnswer } }
2024-03-22 15:47:49,644 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (API-Job-Executor-73:ctx-b37e560d job-63965 ctx-7185241a) (logid:606af370) Failed to copy snapshot
java.lang.RuntimeException: InvocationTargetException when invoking RPC callback for command: copySnapshotAsyncCallback
    at org.apache.cloudstack.framework.async.AsyncCallbackDispatcher.dispatch(AsyncCallbackDispatcher.java:154)
    at org.apache.cloudstack.framework.async.InplaceAsyncCallbackDriver.performCompletionCallback(InplaceAsyncCallbackDriver.java:25)
    at org.apache.cloudstack.framework.async.AsyncCallbackDispatcher.complete(AsyncCallbackDispatcher.java:126)
    at org.apache.cloudstack.storage.motion.AncientDataMotionStrategy.copyAsync(AncientDataMotionStrategy.java:534)
    at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:84)
    at org.apache.cloudstack.storage.motion.DataMotionServiceImpl.copyAsync(DataMotionServiceImpl.java:106)
    at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:283)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:177)
    at org.apache.cloudstack.snapshot.SnapshotHelper.backupSnapshotToSecondaryStorageIfNotExists(SnapshotHelper.java:134)
    at com.cloud.template.TemplateManagerImpl.createPrivateTemplate(TemplateManagerImpl.java:1644)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
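The qemu-img error packs the Ceph pool, image and snapshot into one `rbd:` filename, which qemu documents as `rbd:<pool>/<image>[@<snapshot>][:<option>=<value>...]`. The minimal sketch below splits it so each part can be checked on the Ceph side, and masks the cephx key when printed. `RbdSpec` is a hypothetical helper, and it assumes no backslash-escaped `:` inside option values (true for the string in this log, whose `mon_host` has no port).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: splits a qemu "rbd:" filename of the form
//   rbd:<pool>/<image>[@<snapshot>][:<key>=<value>...]
// Simplification: assumes no backslash-escaped ':' inside values.
final class RbdSpec {
    final String pool;
    final String image;
    final String snapshot;                                     // null when there is no "@snapshot" part
    final Map<String, String> options = new LinkedHashMap<>(); // keeps the original option order

    private RbdSpec(String pool, String image, String snapshot) {
        this.pool = pool;
        this.image = image;
        this.snapshot = snapshot;
    }

    static RbdSpec parse(String spec) {
        if (!spec.startsWith("rbd:")) {
            throw new IllegalArgumentException("not an rbd spec: " + spec);
        }
        String[] parts = spec.substring("rbd:".length()).split(":");
        String path = parts[0];                                // <pool>/<image>[@<snapshot>]
        int slash = path.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("missing <pool>/<image>: " + path);
        }
        String rest = path.substring(slash + 1);
        int at = rest.indexOf('@');
        RbdSpec r = new RbdSpec(
                path.substring(0, slash),
                at < 0 ? rest : rest.substring(0, at),
                at < 0 ? null : rest.substring(at + 1));
        for (int i = 1; i < parts.length; i++) {
            // Split on the first '=' only, so base64 padding ("==") stays in the value.
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                r.options.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return r;
    }

    // Same fields, with the cephx secret masked so the result can be pasted into a ticket.
    @Override
    public String toString() {
        Map<String, String> safe = new LinkedHashMap<>(options);
        safe.computeIfPresent("key", (k, v) -> "***");
        return pool + "/" + image + (snapshot == null ? "" : "@" + snapshot) + " " + safe;
    }
}
```

For the string in the log this gives pool `arch-int-vpc-prim`, image `fe23b82c-e8c4-4a14-a4dc-6ea3d54a6c55` and snapshot `db6e00a4-882d-4ad5-b827-b1db5f1bb9e6`, which can then be checked from a Ceph client with `rbd info <pool>/<image>` and `rbd snap ls <pool>/<image>`. The "error reading header from <image>" part suggests that qemu could not open the image itself, not just the snapshot.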
And was there a file on primary storage?
Also, your stack trace goes through TemplateManagerImpl.createPrivateTemplate. When a template is created from a snapshot or a volume, the data always has to be copied to secondary storage first.
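The point above, that template creation needs the snapshot on secondary storage even when snapshot.backup.to.secondary is False, can be sketched as a simplified model. This is a hypothetical illustration, not CloudStack code: `TemplateFromSnapshot`, `StoreRole` and the `Backup` interface are invented names. The stack trace only confirms that `createPrivateTemplate` calls `SnapshotHelper.backupSnapshotToSecondaryStorageIfNotExists`.

```java
import java.util.Set;

// Hypothetical model, not CloudStack code.
class TemplateFromSnapshot {
    enum StoreRole { PRIMARY, IMAGE }   // IMAGE = secondary storage

    // Stands in for the copy command sent to the SSVM/host; returns true on success.
    interface Backup { boolean copyToSecondary(); }

    // Mirrors the "back up if not already on secondary" step:
    // returns true once a secondary copy exists, false if the copy failed.
    static boolean ensureOnSecondary(Set<StoreRole> locations, Backup backup) {
        if (locations.contains(StoreRole.IMAGE)) {
            return true;                 // already on secondary: nothing to copy
        }
        if (!backup.copyToSecondary()) {
            return false;                // copy failed: template creation cannot continue
        }
        locations.add(StoreRole.IMAGE);
        return true;
    }
}
```

With snapshot.backup.to.secondary = False the snapshot starts out as `EnumSet.of(StoreRole.PRIMARY)`, so the first template request always triggers the copy; in this ticket that copy is the step that failed.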
@nischalnischal2020 please check the newly created PR #9239, which addresses an issue I observed while looking into this one.
The issue I observed occurs not while taking the snapshot but while creating a template from it (the stack trace points to the same path).
I could not reproduce the original issue of a failed snapshot showing the BackedUp state instead of Error (it may already have been fixed after 4.17.2), but I found another serious issue.
Whenever a snapshot is used to create a template or volume and backing up the snapshot to secondary storage fails, the management server (MS), while handling that failure, deletes the snapshot on primary storage itself.
This behavior was introduced by PR https://github.com/apache/cloudstack/pull/5297.
Steps to reproduce:
1. Create a snapshot of a volume (with snapshot.backup.to.secondary = False).
2. Create a template from that snapshot.
3. As part of the template creation, the MS first tries to back up the snapshot to secondary storage.
4. Force this backup to fail.
5. The MS detects the failure and, while handling it, deletes the snapshot on primary storage (also marking the snapshot_store_ref entry for the primary store role as "Destroyed").
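The sequence above can be modelled in a few lines to make the data-loss path explicit. This is a hypothetical sketch: `BackupFailureHandling`, its enums and methods are illustrative names, the states only loosely follow the snapshot_store_ref entries mentioned above, and dropping the secondary entry on failure is an assumption. It does not reproduce the actual change in PR #9239.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of the snapshot_store_ref rows for one snapshot, keyed by
// store role. Not CloudStack code; state names are simplified.
class BackupFailureHandling {
    enum StoreRole { PRIMARY, IMAGE }
    enum RefState { READY, CREATING, DESTROYED }

    // State when the backup starts: snapshot ready on primary, copy to secondary in progress.
    static Map<StoreRole, RefState> atBackupStart() {
        Map<StoreRole, RefState> refs = new EnumMap<>(StoreRole.class);
        refs.put(StoreRole.PRIMARY, RefState.READY);
        refs.put(StoreRole.IMAGE, RefState.CREATING);
        return refs;
    }

    // Behavior reported in the steps above: the failed backup also takes down the primary copy.
    static void onBackupFailureObserved(Map<StoreRole, RefState> refs) {
        refs.remove(StoreRole.IMAGE);                    // assumption: drop the half-made secondary entry
        refs.put(StoreRole.PRIMARY, RefState.DESTROYED); // primary snapshot deleted: with backup off, the only copy
    }

    // Behavior one would expect: clean up only the failed secondary copy, keep primary intact.
    static void onBackupFailureExpected(Map<StoreRole, RefState> refs) {
        refs.remove(StoreRole.IMAGE);                    // same assumption as above
    }
}
```

With snapshot.backup.to.secondary = False the primary entry is the only copy of the snapshot, which is why the observed path loses data while the expected one merely fails the template request.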
Partly addressed in https://github.com/apache/cloudstack/pull/9239; please check and close the ticket. cc @nischalnischal2020 @harikrishna-patnala
Closing the issue as the related PR is merged
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
OS / ENVIRONMENT
ACS 4.17.2 with Ceph storage and the global setting "snapshot.backup.to.secondary = False"
SUMMARY
An error occurred on the NFS server while the snapshot was being taken: the NFS server rebooted during the snapshot process. The problem was that the snapshot's state was shown as "Creating".
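One way to reason about a snapshot left in "Creating" is as a state machine whose failure transition never fired: the reboot interrupted the work, but no failure was recorded against the snapshot. The sketch below is hypothetical; it uses a reduced, illustrative set of state and event names rather than CloudStack's own snapshot state machine, and `looksStuck` is an invented helper for spotting such entries.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical, simplified snapshot lifecycle (illustrative names, not
// CloudStack's actual state machine).
class SnapshotLifecycle {
    enum State { ALLOCATED, CREATING, BACKED_UP, ERROR }
    enum Event { CREATE_REQUESTED, OPERATION_SUCCEEDED, OPERATION_FAILED }

    // Allowed transitions; anything else is rejected.
    static State next(State s, Event e) {
        switch (s) {
            case ALLOCATED:
                if (e == Event.CREATE_REQUESTED) return State.CREATING;
                break;
            case CREATING:
                if (e == Event.OPERATION_SUCCEEDED) return State.BACKED_UP;
                if (e == Event.OPERATION_FAILED) return State.ERROR;  // the transition a reported failure should take
                break;
            default:
                break;                                                // BACKED_UP and ERROR are terminal here
        }
        throw new IllegalStateException(e + " not allowed in " + s);
    }

    // True when a snapshot has sat in CREATING longer than limit, i.e. no
    // success or failure event ever arrived: a candidate for investigation.
    static boolean looksStuck(State s, Instant since, Instant now, Duration limit) {
        return s == State.CREATING && Duration.between(since, now).compareTo(limit) > 0;
    }
}
```

The threshold passed to `looksStuck` is an operator choice (for example `Duration.ofHours(1)`); it only flags candidates and does not change any state.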
STEPS TO REPRODUCE
EXPECTED RESULTS
ACTUAL RESULTS