LINBIT / linstor-proxmox

Integration pluging bridging LINSTOR to Proxmox VE
31 stars 7 forks source link

[Linstor-Proxmox] No cleanup after clone #51

Closed cr1ll3 closed 2 years ago

cr1ll3 commented 2 years ago

Hello,

After making a clone of a container snapshot there are snapshots resources/resource-definitions/volumes left behind making previous snapshots unable to be removed unless removing the snapshot residues manually.

Status before clone

#linstor r list
╭───────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName  ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db    ┊ pve2-1 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-29 12:25:31 ┊
┊ linstor_db    ┊ pve2-2 ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-06-29 12:25:31 ┊
┊ linstor_db    ┊ pve2-3 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-29 12:25:31 ┊
┊ vm-100-disk-1 ┊ pve2-1 ┊ 7001 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-06-29 21:54:57 ┊
┊ vm-100-disk-1 ┊ pve2-2 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-06-29 21:55:05 ┊
┊ vm-100-disk-1 ┊ pve2-3 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-29 21:55:05 ┊
╰───────────────────────────────────────────────────────────────────────────────────╯

#linstor rd list
╭──────────────────────────────────────────────╮
┊ ResourceName  ┊ Port ┊ ResourceGroup ┊ State ┊
╞══════════════════════════════════════════════╡
┊ linstor_db    ┊ 7000 ┊ DfltRscGrp    ┊ ok    ┊
┊ vm-100-disk-1 ┊ 7001 ┊ RG_thin       ┊ ok    ┊
╰──────────────────────────────────────────────╯

#linstor v list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource      ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pve2-1 ┊ linstor_db    ┊ fatPool              ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   1.36 MiB ┊ Unused ┊   UpToDate ┊
┊ pve2-2 ┊ linstor_db    ┊ fatPool              ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   1.36 MiB ┊ InUse  ┊   UpToDate ┊
┊ pve2-3 ┊ linstor_db    ┊ fatPool              ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   1.36 MiB ┊ Unused ┊   UpToDate ┊
┊ pve2-1 ┊ vm-100-disk-1 ┊ thinPool             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 100.02 GiB ┊ InUse  ┊   UpToDate ┊
┊ pve2-2 ┊ vm-100-disk-1 ┊ DfltDisklessStorPool ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊            ┊ Unused ┊ TieBreaker ┊
┊ pve2-3 ┊ vm-100-disk-1 ┊ thinPool             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 100.02 GiB ┊ Unused ┊   UpToDate ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

After cloning a snapshot from the running container im left with:

#linstor r list
╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName                     ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor_db                       ┊ pve2-1 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-29 12:25:31 ┊
┊ linstor_db                       ┊ pve2-2 ┊ 7000 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-06-29 12:25:31 ┊
┊ linstor_db                       ┊ pve2-3 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-29 12:25:31 ┊
┊ snap_vm-100-disk-1_snapshot_demo ┊ pve2-1 ┊ 7002 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-30 11:20:27 ┊
┊ snap_vm-100-disk-1_snapshot_demo ┊ pve2-2 ┊ 7002 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-06-30 11:20:20 ┊
┊ snap_vm-100-disk-1_snapshot_demo ┊ pve2-3 ┊ 7002 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-30 11:20:27 ┊
┊ vm-100-disk-1                    ┊ pve2-1 ┊ 7001 ┊ InUse  ┊ Ok    ┊   UpToDate ┊ 2022-06-29 21:54:57 ┊
┊ vm-100-disk-1                    ┊ pve2-2 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2022-06-29 21:55:05 ┊
┊ vm-100-disk-1                    ┊ pve2-3 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-29 21:55:05 ┊
┊ vm-101-disk-1                    ┊ pve2-1 ┊ 7003 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-30 11:20:35 ┊
┊ vm-101-disk-1                    ┊ pve2-2 ┊ 7003 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2022-06-30 11:20:43 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯

#linstor rd list
╭─────────────────────────────────────────────────────────────────╮
┊ ResourceName                     ┊ Port ┊ ResourceGroup ┊ State ┊
╞═════════════════════════════════════════════════════════════════╡
┊ linstor_db                       ┊ 7000 ┊ DfltRscGrp    ┊ ok    ┊
┊ snap_vm-100-disk-1_snapshot_demo ┊ 7002 ┊ DfltRscGrp    ┊ ok    ┊
┊ vm-100-disk-1                    ┊ 7001 ┊ RG_thin       ┊ ok    ┊
┊ vm-101-disk-1                    ┊ 7003 ┊ RG_thin       ┊ ok    ┊
╰─────────────────────────────────────────────────────────────────╯

#linstor v list
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource                         ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊  Allocated ┊ InUse  ┊      State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ pve2-1 ┊ linstor_db                       ┊ fatPool              ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   1.36 MiB ┊ Unused ┊   UpToDate ┊
┊ pve2-2 ┊ linstor_db                       ┊ fatPool              ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   1.36 MiB ┊ InUse  ┊   UpToDate ┊
┊ pve2-3 ┊ linstor_db                       ┊ fatPool              ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊   1.36 MiB ┊ Unused ┊   UpToDate ┊
┊ pve2-1 ┊ snap_vm-100-disk-1_snapshot_demo ┊ thinPool             ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 100.02 GiB ┊ Unused ┊   UpToDate ┊
┊ pve2-2 ┊ snap_vm-100-disk-1_snapshot_demo ┊ DfltDisklessStorPool ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊            ┊ Unused ┊ TieBreaker ┊
┊ pve2-3 ┊ snap_vm-100-disk-1_snapshot_demo ┊ thinPool             ┊     0 ┊    1002 ┊ /dev/drbd1002 ┊ 100.02 GiB ┊ Unused ┊   UpToDate ┊
┊ pve2-1 ┊ vm-100-disk-1                    ┊ thinPool             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 100.02 GiB ┊ InUse  ┊   UpToDate ┊
┊ pve2-2 ┊ vm-100-disk-1                    ┊ DfltDisklessStorPool ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊            ┊ Unused ┊ TieBreaker ┊
┊ pve2-3 ┊ vm-100-disk-1                    ┊ thinPool             ┊     0 ┊    1001 ┊ /dev/drbd1001 ┊ 100.02 GiB ┊ Unused ┊   UpToDate ┊
┊ pve2-1 ┊ vm-101-disk-1                    ┊ thinPool             ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 100.02 GiB ┊ Unused ┊   UpToDate ┊
┊ pve2-2 ┊ vm-101-disk-1                    ┊ thinPool             ┊     0 ┊    1003 ┊ /dev/drbd1003 ┊ 100.02 GiB ┊ Unused ┊   UpToDate ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
rck commented 2 years ago

thanks for reporting that, you are right, I'm just not sure if we can do a lot about it.

PVE wants to "activate" a volume from the snapshot and then it looks like it copies the data from that volume to the new disk. But from a plugin perspective I'm not sure if I ever get a "callback" that we can deactivate that temporary snapshot resource. What I see is

activate_volume: snap; restore(vm-100-disk-1, snap_vm-100-disk-1_snap1, snap_vm-100-disk-1_snap1) # where we create that temporary resource
alloc_image: vm-101-disk-1 # the new/fresh disk for the clone
map_volume: vm-101-disk-1
map_volume: snap_vm-100-disk-1_snap1 # most likely where we get asked for the source of the copy action
map_volume: vm-101-disk-1
map_volume: vm-101-disk-1
map_volume: vm-101-disk-1
deactivate_volume: vm-101-disk-1,

The only thing right now wold be to ignore the creation of snap_vm... if it already exists. That avoids any followup errors, but never cleans up that temporary resource.

I have to dive more into the Proxmox code to see if there is some hook where I could try to delete the temporary resource. Will take some time.

rck commented 2 years ago

just taking notes here... what we might be able to do:

in between it lingers around, but that would at least be an improvement. Still, there might be better alternatives after taking a deeper look into the PVE code

rck commented 2 years ago

that does not look too bad, can you try that one? guess that is the best we can do:

diff --git a/LINSTORPlugin.pm b/LINSTORPlugin.pm
index b721140..3bf51f5 100644
--- a/LINSTORPlugin.pm
+++ b/LINSTORPlugin.pm
@@ -422,8 +422,13 @@ sub activate_volume {
     if ($snap) {    # need to create this resource from snapshot
         my $snapname = volname_and_snap_to_snapname( $volname, $snap );
         my $new_volname = $snapname;
-        eval { linstor($scfg)->restore_snapshot( $volname, $snapname, $new_volname ); };
-        confess $@ if $@;
+        my $lsc = linstor($scfg);
+        if ( !$lsc->resource_exists($new_volname) ) {
+            eval {
+                $lsc->restore_snapshot( $volname, $snapname, $new_volname );
+            };
+            confess $@ if $@;
+        }
         $volname = $new_volname; # for the rest of this function switch the name
     }

@@ -506,10 +511,10 @@ sub volume_snapshot_delete {
     # on backup we created a resource from the given snapshot
     # on cleanup we as plugin only get a volume_snapshot_delete
     # so we have to do some "heuristic" to also clean up the resource we created
-    if ( $snap eq 'vzdump' ) {
-        eval { $lsc->delete_resource( $snapname ); };
-        confess $@ if $@;
-    }
+    # backup would be: if ( $snap eq 'vzdump' )
+    # but we also want to delete "tempoarary snapshot resources" when they got activated via a clone
+    eval { $lsc->delete_resource( $snapname ); };
+    confess $@ if $@;

     eval { $lsc->delete_snapshot( $volname, $snapname ); };
     confess $@ if $@;
cr1ll3 commented 2 years ago

Hello, thank you for swift response. Seems to be a pretty good solution/compromise without having to rummage in the proxmox codebase. I will apply the patch and do some testing and see if I can find any issues!

rck commented 2 years ago

should be fixed in https://github.com/LINBIT/linstor-proxmox/commit/aa2871825aa1cdc8e7958d59bebee055c426a6dc . The commit is included in v6.0.1-rc.1, the final should be out in about a week from now.