NetApp / trident

Storage orchestrator for containers
Apache License 2.0

Ability to perform clones/snapshots of Volumes that Trident won't try to delete #810

Open RuairiSinkler opened 1 year ago

RuairiSinkler commented 1 year ago

Describe the solution you'd like: The ability to perform snapshot/clone operations (specifically, creating a VolumeSnapshot) on a volume that Trident will never try to delete. For example, being able to take a snapshot of a volume that has been imported with the --no-manage flag.

The --no-manage flag isn't strictly necessary; the feature we want from that flag is that Trident will not delete the backing storage if the Trident volume is deleted. We want to use Trident to take and manage snapshots of a pre-existing volume without any risk of Trident deleting the original volume from the NetApp storage itself.

Something like --no-delete would be ideal.
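For illustration only, the snapshot operation requested here would normally be driven through the standard Kubernetes CSI snapshot API. The manifests below are a minimal sketch, assuming the CSI snapshot CRDs and snapshot controller are installed; the names trident-snapshotclass, imported-pvc and imported-pvc-snap are hypothetical. Note that deletionPolicy on the VolumeSnapshotClass only governs what happens to the ONTAP snapshot when the VolumeSnapshot object is deleted; it does not protect the source volume, which is why this request asks for a separate "no delete" behaviour.

```yaml
# All object names are hypothetical; the driver matches the
# csi.trident.netapp.io provisioner used by Trident StorageClasses.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: trident-snapshotclass      # hypothetical
driver: csi.trident.netapp.io
deletionPolicy: Delete             # applies to the snapshot only, not the source volume
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: imported-pvc-snap          # hypothetical
spec:
  volumeSnapshotClassName: trident-snapshotclass
  source:
    persistentVolumeClaimName: imported-pvc   # PVC bound to the imported volume (hypothetical)
```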

Describe alternatives you've considered: Restricting the user that Trident uses from deleting the volume in question on the NetApp side. We will be doing this anyway, but (a) it doesn't seem simple, and (b) it would be better to be protected from both sides, i.e., for Trident never to attempt the operation in the first place.

We also tried manually altering the Kubernetes objects after Trident created them, e.g., changing the PV to Retain, but it was already Retain because of the StorageClass used. It seems the deletion logic is controlled internally by Trident, regardless of the policy on the PV.

Additional context: We are using the ontap-nas-flexgroup backend type.

gnarl commented 1 year ago

Hi @RuairiSinkler,

You can find the following information regarding volume import in the Trident documentation:

When Astra Trident receives the import volume request, the existing volume size is determined and set in the PVC. After the volume is imported by the storage driver, the PV is created with a ClaimRef to the PVC. The reclaim policy is initially set to retain in the PV. After Kubernetes successfully binds the PVC and PV, the reclaim policy is updated to match the reclaim policy of the Storage Class. If the reclaim policy of the Storage Class is delete, the storage volume will be deleted when the PV is deleted.

If you set the reclaim policy to retain then Trident will not delete the storage volume.
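A minimal sketch of the PV fields the quoted documentation refers to, assuming a volume imported through a PVC named imported-pvc in the default namespace (all names are hypothetical). Per the quote, claimRef is set at import time, and persistentVolumeReclaimPolicy starts as Retain before being updated to the StorageClass's reclaim policy once the PVC and PV are bound; with a Retain StorageClass it stays Retain.

```yaml
# Fragment only: other required PV fields (capacity, accessModes, csi) are omitted.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0a1b2c3d                       # hypothetical
spec:
  claimRef:                                # points at the PVC used for the import
    name: imported-pvc                     # hypothetical
    namespace: default                     # hypothetical
  persistentVolumeReclaimPolicy: Retain    # per the docs, deleting the PV then leaves the storage volume in place
  storageClassName: retain-storage-class   # hypothetical StorageClass with reclaimPolicy: Retain
```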

wonderland commented 1 year ago

Not exactly what was asked here, but maybe helpful information: the ONTAP storage system also has a "volume recovery queue", so even if the volume is accidentally deleted, you can restore it from there. The default retention time is 12 hours IIRC, but it can be configured at the SVM/vserver level.

RuairiSinkler commented 1 year ago

@wonderland thanks for the recovery information. Unfortunately, that isn't good enough to solve the problem, as deleting the volume in the first place, for any amount of time, is a big no-no. It's good to have a recovery plan, though!

@gnarl I believe this is consistent with what I did in my setup, but calling tridentctl delete volume <volume_name> removed the backing storage, not just Trident's reference to it. Perhaps I did something wrong, though; I will give it another go and report back.

RuairiSinkler commented 1 year ago

Hi @gnarl, unfortunately the problem persists, and I think I see the difference between the docs and what I'm doing. The docs state that "If the reclaim policy of the Storage Class is delete, the storage volume will be deleted when the PV is deleted." The key part is "when the PV is deleted", not the Trident volume.

tridentctl delete volume <volume_name> ignores the reclaim policy of the PV entirely.

Here are the full recreate steps:

  1. Have a cluster with Trident up and running, connected to an ontap-nas-flexgroup backend (not sure how important the backend type is)
  2. Create a StorageClass with reclaimPolicy: Retain set, e.g.:
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: retain-storage-class
    provisioner: csi.trident.netapp.io
    reclaimPolicy: Retain
    parameters:
      backendType: "ontap-nas-flexgroup"
      clones: "true"
      provisioningType: "thin"
      snapshots: "true"
  3. Import the volume with tridentctl import volume <backend_name> <netapp_volume> -f <path_to_pvc>, using a PVC that references the above StorageClass. Example PVC:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: volume-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: retain-storage-class
  4. Wait for the volume to be correctly imported
  5. Delete the volume with tridentctl delete volume <trident_volume_name>
  6. Observe:
    a. The volume is removed from the output of tridentctl get volume.
    b. The PVC and PV are not deleted from the cluster.
    c. The backing storage volume is deleted from NetApp because the volume is "managed", ignoring the reclaim policy of the StorageClass.

Point c. is the biggest problem for us: we don't want any situation in which Trident tries to delete the backing storage volume. However, we can't use --no-manage because we do want to take snapshot copies of the volume.

RuairiSinkler commented 1 year ago

Hi @gnarl - any update on this?

On our end, we have just found a workaround in another issue here: https://github.com/NetApp/trident/issues/813#issuecomment-1479036221

The second set of instructions there allows us to import the volume, then manually delete Trident's reference to it and restart the controller pod to force Trident to forget about it.

This still isn't ideal, so it would be good to hear whether an option to import and manage a volume without managing its lifecycle/deletion is being considered.