/cc @aglitke @clintonk
This becomes an issue with OpenShift Virt, where we use the concept of golden images: most VMs are created from a single golden image per OS flavor. It could also be a problem when mass-cloning VM disks from a preset VM.
This is in part related to how FSxN (and ONTAP in general) works. By default, a source volume and its clone keep a connection between them, which makes the clone faster to create and more space efficient. So while both PVCs are independent objects in Kubernetes, they are not necessarily independent on the storage side. Trident, however, gives you control over this.
I see two main options to deal with this:
Create a "golden snapshot"
With this approach, you not only have a golden image - which is still a read-write volume and could therefore change without you knowing - you also create a "golden snapshot" from that golden image PVC. That freezes the current PVC state, making it immutable, so you always clone off the exact same state. In addition, you end up with exactly one snapshot and clone off that snapshot rather than the mutable PVC (Trident won't create an additional snapshot in this case). You can also use this for "versioning" of the golden image: make the necessary changes to the PVC, then snapshot again. Each clone can then use either the old snapshot, resembling the previous golden image state, or the new snapshot, resembling the modified state. This is therefore my preferred approach.
You create a golden snapshot like this:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: golden-snap1
spec:
  volumeSnapshotClassName: ontap-snaps
  source:
    persistentVolumeClaimName: simple-pvc
```
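Note that the example above assumes a VolumeSnapshotClass named ontap-snaps backed by the Trident CSI driver. If you don't have one yet, a minimal sketch (the class name and deletionPolicy here are assumptions, not part of the original example) could look like this:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ontap-snaps
# Trident's CSI driver name
driver: csi.trident.netapp.io
# Delete removes the backing ONTAP snapshot when the VolumeSnapshot object
# is deleted; use Retain if the storage-side snapshot should outlive it.
deletionPolicy: Delete
```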
Then create a clone from that golden snapshot like this:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clone
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: sc-nas-svm1
  resources:
    requests:
      storage: 1Gi
  dataSource:
    kind: VolumeSnapshot
    name: golden-snap1
    apiGroup: snapshot.storage.k8s.io
```
Option 2: Instruct Trident to "split" the clone
Trident keeps the relationship between source and clone intact by default - resulting in the extra snapshot you noted. However, you can change this behavior with either an annotation on the source PVC or by setting the respective option in your Trident backend configuration. When splitOnClone is set to "true", Trident fully decouples the source and clone volumes, leaving no extra snapshot behind. As an example, setting this annotation on your golden PVC would look like this:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test3
  annotations:
    trident.netapp.io/splitOnClone: "true"
spec:
  storageClassName: sc-nas-svm1
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
```
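For the backend-wide variant, splitOnClone can be set in the backend's defaults instead of annotating individual PVCs. A minimal sketch of a TridentBackendConfig (the backend name, management LIF, SVM, and secret below are placeholders, not from the original thread):

```yaml
apiVersion: trident.netapp.io/v1
kind: TridentBackendConfig
metadata:
  name: backend-ontap-nas           # placeholder name
  namespace: trident
spec:
  version: 1
  storageDriverName: ontap-nas
  managementLIF: 10.0.0.1           # placeholder management LIF
  svm: svm1                         # placeholder SVM
  credentials:
    name: backend-ontap-nas-secret  # placeholder secret with the SVM credentials
  defaults:
    # Split every clone from its source volume upon creation,
    # so no backing snapshot is left behind.
    splitOnClone: "true"
```

With this in place, every volume cloned from this backend is decoupled from its source without per-PVC annotations.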
@wonderland Thanks for providing these details. The PR that @akalenyu referenced here changes CDI behavior for Trident-serviced storage classes to do just as you suggest. We agree that an immutable golden image is desirable on its own merits. Without your Trident-specific annotation on the cloned PVC, will we still encounter problems if the golden image snapshot is deleted? The golden images are updated periodically by an automated process, and we garbage-collect older versions to better manage our storage usage. There could therefore still be VMs that were cloned from a golden image snapshot that is a garbage collection candidate.
We would like to avoid using vendor-specific annotations on the PVCs we create on behalf of the user.
This issue is fixed in https://github.com/NetApp/trident/commit/cc6b28010363bef13143b0fad722336133bc81eb and will be in Trident 24.10.
Cool, do the recommendations about golden snapshots & cloning via an ephemeral k8s VolumeSnapshot still stand? Or would you advise we follow up on (possibly revert) some of https://github.com/kubevirt/containerized-data-importer/pull/3209?
/cc @aglitke
> Cool, do the recommendations about golden snapshots & cloning via an ephemeral k8s VolumeSnapshot still stand?
Trident only creates an automatic snapshot during cloning because ONTAP needs that. If you create the snapshot, then you can manage its lifecycle as needed. That still seems preferable.
Describe the bug
A CSI clone works by creating an ephemeral snapshot behind the scenes. This snapshot is not cleaned up, potentially resulting in hitting the snapshot limit for FSx: https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/snapshots-ontap.html (1023 per volume).
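For reference, a CSI clone is requested by creating a new PVC whose dataSource points at an existing PVC. A minimal sketch (all names here are illustrative, not from the original report):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cloned-disk          # illustrative name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: sc-nas-svm1
  resources:
    requests:
      storage: 1Gi
  dataSource:
    # Cloning directly from a PVC; on ONTAP, Trident implements this by
    # taking a snapshot of the source volume and cloning from that snapshot.
    kind: PersistentVolumeClaim
    name: golden-image-pvc   # illustrative source PVC
```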
Expected behavior
Ephemeral snapshot removed.
Additional context
Snapshot remains: