The issue seems to be reducible to the following sequence of actions (strictly on k8s entities):
$ oc create -f pvc.yaml
persistentvolumeclaim/simple-pvc created
$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
simple-pvc Bound pvc-acefa31e-61f4-4bef-9e82-daf30a4d85c0 1Gi RWX trident-csi-fsx 3s
$ oc create -f snap.yaml
volumesnapshot.snapshot.storage.k8s.io/snapshot created
$ oc get volumesnapshot
NAME READYTOUSE SOURCEPVC SOURCESNAPSHOTCONTENT RESTORESIZE SNAPSHOTCLASS SNAPSHOTCONTENT CREATIONTIME AGE
snapshot true simple-pvc 296Ki csi-snapclass snapcontent-7d00e584-5ce6-40f2-b1f0-40f254845e3d 3s 3s
$ oc delete pvc simple-pvc
persistentvolumeclaim "simple-pvc" deleted
$ oc create -f restore.yaml
persistentvolumeclaim/restore-pvc-1 created
$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
restore-pvc-1 Bound pvc-4d92a2ea-02a7-404d-9f9d-054c7dd8361b 1Gi RWX trident-csi-fsx 2s
$ oc delete pvc restore-pvc-1
persistentvolumeclaim "restore-pvc-1" deleted
$ oc delete volumesnapshot snapshot
volumesnapshot.snapshot.storage.k8s.io "snapshot" deleted
# Doesn't converge
Where the manifests are simply
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: simple-pvc
spec:
  storageClassName: trident-csi-fsx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: simple-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restore-pvc-1
spec:
  dataSource:
    name: snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  storageClassName: trident-csi-fsx
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
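If the final delete does not converge, a minimal inspection sketch (assuming the object names from the transcript above and the default namespace) is to look at which finalizers and deletion policy are still in play:

$ oc get volumesnapshot snapshot -o jsonpath='{.metadata.deletionTimestamp} {.metadata.finalizers}{"\n"}'
$ oc get volumesnapshotcontent
$ oc get volumesnapshotcontent snapcontent-7d00e584-5ce6-40f2-b1f0-40f254845e3d \
    -o jsonpath='{.spec.deletionPolicy} {.metadata.finalizers}{"\n"}'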
Following this as an issue to investigate.
If anyone is interested in the reproducer in kubevirt terms (I expected the reduced reproducer to be of more interest here):
$ cat dv.yaml
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: simple-dv
spec:
  source:
    registry:
      pullMethod: node
      url: docker://quay.io/kubevirt/fedora-with-test-tooling-container-disk:v0.53.2
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 8Gi
$ cat vm.yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: simple-vm
  namespace: default
spec:
  running: true
  template:
    metadata:
      labels: {kubevirt.io/domain: simple-vm, kubevirt.io/vm: simple-vm}
    spec:
      domain:
        devices:
          disks:
            - disk: {bus: virtio}
              name: dv-disk
            - disk: {bus: virtio}
              name: cloudinitdisk
        resources:
          requests: {memory: 2048M}
      volumes:
        - dataVolume: {name: simple-dv}
          name: dv-disk
        - cloudInitNoCloud:
            userData: |
              #cloud-config
              password: fedora
              chpasswd: { expire: False }
          name: cloudinitdisk
$ cat vmsnap.yaml
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: snap-larry
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: simple-vm
$ cat vmrestore.yaml
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: restore-larry
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: simple-vm
  virtualMachineSnapshotName: snap-larry
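For reference, a minimal sketch of the order these manifests would be applied in (the readiness waits between steps are assumptions on my part):

$ oc create -f dv.yaml
$ oc create -f vm.yaml        # wait for the DataVolume import and the VM to become ready
$ oc create -f vmsnap.yaml    # wait for the VirtualMachineSnapshot to report readyToUse
$ oc create -f vmrestore.yaml
$ oc delete vm simple-vm      # per the description below, this leaves a FlexClone without its parent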
Describe the bug
When using Trident as the backend for virtual machines with KubeVirt, if one restores a volume of a VM and later deletes the VM, we are left with a FlexClone without its parent, which requires manual intervention with the CLI to resolve.
Environment
-d -n trident
To Reproduce
We see this behavior as part of our testing of OCP on AWS with FSx, and later on it blocks deprovisioning of the FSx storage, as you can't delete the volumes from AWS.
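A minimal way to list what is left over before tearing down the FSx file system (assuming tridentctl is available and Trident runs in the trident namespace):

$ oc get pv
$ tridentctl get volume -n trident
$ tridentctl get snapshot -n trident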
Expected behavior
All resources should be deleted when the VM is deleted.
Additional context