backube / snapscheduler

Scheduled snapshots for Kubernetes persistent volumes
https://backube.github.io/snapscheduler/
GNU Affero General Public License v3.0

Retention policy removes last valid snapshot, leaving no possibility of recovery #688

Open mnacharov opened 2 weeks ago

mnacharov commented 2 weeks ago

Describe the bug VolumeSnapshot has a .status.readyToUse flag that indicates whether a snapshot is ready to be used to restore a volume. snapscheduler does not take this flag into account when deciding whether the maxCount retention limit has been reached. This can result in the loss of the last remaining opportunity for recovery.
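For context, the readiness check the retention logic would need to consult is roughly the following. This is a minimal sketch against the external-snapshotter client types; the client/v6 module version is an assumption and may not match what snapscheduler actually imports:

    package main

    import (
        "fmt"

        snapv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
    )

    // snapshotIsReady reports whether a VolumeSnapshot can currently be used to
    // restore a volume, i.e. .status.readyToUse is present and true.
    func snapshotIsReady(snap *snapv1.VolumeSnapshot) bool {
        return snap.Status != nil && snap.Status.ReadyToUse != nil && *snap.Status.ReadyToUse
    }

    func main() {
        // A freshly created snapshot has no status yet, so it is not ready.
        fmt.Println(snapshotIsReady(&snapv1.VolumeSnapshot{})) // prints: false
    }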

Steps to reproduce in GKE (v1.28.11 in my case) with snapscheduler (v3.4.0) installed:

  1. create PVC:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: snapscheduler-test
      namespace: default
      labels:
        snapscheduler-test: "true"
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: standard-rwo
  2. run a pod with the new PVC in order to provision the volume:
    $ kubectl -n default run -it --rm snapscheduler-test --image=gcr.io/distroless/static-debian12 --overrides='{"spec": {"restartPolicy": "Never", "volumes": [{"name": "pvc", "persistentVolumeClaim":{"claimName": "snapscheduler-test"}}]}}' -- sh
  3. create SnapshotSchedule:
    apiVersion: snapscheduler.backube/v1
    kind: SnapshotSchedule
    metadata:
      name: snapscheduler-test
      namespace: default
    spec:
      claimSelector:
        matchLabels:
          snapscheduler-test: "true"
      retention:
        maxCount: 3
      schedule: "*/5 * * * *"
  4. wait 5-10 minutes and make sure that VolumeSnapshots are being created successfully:
    $ kubectl -n default get volumesnapshot
    NAME                                                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    snapscheduler-test-snapscheduler-test-202408301525   true         snapscheduler-test                           1Gi           p2p-csi         snapcontent-4f748e4d-80d8-4353-8819-a6efb2836821   87s            2m6s
  5. delete the compute disk in GCP (via the web UI or a gcloud command) -- a human error has happened:
    $ pv=$(kubectl -n default get pvc snapscheduler-test -ojsonpath='{.spec.volumeName}')
    $ zone=$(gcloud --project=$GCP_PROJECT compute disks list --filter="name=($pv)"|grep pvc|awk '{print $2}')
    $ gcloud --project=$GCP_PROJECT compute disks delete $pv --zone $zone
  6. after 10 minutes there are two VolumeSnapshots with readyToUse=false:
    $ kubectl -n default get volumesnapshot
    NAME                                                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    snapscheduler-test-snapscheduler-test-202408301525   true         snapscheduler-test                           1Gi           p2p-csi         snapcontent-4f748e4d-80d8-4353-8819-a6efb2836821   10m            11m
    snapscheduler-test-snapscheduler-test-202408301530   false        snapscheduler-test                                         p2p-csi         snapcontent-cec59c70-c186-44fd-99f8-9226192d7a6a                  6m38s
    snapscheduler-test-snapscheduler-test-202408301535   false        snapscheduler-test                                         p2p-csi         snapcontent-d81644f4-eb28-4da9-94b5-d57f1972aeb3                  98s
  7. after 15 minutes we don't have any valid snapshot anymore (due to the maxCount: 3 retention policy):
    $ kubectl -n default get volumesnapshot
    NAME                                                 READYTOUSE   SOURCEPVC            SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
    snapscheduler-test-snapscheduler-test-202408301530   false        snapscheduler-test                                         p2p-csi         snapcontent-cec59c70-c186-44fd-99f8-9226192d7a6a                  13m
    snapscheduler-test-snapscheduler-test-202408301535   false        snapscheduler-test                                         p2p-csi         snapcontent-d81644f4-eb28-4da9-94b5-d57f1972aeb3                  8m6s
    snapscheduler-test-snapscheduler-test-202408301540   false        snapscheduler-test                                         p2p-csi         snapcontent-b6113f79-3219-435d-8321-812ddc096154                  3m6s

Expected behavior ❗ the retention policy must not count VolumeSnapshots with .status.readyToUse == false. ❔ if possible, create a new snapshot only after the previous one has become ready.

Actual results The retention policy removes the last valid snapshot, leaving no possibility of recovery.

Additional context

JohnStrunk commented 1 week ago

I agree... that's not good. I'm happy to hear thoughts/suggestions on a good fix.

A few ideas:

  1. Only count readyToUse snapshots when implementing the cleanup policy. This runs the risk of creating an unbounded number of (unready) snapshots, potentially consuming all available space (or incurring excessive expense).
  2. Skip the next snapshot if the previous one is not ready. This will cause problems in environments where it takes a long time for a snapshot to become ready (e.g., AWS), causing SnapScheduler to miss intervals.
  3. If the policy determines that a snapshot should be deleted, delete unready snapshots (starting with the oldest) before ready ones (see the sketch below). This has the same problem as (2) in being unable to handle intervals that are shorter than the time it takes a snapshot to become ready.
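A minimal sketch of how idea (3) could order the deletion candidates so that ready snapshots are only removed once no unready ones remain. The snapInfo struct and pruneCandidates helper are hypothetical stand-ins for illustration, not snapscheduler's actual types or logic:

    package main

    import (
        "fmt"
        "sort"
        "time"
    )

    // snapInfo is a simplified stand-in for the VolumeSnapshot fields that matter
    // for retention: name, creation time, and the readyToUse status.
    type snapInfo struct {
        Name    string
        Created time.Time
        Ready   bool
    }

    // pruneCandidates returns the snapshots that would be deleted to bring the
    // list down to maxCount, preferring unready snapshots (oldest first) over
    // ready ones (oldest first).
    func pruneCandidates(snaps []snapInfo, maxCount int) []snapInfo {
        if len(snaps) <= maxCount {
            return nil
        }
        ordered := make([]snapInfo, len(snaps))
        copy(ordered, snaps)
        sort.Slice(ordered, func(i, j int) bool {
            // Unready snapshots sort ahead of ready ones as deletion candidates...
            if ordered[i].Ready != ordered[j].Ready {
                return !ordered[i].Ready
            }
            // ...and within each group, older snapshots are deleted first.
            return ordered[i].Created.Before(ordered[j].Created)
        })
        return ordered[:len(snaps)-maxCount]
    }

    func main() {
        now := time.Now()
        // Mirrors the reproduction above: one ready snapshot and three that never
        // became ready after the backing disk was deleted.
        snaps := []snapInfo{
            {Name: "snap-202408301525", Created: now.Add(-15 * time.Minute), Ready: true},
            {Name: "snap-202408301530", Created: now.Add(-10 * time.Minute), Ready: false},
            {Name: "snap-202408301535", Created: now.Add(-5 * time.Minute), Ready: false},
            {Name: "snap-202408301540", Created: now, Ready: false},
        }
        for _, s := range pruneCandidates(snaps, 3) {
            // Only the oldest unready snapshot is chosen; the ready one survives.
            fmt.Println("would delete:", s.Name)
        }
    }

With the data from step 7 above, this ordering would delete the unready 202408301530 snapshot and keep the ready one from 202408301525, though, as noted, it still cannot help when the schedule interval is shorter than the time a snapshot needs to become ready.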