backube / volsync

Asynchronous data replication for Kubernetes volumes
https://volsync.readthedocs.io
GNU Affero General Public License v3.0

Defining a ReplicationDestination for the rclone mover causes issue if PVC never backed up/replicated #1122

Open tssgery opened 8 months ago

tssgery commented 8 months ago

I am using the rclone mover with an S3 backend. My deployment flow is as follows:

  1. Define ReplicationDestination
  2. Define PVC, with a dataSourceRef pointing to the ReplicationDestination
  3. Define Pod/Workload that uses PVC
  4. Define ReplicationSource for PVC

This works well, except when I add a new PVC definition. In that case, the ReplicationDestination from step 1 never completes successfully because there is no permissions.facl file in S3. The code at https://github.com/backube/volsync/blob/483b169f2939781480e79c185c4306fad235b9f1/mover-rclone/active.sh#L43 fails, causing the container to exit with a non-zero return code. Kubernetes sees this and reschedules the mover, putting it in an endless loop.

Here is an example manifest that exhibits the issue (note that the development/does-not-exist bucket/folder does not exist):

---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: rclone-destination-test
spec:
  trigger:
    manual: "populate-me"
  rclone:
    #destinationPVC: example-pvc
    rcloneConfigSection: "minio"
    rcloneDestPath: "development/does-not-exist"
    rcloneConfig: volsync-rclone-secret
    copyMethod: Snapshot
    accessModes: [ReadWriteOnce]
    capacity: 10Gi
    storageClassName: ceph-block
    volumeSnapshotClassName: ceph-block
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  dataSourceRef:
    kind: ReplicationDestination
    apiGroup: volsync.backube
    name: rclone-destination-test
  storageClassName: ceph-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  volumes:
    - name: example-pvc
      persistentVolumeClaim:
        claimName: example-pvc
  containers:
    - name: example-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: example-pvc
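
The manifest above covers steps 1 through 3 of the flow. For completeness, a minimal sketch of step 4 might look like the following. It assumes the same rclone secret, config section, and destination path as the ReplicationDestination; the resource name and the hourly schedule are hypothetical and not taken from my actual deployment.

```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: rclone-source-test        # hypothetical name
spec:
  sourcePVC: example-pvc          # the PVC defined in step 2
  trigger:
    schedule: "0 * * * *"         # assumed: sync once per hour
  rclone:
    rcloneConfigSection: "minio"
    rcloneDestPath: "development/does-not-exist"
    rcloneConfig: volsync-rclone-secret
    copyMethod: Snapshot
    storageClassName: ceph-block
    volumeSnapshotClassName: ceph-block
```

Until this source has completed at least one sync, nothing (including permissions.facl) exists at the destination path.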

Expected behavior: I was hoping that the rclone mover would detect that no files exist, conclude that the permissions.facl file is not needed, and ignore its absence.

Actual results: Described above.

tesshuflower commented 8 months ago

I think the problem here is that we want an error in this case, since no sync to the destination is succeeding (this repo was never synced to). If we ignore these types of errors, other users may think their replications are succeeding when in fact they are doing nothing.