Closed: @DreamingRaven closed this issue 1 week ago.
@DreamingRaven, it sounds to me like you're doing the correct thing with WaitForFirstConsumer.
I'm a bit surprised that this doesn't work, as both PVCs should be used for the first time by the pod created by the mover job.
It looks like you're using a copyMethod of Clone and have not specified a cacheStorageClassName, so both the clone of your original source PVC and the cache volume should be using the default storageclass, which looks to be standard-rwo.
One thing I'm not sure of is how the clone works in your env, does it get provisioned immediately to the same availability zone as the source volume? According to the google docs link you pointed to, there is this:
> However, Pods or Deployments don't inherently recognize the zone of pre-existing persistent disks
Which makes me wonder if the clone PVC is the issue here - it may get pre-provisioned even with WaitForFirstConsumer - this is just me guessing however. Is this possible to test by creating a clone PVC yourself?
If this is the case, you could alternatively try a copyMethod of Snapshot to see if it makes a difference. In Snapshot mode, a volumesnapshot of your source PVC is taken first, then a PVC is provisioned from it, which should then use WaitForFirstConsumer.
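For reference, switching to snapshot mode would just be the copyMethod field on the ReplicationSource; a sketch, assuming the ghost names used later in this thread and an hourly schedule:

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: ghost-backup
  namespace: ghost
spec:
  sourcePVC: ghost
  trigger:
    schedule: "0 * * * *"   # assumed schedule, adjust to taste
  restic:
    repository: ghost-backup
    copyMethod: Snapshot     # instead of Clone
```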
@tesshuflower I have created 3 clone PVCs. Without a pod to mount them they do indeed wait for first consumer using the default storage class:
```yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    components.gke.io/component-name: pdcsi
    components.gke.io/component-version: ***
    components.gke.io/layer: addon
    storageclass.kubernetes.io/is-default-class: "true"
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
    k8s-app: gcp-compute-persistent-disk-csi-driver
  name: standard-rwo
parameters:
  type: pd-balanced
provisioner: pd.csi.storage.gke.io
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
```
NAME                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-ghost-mysql-0   Bound     pvc-e05137fb-2eec-4a05-b2e7-d69dd2219600   50Gi       RWO            standard-rwo   70d
ghost                Bound     pvc-175e420f-71f8-4067-92d1-df3cf2f11701   50Gi       RWO            standard-rwo   70d
ghost-clone-a        Pending                                                                        standard-rwo   75s
ghost-clone-b        Pending                                                                        standard-rwo   75s
ghost-clone-c        Pending                                                                        standard-rwo   75s
```
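For reference, each clone PVC was created along these lines; a sketch using the names and sizes from the listing above (a CSI clone must request at least the source's capacity and the same access mode):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost-clone-a
  namespace: ghost
spec:
  storageClassName: standard-rwo
  dataSource:            # clone the existing source PVC
    kind: PersistentVolumeClaim
    name: ghost
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi      # must be >= the source PVC's size
```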
I then bound each volume individually to different pods, to test which availability zone they end up in. I expected them to end up in the same zone as the source volume, which is indeed the case for all volumes:
```yaml
csi:
  driver: pd.csi.storage.gke.io
  fsType: ext4
  volumeAttributes:
    storage.kubernetes.io/csiProvisionerIdentity: ***-pd.csi.storage.gke.io
  volumeHandle: projects/***/zones/europe-west2-c/disks/pvc-***
```
So the question I find myself asking is: why does the cache volume not also end up assigned to the same zone, if both volumes are being provisioned for the same pod? I then tested by deleting the replicationsource and reinstating it, to see the order in which the volumes are provisioned. It looks to me like the cache volume is provisioned almost instantly; I suspect that since it is provisioned first, there is no consideration of the in-progress backup volume, which takes significantly longer to clone.
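If provisioning order really is the culprit, one idea (untested, and the class name below is purely hypothetical) would be to point the cache volume at a storage class restricted to the source's zone via the cacheStorageClassName field mentioned earlier:

```yaml
spec:
  restic:
    # hypothetical storage class pinned to europe-west2-c,
    # so the cache volume cannot land in a different zone
    cacheStorageClassName: standard-rwo-west2-c
```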
I will shortly try with snapshots, which I hope will inform the volume placement earlier in the chain!
I changed the .spec.restic.copyMethod of the replication source to Snapshot (as @tesshuflower recommended), which provisioned the snapshot first, before the other resources. (I also added the annotation snapshot.storage.kubernetes.io/is-default-class: "true" to the GKE default volumesnapshotclass, as per the docs.) This led to a successful initial pod:
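As an aside, the default-class annotation sits on the VolumeSnapshotClass like so; the class name here is a placeholder (GKE ships its own), only the driver and annotation matter:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-default-snapshot-class   # placeholder; use the class GKE provides
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Delete
```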
However, at this stage I was concerned that the next backup would fail, since the cache volume now already exists, so I reduced the cron to fire every 10 minutes to confirm. I found that the next tick also completed successfully, although I have yet to verify the backup with a restore, which is the next operation I want to check.
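Reducing the cron is just a change to the ReplicationSource trigger, e.g.:

```yaml
spec:
  trigger:
    schedule: "*/10 * * * *"   # every 10 minutes, for testing
```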
Interestingly, however, the cache volume is still in europe-west2-a. I checked the volumes provisioned after the volume snapshot, and they too end up in europe-west2-a, the same as the cache volume. So the data actually appears to be moving zones: it originated from europe-west2-c in the ghost volume, and the snapshot then creates the X-backup-src PV like so:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: pd.csi.storage.gke.io
    volume.kubernetes.io/provisioner-deletion-secret-name: ""
    volume.kubernetes.io/provisioner-deletion-secret-namespace: ""
  finalizers:
    - kubernetes.io/pv-protection
    - external-attacher/pd-csi-storage-gke-io
  name: pvc-***
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 50Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: volsync-ghost-backup-src
    namespace: ghost
  csi:
    driver: pd.csi.storage.gke.io
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: ***-pd.csi.storage.gke.io
    volumeHandle: projects/***/zones/europe-west2-a/disks/pvc-*** # <--- MOVED from europe-west2-c
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.gke.io/zone
              operator: In
              values:
                - europe-west2-a # <--- MOVED from europe-west2-c
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard-rwo
  volumeMode: Filesystem
status:
  phase: Bound
```
So this appears to work. I will restore from this data to confirm: since the volume is moving zones, I want to verify that the data inside the volume has moved too, as this zone migration is surprising behaviour.
Ok, I can confirm that backups work from the zone-migrated volumes, although cloned volumes still do not work, due to the aforementioned issue. Seeing as this issue was geared towards solving the AZ problem, rather than cloned volumes specifically, I would say this is resolved.
As an aside, I note that restored-to volumes do not get wiped on restore. This is my current restore, as per the docs:
```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: ghost-restore
spec:
  trigger:
    manual: restore-once
  restic:
    repository: ghost-backup
    destinationPVC: ghost
    copyMethod: Snapshot
```
Is there any option in volSync to do this, or are there any established setups / patterns for doing so? @tesshuflower thanks for your help, it is much appreciated.
@DreamingRaven thanks for the detailed information, this was an interesting one. Glad to hear that snapshots do seem to work for your use-case.
Right now you'll get a new, empty PVC only if you provision a new one yourself rather than re-using the existing one; alternatively, you can use something like the volume populator to get a new pre-populated PVC.
There's a long discussion here about using the volume populator, in case the use-case mentioned is in any way similar to yours: https://github.com/backube/volsync/issues/627#issuecomment-1663933508
If your use-case is really about trying to synchronize data to a PVC on a remote cluster (i.e. a sync operation that you will run repeatedly at the destination), you could potentially look at using the rclone or rsync-tls movers.
OK, I will have a look. I am creating a staging environment that I want to allow some drift, then after a period of time it should be wiped and set to the same state as production. Thanks for your time @tesshuflower, it sounds like the volume populator is exactly what I need with a separate cron deletion! Then ArgoCD will recreate the resource and re-pull the backup, returning the staging environment to a production-like state.
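For anyone finding this later, the volume-populator pattern discussed in #627 looks roughly like this: a fresh PVC whose dataSourceRef points at the ReplicationDestination, so deleting and recreating the PVC re-pulls the latest backup. This is a sketch reusing the ghost-restore names from the restore example above (when used as a populator source, the ReplicationDestination would omit destinationPVC):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ghost
  namespace: ghost
spec:
  accessModes:
    - ReadWriteOnce
  dataSourceRef:               # VolSync populates this PVC from the backup
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: ghost-restore
  resources:
    requests:
      storage: 50Gi
```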
### Describe the bug
There is a misalignment of the volumes being provisioned in multi-AZ clusters, which causes volsync job pods to be unschedulable.
On my non-multi-AZ cluster, volsync pods are scheduled without incident, since neither volume has any AZ restriction for mounting. However, on my multi-AZ GKE cluster, the two volumes X-backup-cache and X-backup-src cause the X-backup job to stall, since the pod cannot be scheduled:

![image](https://github.com/backube/volsync/assets/10534713/b83f74be-be1d-4cfd-b0e4-b592d973489d)

> 9 node(s) had volume node affinity conflict.

The volumes are in different zones, so no node can satisfy the pod's requirements.

### Steps to reproduce
Create a multi-AZ cluster in GKE. Create any ReplicationSource resource, e.g:
This will likely then create owned resources like so:
The pod will likely be unable to mount both volumes, and as such is permanently unschedulable.
### Expected behavior
I would expect both provisioned PVs to be allocated to the same availability zone as the PV being backed up.
### Actual results
Since GKE assigns zones randomly unless specified (https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#pd-zones), you will end up with something like this:

![image](https://github.com/backube/volsync/assets/10534713/cc550975-f46e-433a-b559-021ac9fa71f6)
When inspected, the two pvs created by volsync will look something like this:
### Additional context
I can foresee a few ways to solve this issue:
So it is currently unclear to me how one would force the zones to match, unless you completely remove the multiple volumes.
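One possible (untested) avenue for forcing the zones to match would be a zone-restricted StorageClass used for both volsync volumes; a sketch with a hypothetical class name, using the topology key seen on the PVs above:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-rwo-west2-c   # hypothetical zone-pinned class
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:             # restrict provisioning to the source's zone
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values:
          - europe-west2-c
```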