Open dhess opened 9 months ago
@dhess This is an interesting one. I'm not sure the workaround used for Direct
mode will work, as it relies on finding another active pod that's currently using the PVC and then scheduling on the same node.
In this case (if I understand correctly), a new PVC from snapshot is created, and the VolSync mover pod should then be the first consumer of this PVC. Normally I would have thought the pod should get scheduled automatically in the correct place, but maybe something else is going on.
Does ZFS-LocalPV use the csi topology feature? https://kubernetes-csi.github.io/docs/topology.html
One more question: When you create your original sourcePVC and then run your application pod, do you also need to manually configure that pod to run on a particular node that corresponds to where the PVC was provisioned?
Hi @tesshuflower, thanks for the quick response.
> Does ZFS-LocalPV use the csi topology feature? https://kubernetes-csi.github.io/docs/topology.html
I'm not familiar with CSI Topology, but from what I can tell, it seems it does:
I'm guessing this manifest for the openebs-zfs-localpv-controller also demonstrates that it's using CSI topology:
```yaml
- args:
  - --csi-address=$(ADDRESS)
  - --v=5
  - --feature-gates=Topology=true
  - --strict-topology
  - --leader-election
  - --enable-capacity=true
  - --extra-create-metadata=true
  - --default-fstype=ext4
  env:
  - name: ADDRESS
    value: /var/lib/csi/sockets/pluginproxy/csi.sock
  - name: NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace
  - name: POD_NAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.name
  image: registry.k8s.io/sig-storage/csi-provisioner:v3.5.0
  imagePullPolicy: IfNotPresent
  name: csi-provisioner
  resources: {}
  terminationMessagePath: /dev/termination-log
  terminationMessagePolicy: File
  volumeMounts:
  - mountPath: /var/lib/csi/sockets/pluginproxy/
    name: socket-dir
```
Are there any particular topology keys I should use for compatibility with VolSync? Is the ZFS-LocalPV Helm chart's default `"All"` value a valid key?
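For reference, topology constraints on a StorageClass usually look like the sketch below. This is a hypothetical example, not our actual config: the provisioner name, pool parameter, topology key (`openebs.io/nodename`), and node names are all assumptions to illustrate where a topology key would appear.

```yaml
# Hypothetical sketch: a ZFS-LocalPV StorageClass restricted to specific
# nodes via allowedTopologies. The topology key and node names are
# placeholders; check what your zfs-localpv install actually registers.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-pool-0
provisioner: zfs.csi.openebs.io
parameters:
  poolname: zfspv-pool-0
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: openebs.io/nodename
    values:
    - node-a
    - node-b
```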
> One more question: When you create your original sourcePVC and then run your application pod, do you also need to manually configure that pod to run on a particular node that corresponds to where the PVC was provisioned?
I think you're referring to statically provisioned PVCs here? If so, I'm not using those, so I'm not sure. All of the PVCs I'm trying to use as source PVCs for VolSync are dynamically provisioned as part of a StatefulSet
or similar, and therefore Kubernetes creates the PVC on the same node where its pod will run.
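The dynamic-provisioning pattern I'm describing looks roughly like this; all names and sizes here are illustrative, not our real workload:

```yaml
# Illustrative only: a StatefulSet whose volumeClaimTemplates dynamically
# provision one ZFS-LocalPV PVC per replica. With WaitForFirstConsumer,
# each PVC is bound on whichever node its pod is scheduled to.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 1
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: db
        image: postgres:16
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: zfspv-pool-0
      resources:
        requests:
          storage: 10Gi
```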
@dhess there's nothing specific in VolSync that you should need to do to ensure compatibility. I guess normally I'd expect that the first consumer (the volsync mover pod in this case) of a PVC should get automatically scheduled on a node where that pvc is accessible. It sounds like this is happening with your statefulset for example.
Maybe you could try something to help me understand - if you create a VolumeSnapshot for one of your source PVCs and then create a PVC from this snapshot (or do a clone instead of volumesnapshot+PVC if you're using copyMethod `Clone`) - can you then create a Job or Deployment that mounts this PVC without specifically needing to set affinity to schedule it on a particular node?
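A sketch of that experiment might look like the following; every name here (`source-pvc`, `zfspv-snapclass`, storage class, sizes) is a placeholder for your own objects:

```yaml
# Snapshot a source PVC, restore it into a new PVC, then mount that PVC
# from a Job with no node affinity, to see where the scheduler places it.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: source-pvc-snap
spec:
  volumeSnapshotClassName: zfspv-snapclass
  source:
    persistentVolumeClaimName: source-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-from-snap
spec:
  storageClassName: zfspv-pool-0
  dataSource:
    name: source-pvc-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: mount-test
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: probe
        image: busybox
        command: ["sh", "-c", "ls /data"]
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: pvc-from-snap
```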
Ahh, I see what you mean now. I'll try an experiment and get back to you.
👋 this issue also happens with CSI democratic-csi local-hostpath using the volsync volumepopulator.
https://github.com/democratic-csi/democratic-csi/issues/329
Seems to be a time-based race condition.
@danielsand I don't think this issue was specifically about the volumepopulator - would you be able to explain the scenario where you're hitting the issue?
So since I originally posted this issue, VolSync snapshots with ZFS-LocalPV have been working pretty reliably. However, we just ran into the issue (or at least a similar one) again, and I think it's possible that I misdiagnosed the original problem.
This time what happened is: while the `Clone` PVC was correctly created on the same node as the source ZFS-LocalPV PVC, the cache PVC was not; it was being created on one of the new worker nodes. Since ZFS-LocalPV volumes can't be mounted across the network, the ReplicationSource job was getting stuck on the remote ZFS-LocalPV cache PVC. The ReplicationSource job originally looked like this:
```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: db-primer-service-0
spec:
  sourcePVC: db-primer-service-0
  trigger:
    # 1 backup per hour
    schedule: "30 * * * *"
  restic:
    cacheStorageClassName: zfspv-pool-0
    copyMethod: Clone
    pruneIntervalDays: 7
    repository: restic-config-db-primer-service-0
    retain:
      hourly: 24
      daily: 7
      weekly: 1
    volumeSnapshotClassName: zfspv-snapclass
```
where `zfspv-pool-0` is the same ZFS-LocalPV storage class as the source volume.
In the last few months we've also added support for Mayastor to our cluster, and those PVCs are not tied to a particular node, so when I changed the cache storage class to Mayastor, the backup job ran and completed successfully:
```yaml
---
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: db-primer-service-0
spec:
  sourcePVC: db-primer-service-0
  trigger:
    # 1 backup per hour
    schedule: "30 * * * *"
  restic:
    cacheStorageClassName: mayastor-pool-0-repl-1
    copyMethod: Clone
    pruneIntervalDays: 7
    repository: restic-config-db-primer-service-0
    retain:
      hourly: 24
      daily: 7
      weekly: 1
    volumeSnapshotClassName: zfspv-snapclass
```
So I think that the problem here isn't with the source volume, but with the cache volume. I suspect that in order to reliably use a local PV storage class for cache volumes, there'll need to be some way to specify the topology of that volume.
What's still puzzling is that all of our other `cacheStorageClassName` values also specify a ZFS-LocalPV storage class, and this is the first stuck job I've seen in a while. Why this suddenly popped up again after adding some new nodes is curious. Maybe the scheduler is trying to balance the number of PVCs across the new nodes?
@dhess is your storageclass using a VolumeBindingMode of `WaitForFirstConsumer`? VolSync doesn't create the cache PVC until just before creating the job, so normally I think it should be figured out in the scheduling - unless you're using a VolumeBindingMode of `Immediate`, in which case the PVC could be bound to a node that isn't the same one as your PVC from snap.
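To make the distinction concrete, the binding mode lives on the StorageClass itself. A minimal sketch follows; the provisioner name and pool parameter are assumptions for illustration:

```yaml
# With WaitForFirstConsumer, PVC binding is delayed until a pod using the
# claim is scheduled, so the volume lands on that pod's node. With
# Immediate, the provisioner picks a node before any pod exists, which can
# strand a node-local volume away from its eventual consumer.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zfspv-pool-0
provisioner: zfs.csi.openebs.io   # assumption: ZFS-LocalPV's CSI driver name
parameters:
  poolname: zfspv-pool-0
volumeBindingMode: WaitForFirstConsumer   # not Immediate
```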
> 👋 this issue also happens with CSI democratic-csi local-hostpath using the volsync volumepopulator. democratic-csi/democratic-csi#329 seems to be a time-based race condition.

> @danielsand I don't think this issue was specifically about the volumepopulator - would you be able to explain the scenario where you're hitting the issue?

The linked issue wasn't about the volumepopulator; democratic-csi local-hostpath + volume snapshots + volsync didn't work for some folks. It's just a reference to what's currently running on my end and what is working (CSI and volume snapshots work as they should).
The VolumePopulator is currently failing at random on my setup: the wrong node gets picked by the volume populator even though `WaitForFirstConsumer` is specified.
Will circle back when I push the topic again.
@danielsand I've created a separate issue https://github.com/backube/volsync/issues/1255 to track this. I believe both issues are about storage drivers that create volumesnapshots/pvcs that are constrained to specific nodes, but I think your issue is related to using the volumepopulator, and this one is not.
Hi, thanks for this great project! We just started using it with our Rook/Ceph volumes, and it's working great.
It doesn't work so well with OpenEBS ZFS LocalPV (ZFS-LocalPV) volumes, however. ZFS-LocalPV has first-class support for CSI snapshotting and cloning, but VolSync can't figure out that the ZFS-LocalPV snapshot of a PVC mounted on, e.g., `node-a`, can also only be consumed from `node-a`. `copyMethod: Direct` doesn't help here for in-use volumes, because they can't be remounted. (Actually, I seem to recall that ZFS-LocalPV does support simultaneous pod mounts with a bit of extra configuration, but I'd prefer to use snapshots for proper PiT backups, anyway.)

Would it be difficult to add first-class support to VolSync for node-local provisioners with snapshotting support, like ZFS-LocalPV? Unless I'm missing something, it seems like it should be possible: since `copyMethod: Direct` can determine which node a PVC is mounted on and ensure the sync is performed from that node, then naïvely, it seems that an additional configuration option could be added to tell VolSync to mount a snapshot and run the sync operation on the same node where the source PVC is mounted.
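For context on why the snapshot is node-bound in the first place: node-local provisioners pin each PersistentVolume to its node via `spec.nodeAffinity`, so any consumer of a restored snapshot inherits the same constraint. An illustrative (not verbatim) PV fragment, with example key and node name:

```yaml
# Illustrative fragment of a PV created by a node-local CSI provisioner.
# The nodeAffinity below (topology key and node name are examples) is what
# forces every consumer, including a VolSync mover pod, onto node-a.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: zfs.csi.openebs.io
    volumeHandle: pvc-0123
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: openebs.io/nodename
          operator: In
          values:
          - node-a
```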