Closed · pkalever closed this 3 years ago
@pkalever I tried to reproduce this on Ceph Octopus but was not able to:
i.imagename:csi-vol-aee99548-1809-11eb-a07a-826f7defc52c csi.volname:pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1])
I1027 04:05:43.332769 1 rbd_journal.go:435] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 generated Volume ID (0001-0009-rook-ceph-0000000000000002-aee99548-1809-11eb-a07a-826f7defc52c) and image name (csi-vol-aee99548-1809-11eb-a07a-826f7defc52c) for request name (pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1)
I1027 04:05:43.332861 1 rbd_util.go:200] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 rbd: create replicapool/csi-vol-aee99548-1809-11eb-a07a-826f7defc52c size 1024M (features: []) using mon 10.107.158.84:6789
I1027 04:05:43.353568 1 controllerserver.go:465] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 created volume pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 backed by image csi-vol-aee99548-1809-11eb-a07a-826f7defc52c
I1027 04:05:43.375052 1 omap.go:136] ID: 17 Req-ID: pvc-6893d3e8-eee6-4856-bef0-d2bcea21e4c1 set omap keys (pool="replicapool", namespace="", name="csi.volume.aee99548-1809-11eb-a07a-826f7defc52c"): map[csi.imageid:113e70f8d035])
sh-4.4# rbd info csi-vol-aee99548-1809-11eb-a07a-826f7defc52c --pool=replicapool
rbd image 'csi-vol-aee99548-1809-11eb-a07a-826f7defc52c':
    size 1 GiB in 256 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: 113e70f8d035
    block_name_prefix: rbd_data.113e70f8d035
    format: 2
    features: layering
    op_features:
    flags:
    create_timestamp: Tue Oct 27 04:05:43 2020
    access_timestamp: Tue Oct 27 04:05:43 2020
    modify_timestamp: Tue Oct 27 04:05:43 2020
sh-4.4# ceph version
ceph version 15.2.5 (2c93eff00150f0cc5f106a559557a58d3d7b6f1f) octopus (stable)
This is with a cephcsi canary image. Let me know if you are still able to reproduce it; I would like to check a few things.
Is there any update on this?
I discussed seeing flattening happening in #1800 but it was mentioned that none of the operations there would cause flattening.
As it stands right now, if I create a PVC, then snapshot that PVC, then create a clone of the snapshot and try to mount it, I get this error from a kubectl describe of a pod trying to use that cloned PVC:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 73s default-scheduler Successfully assigned rook/busybox-sleep to minikube
Normal SuccessfulAttachVolume 74s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-d9e9bde4-ec56-4f6b-8d0c-2928b66df5d7"
Warning FailedMount 35s (x6 over 58s) kubelet MountVolume.MountDevice failed for volume "pvc-d9e9bde4-ec56-4f6b-8d0c-2928b66df5d7" : rpc error: code = Internal desc = flatten in progress: flatten is in progress for image csi-vol-b3f39645-709a-11eb-8f85-0242ac110010
It does eventually enter a running state, but only once the flatten operation completes.
I don't want any flattening to occur, as it defeats the point of me using cloning altogether 😞
I did see this in the output of my csi-rbdplugin logs, though:
E0216 20:11:48.159792 6417 util.go:232] kernel 4.19.157 does not support required features
E0216 20:11:48.753455 6417 utils.go:136] ID: 274 Req-ID: 0001-0005-rook-0000000000000002-2e26c0b6-7093-11eb-be63-0242ac110010 GRPC error: rpc error: code = Internal desc = flatten in progress: flatten is in progress for image csi-vol-2e26c0b6-7093-11eb-be63-0242ac110010
Is this flattening happening because I'm running a kernel that doesn't support deep flatten? 🤔
Ah, it seems this comment suggests you must have kernel 5.1+ to avoid a full flatten: https://github.com/ceph/ceph-csi/pull/693#issuecomment-640067191
Presumably this is the problem, as minikube is using 4.19?
@cjheppell Kernels older than 5.1 do not support mapping rbd images that have the deep-flatten image feature; for those kernels we need to flatten the image first and then map it on the node.
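To illustrate the rule being described (this is a hypothetical sketch, not ceph-csi's actual code; `needs_flatten` is a made-up helper name), the kernel-version gate amounts to: krbd can map images carrying the deep-flatten feature only from kernel 5.1 onwards, so older kernels must flatten before mapping.

```shell
# Hypothetical helper (not part of ceph-csi): given a kernel release
# string, decide whether an image with the deep-flatten feature must
# be flattened before krbd can map it. krbd gained deep-flatten
# support in kernel 5.1.
needs_flatten() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; then
    echo "no"    # kernel can map deep-flatten images directly
  else
    echo "yes"   # nodeplugin must flatten first, then map
  fi
}

needs_flatten "4.19.157"   # the minikube kernel from this thread -> yes
needs_flatten "5.4.0"      # -> no
```

This matches the behaviour reported above: on minikube's 4.19 kernel the mount stalls until the flatten completes, while on a 5.1+ kernel the clone maps directly.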
Was this a change between v2.1.x and v3?
As described in #1800, when I performed the same actions on v2.1.2 I didn't see this flattening behaviour.
Yes, this is a change in v3.x as we reworked the rbd snapshot and clone implementation.
Presumably that's what the "Snapshot Alpha is no longer supported" in the v3.0.0 release notes is referring to? https://github.com/ceph/ceph-csi/releases/tag/v3.0.0
I must admit, this is very surprising and completely unexpected behaviour as a user.
It seems that unless I'm on a kernel 5.1+, cloning from snapshots fundamentally doesn't perform the copy-on-write behaviour that Ceph claims to offer. Even more so, that's very hidden from me: glancing at the behaviour in Kubernetes, it appears that cloning is working. It's only when I mount the clone that the flatten is revealed.
If that snapshot contains hundreds of gigabytes of data, then that operation is likely to take a very long time.
Even more so, the only way I was able to determine that I needed a 5.1+ kernel was by digging through issues and pull request comments.
Could this perhaps be documented more clearly somewhere? It would've saved me an awful lot of time from digging through the lines of code and various pull requests associated with this behaviour.
> Presumably that's what the "Snapshot Alpha is no longer supported" in the v3.0.0 release notes is referring to? https://github.com/ceph/ceph-csi/releases/tag/v3.0.0
> I must admit, this is very surprising and completely unexpected behaviour as a user.
> It seems that unless I'm on a kernel 5.1+, cloning from snapshots fundamentally doesn't perform the copy-on-write behaviour that Ceph claims to offer. Even more so, that's very hidden from me: glancing at the behaviour in Kubernetes, it appears that cloning is working. It's only when I mount the clone that the flatten is revealed.
In Kubernetes, snapshots and PVCs are independent objects. The new design in v3.x handles that: an rbd clone is created when a user requests a Kubernetes snapshot.
> If that snapshot contains hundreds of gigabytes of data, then that operation is likely to take a very long time.
> Even more so, the only way I was able to determine that I needed a 5.1+ kernel was by digging through issues and pull request comments.
Yes. Because the clones are created with the deep-flatten feature, if the kernel version is less than 5.1 the nodeplugin tries to flatten the image and then maps it. You also have the option to flatten the image during the snapshot create operation itself; for that, rbdsoftmaxclonedepth needs to be set to 1.
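For reference, the clone-depth knobs are command-line arguments on the ceph-csi provisioner. A hedged sketch of how they might appear in the csi-rbdplugin-provisioner deployment follows (flag names as used in ceph-csi's deploy manifests; verify the exact names and defaults against your ceph-csi version — the hard-depth value shown is only an example):

```yaml
# Sketch only; check your ceph-csi version's manifests for the exact flags.
containers:
  - name: csi-rbdplugin
    args:
      - "--rbdsoftmaxclonedepth=1"  # flatten during snapshot/clone create rather than at mount time
      - "--rbdhardmaxclonedepth=8"  # example value, not a recommendation
```

Setting the soft depth to 1 trades slower snapshot creation for predictable mount times on kernels that cannot map deep-flatten images.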
> Could this perhaps be documented more clearly somewhere? It would've saved me an awful lot of time from digging through the lines of code and various pull requests associated with this behaviour.
Yes, we will update the documentation with the minimum kernel version required to support snapshot and clone.
> In Kubernetes, snapshots and PVCs are independent objects. The new design in v3.x handles that: an rbd clone is created when a user requests a Kubernetes snapshot.
Quite right, but given I'm using a Ceph driver to fulfil Kubernetes' snapshot/clone operations, I'd still expect the behaviour to match Ceph's own documented snapshot/clone semantics. It appears this is true for kernels 5.1+ on v3.x, and it was true for kernels <5.1 on v2.1.x releases, but it is no longer the case for kernels <5.1 on v3.x releases.
My point is that as a user, one of the important features Ceph offers is unavailable to me unless some prerequisites are met, and those prerequisites aren't clear.
Perhaps this behaviour could also be opt-in? I'm aware that Kubernetes presents snapshots and PVCs as independent, but if I consciously acknowledge that the hidden relationship is present, then we could avoid the need to flatten for kernels <5.1 on v3.x releases?
> Yes, we will update the documentation with the minimum kernel version required to support snapshot and clone.
Many thanks. That will be very helpful.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
Describe the bug
Currently, as part of the node service, we add an rbd flatten task for newly created PVCs. Ideally, we should add a flatten task only for snapshot-backed/cloned PVCs, as required.
Environment details
Steps to reproduce
Actual results
Flatten task is added for new PVC
Expected behavior
No flatten task should be added for newly created PVCs.