Open phoerious opened 1 year ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
No, thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
Jeez....
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
:disappointed:
connecting failed: rados: ret=-13, Permission denied
This mostly happens due to a permissions issue. Can you please check and update the Ceph user caps as per https://github.com/ceph/ceph-csi/blob/devel/docs/capabilities.md?
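For reference, a quick way to inspect what a user is currently allowed to do (the client name below is a placeholder; use whichever user the CSI driver is configured with):

```sh
# Print the keyring entry and caps of the CSI user
ceph auth get client.csi-rbd-provisioner
```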
@phoerious we really don't have solid E2E coverage for the migration. If you have logs, we can try to debug and see what is happening.
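If it helps, one way to pull those logs (the namespace, deployment, and container names are assumptions based on a typical ceph-csi deployment; adjust to yours):

```sh
# External-provisioner sidecar and rbd plugin logs from the provisioner deployment
kubectl -n ceph-csi-rbd logs deploy/ceph-csi-rbd-provisioner -c csi-provisioner --tail=200
kubectl -n ceph-csi-rbd logs deploy/ceph-csi-rbd-provisioner -c csi-rbdplugin --tail=200
```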
These are the permissions of both the new CSI user and the old legacy user:
caps mgr = "allow rw"
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd.k8s-pvs, profile rbd pool=rbd.k8s-pvs-ssd"
I create a PVC with the old storage class name, which gets rerouted to the new CSI driver. When I try to delete that PVC, the associated PV gets stuck in "Terminating" with this:
Warning VolumeFailedDelete 4s (x6 over 14s) rbd.csi.ceph.com_ceph-csi-rbd-provisioner-789d77444b-fjrmg_673c5c4f-7ce8-424f-836e-22e2d06cc1ad rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-13, Permission denied
The provisioner log is littered with this:
I0209 10:38:22.378293 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6", UID:"6b536ef6-9ceb-4879-a2e2-c10c3f9fe20a", APIVersion:"v1", ResourceVersion:"3698428630", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-13, Permission denied
I0209 10:39:26.379253 1 controller.go:1502] delete "pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6": started
E0209 10:39:26.407627 1 controller.go:1512] delete "pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6": volume deletion failed: rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-13, Permission denied
W0209 10:39:26.407731 1 controller.go:989] Retrying syncing volume "pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6", failure 8
E0209 10:39:26.407806 1 controller.go:1007] error syncing volume "pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6": rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-13, Permission denied
I0209 10:39:26.407882 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6", UID:"6b536ef6-9ceb-4879-a2e2-c10c3f9fe20a", APIVersion:"v1", ResourceVersion:"3698428630", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Internal desc = failed to get connection: connecting failed: rados: ret=-13, Permission denied
The associated RBD in the pool has long been deleted.
rbd -p rbd.k8s-pvs ls | grep kubernetes-dynamic-pvc-e7c7501f-c1c4-42bb-bef1-32b57d418def
That's all I have.
caps mgr = "allow rw"
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd.k8s-pvs, profile rbd pool=rbd.k8s-pvs-ssd"
Can you please remove the extra profile from the osd caps and see if that is what is causing the issue? Can you make it as below:
caps mgr = "allow rw"
caps mon = "profile rbd"
caps osd = "profile rbd pool=rbd.k8s-pvs"
Same thing.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
Nope, still there.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
:trumpet:
Describe the bug
I recently migrated from the in-tree Ceph storage driver to the CSI driver and wanted to enable the migration plugin for existing kubernetes.io/rbd volumes.
I used these two documents for reference:
I noticed that both are relatively incomplete and grammatically highly confusing. I think I did everything that was required for the migration, but I don't really know whether the legacy plugin is really redirected to the CSI driver or not. I believe it is, since I tried what was written in the first document above:
and I got errors in the provisioner log about it not finding the correct cluster ID. I do not get an error when I generate the hash without a trailing `\n`, using `echo -n "<monaddress[es]:port>" | md5sum` instead (I think this is a bug in the docs!).
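For illustration (the monitor address here is a made-up placeholder), the two variants produce different hashes because `echo` appends a newline by default:

```sh
# Hash includes the trailing newline that echo adds
echo "10.0.0.1:6789" | md5sum

# Hash of the bare monitor string only; this is the value that made the clusterID resolve correctly for me
echo -n "10.0.0.1:6789" | md5sum
```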
My main issue, however, is that when I create a new RBD using the legacy storage class, an RBD gets provisioned and cleaned up, but the PV spec gets stuck in a `Terminating` state with the following error:

The provisioner logs this:
The existence of this error seems to indicate that the CSI plugin does indeed handle the kubernetes.io/rbd requests, although with an error.
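One way to double-check that redirection (a sketch; the PV name is taken from the provisioner log above, and the annotation name assumes the standard CSI-migration annotation that kube-controller-manager puts on migrated volumes):

```sh
# A migrated in-tree PV should be annotated with the CSI driver it was handed over to
kubectl get pv pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6 \
    -o jsonpath='{.metadata.annotations.pv\.kubernetes\.io/migrated-to}'
```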
I did verify with `rbd ls rbd.k8s-pvs | grep VOLUME_NAME` that the RBD volume gets created and deleted correctly, so this is a bogus "Permission denied" error. It is annoying nonetheless, since the only way to get rid of the PV is to edit the spec and remove the `finalizer`.
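For the record, a one-liner that does that edit (the PV name is the one from the log above; use with care, since removing finalizers bypasses the normal cleanup):

```sh
# Drop the finalizers so the stuck PV object can go away
kubectl patch pv pvc-184ebeb5-0695-4c80-b9d2-0a479a5f00d6 \
    --type=merge -p '{"metadata":{"finalizers":null}}'
```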
Environment details
Steps to reproduce
Steps to reproduce the behavior:
Actual results
RBD volume gets created and deleted, PVC is deleted as well, but PV gets stuck in a `Terminating` state with a bogus Permission denied error.