ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

rbd-nbd: occasionally file system entering into read-only on nodeplugin restart #2204

Open Madhu-1 opened 3 years ago

Madhu-1 commented 3 years ago

Requires below PR from ceph:

STEP: restart rbd-nbd process on nodeplugin and continue IO after nodeplugin restart
Jun 23 15:02:27.261: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.63:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-aec55366 --namespace=cephcsi-e2e-aec55366 delete -f -'
Jun 23 15:02:27.387: INFO: stderr: "warning: deleting cluster-scoped resources, not scoped to the provided namespace\n"
Jun 23 15:02:27.387: INFO: stdout: "storageclass.storage.k8s.io \"csi-rbd-sc\" deleted\n"
Jun 23 15:02:27.394: INFO: ExecWithOptions {Command:[/bin/sh -c ceph fsid] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-vjzwh ContainerName:rook-ceph-tools Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:02:27.394: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:02:29.405: INFO: Waiting up to &PersistentVolumeClaim{ObjectMeta:{rbd-pvc rbd-3274 0 0001-01-01 00:00:00 +0000 UTC map[] map[] [] [] []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1073741824 0} {} 1Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*csi-rbd-sc,VolumeMode:nil,DataSource:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},},} to be in Bound state
Jun 23 15:02:29.406: INFO: waiting for PVC rbd-pvc (0 seconds elapsed)
Jun 23 15:02:31.409: INFO: waiting for PVC rbd-pvc (2 seconds elapsed)
Jun 23 15:02:31.415: INFO: Waiting for PV pvc-42f62cbf-9e2f-4f7a-99e1-98768ebfb561 to bind to PVC rbd-pvc
Jun 23 15:02:31.415: INFO: Waiting up to 3m0s for PersistentVolumeClaims [rbd-pvc] to have phase Bound
Jun 23 15:02:31.418: INFO: PersistentVolumeClaim rbd-pvc found and phase=Bound (2.553576ms)
Jun 23 15:02:31.418: INFO: Waiting up to 3m0s for PersistentVolume pvc-42f62cbf-9e2f-4f7a-99e1-98768ebfb561 to have phase Bound
Jun 23 15:02:31.421: INFO: PersistentVolume pvc-42f62cbf-9e2f-4f7a-99e1-98768ebfb561 found and phase=Bound (2.800294ms)
Jun 23 15:02:31.434: INFO: Waiting up to csi-rbd-demo-pod to be in Running state
Jun 23 15:02:31.451: INFO: csi-rbd-demo-pod app is in Pending phase expected to be in Running state (0 seconds elapsed)
Jun 23 15:02:33.460: INFO: csi-rbd-demo-pod app is in Pending phase expected to be in Running state (2 seconds elapsed)
Jun 23 15:02:35.455: INFO: csi-rbd-demo-pod app is in Pending phase expected to be in Running state (4 seconds elapsed)
Jun 23 15:02:37.456: INFO: csi-rbd-demo-pod app is in Pending phase expected to be in Running state (6 seconds elapsed)
Jun 23 15:02:39.455: INFO: csi-rbd-demo-pod app is in Pending phase expected to be in Running state (8 seconds elapsed)
Jun 23 15:02:41.456: INFO: csi-rbd-demo-pod app is in Pending phase expected to be in Running state (10 seconds elapsed)
Jun 23 15:02:43.462: INFO: ExecWithOptions {Command:[/bin/sh -c rbd ls --format=json --pool=replicapool] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-vjzwh ContainerName:rook-ceph-tools Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:02:43.462: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:02:44.297: INFO: LabelSelector for csi-rbdplugin daemonsets in namespace cephcsi-e2e-aec55366: app=csi-rbdplugin
Jun 23 15:02:44.301: INFO: ExecWithOptions {Command:[/bin/sh -c uname -a] Namespace:cephcsi-e2e-aec55366 PodName:csi-rbdplugin-bvszc ContainerName:csi-rbdplugin Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:02:44.301: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:02:44.497: INFO: uname -a: Linux minikube 4.19.182 #1 SMP Wed Jun 9 00:54:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Jun 23 15:02:44.504: INFO: ExecWithOptions {Command:[/bin/sh -c rpm -qa | grep rbd-nbd] Namespace:cephcsi-e2e-aec55366 PodName:csi-rbdplugin-bvszc ContainerName:csi-rbdplugin Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:02:44.504: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:02:45.245: INFO: rbd-nbd package version: rbd-nbd-16.2.4-0.el8.x86_64

Jun 23 15:02:45.250: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-aec55366 PodName:csi-rbdplugin-bvszc ContainerName:csi-rbdplugin Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:02:45.250: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:02:45.485: INFO: map command running before restart, mapCmd: /usr/bin/rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-113061691 map replicapool/csi-vol-a5bccd9c-d42b-11eb-991e-4664e1c21ce8 --try-netlink --reattach-timeout=180

Jun 23 15:02:45.495: INFO: ExecWithOptions {Command:[/bin/sh -c ceph auth get-key client.cephcsi-rbd-node] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-vjzwh ContainerName:rook-ceph-tools Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:02:45.495: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:02:47.629: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.63:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-aec55366 delete po -l app=csi-rbdplugin --ignore-not-found=false'
Jun 23 15:02:57.864: INFO: stderr: ""
Jun 23 15:02:57.864: INFO: stdout: "pod \"csi-rbdplugin-bvszc\" deleted\n"
Jun 23 15:02:57.864: INFO: Waiting up to 10m0s for all daemonsets in namespace 'cephcsi-e2e-aec55366' to start
Jun 23 15:02:57.884: INFO: 0 / 1 pods ready in namespace 'cephcsi-e2e-aec55366' in daemonset 'csi-rbdplugin' (0 seconds elapsed)
Jun 23 15:02:59.888: INFO: 0 / 1 pods ready in namespace 'cephcsi-e2e-aec55366' in daemonset 'csi-rbdplugin' (2 seconds elapsed)
Jun 23 15:03:01.889: INFO: 1 / 1 pods ready in namespace 'cephcsi-e2e-aec55366' in daemonset 'csi-rbdplugin' (4 seconds elapsed)
Jun 23 15:03:01.889: INFO: attach command to run after restart, attachCmd: /usr/bin/rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-test attach --device /dev/nbd0 replicapool/csi-vol-a5bccd9c-d42b-11eb-991e-4664e1c21ce8 --try-netlink --reattach-timeout=180

Jun 23 15:03:01.893: INFO: ExecWithOptions {Command:[/bin/sh -c echo AQD+PdNgc4hbCRAAIquzYZxtb8XiMs9fGYO/BA== > /tmp/csi/keys/keyfile-test] Namespace:cephcsi-e2e-aec55366 PodName:csi-rbdplugin-8kqnl ContainerName:csi-rbdplugin Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:03:01.893: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:03:02.072: INFO: ExecWithOptions {Command:[/bin/sh -c /usr/bin/rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-test attach --device /dev/nbd0 replicapool/csi-vol-a5bccd9c-d42b-11eb-991e-4664e1c21ce8 --try-netlink --reattach-timeout=180 ] Namespace:cephcsi-e2e-aec55366 PodName:csi-rbdplugin-8kqnl ContainerName:csi-rbdplugin Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:03:02.072: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:03:02.426: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-aec55366 PodName:csi-rbdplugin-8kqnl ContainerName:csi-rbdplugin Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:03:02.426: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:03:02.638: INFO: attach command running after restart, runningAttachCmd: /usr/bin/rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-test attach --device /dev/nbd0 replicapool/csi-vol-a5bccd9c-d42b-11eb-991e-4664e1c21ce8 --try-netlink --reattach-timeout=180

Jun 23 15:03:02.643: INFO: ExecWithOptions {Command:[/bin/sh -c echo 'Hello World' > /var/lib/www/html/test] Namespace:rbd-3274 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin: CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jun 23 15:03:02.643: INFO: >>> kubeConfig: /root/.kube/config
Jun 23 15:03:18.124: INFO: stdErr occurred: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system

Jun 23 15:03:18.125: FAIL: failed to write IO, err: command terminated with exit code 2, stdErr: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system


failed at https://github.com/ceph/ceph-csi/pull/2095
Logs at https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/rest/organizations/jenkins/pipelines/mini-e2e_k8s-1.20/runs/1270/nodes/87/steps/90/log/?start=0
Madhu-1 commented 3 years ago

@pkalever PTAL

pkalever commented 3 years ago

Ack! Thanks for reporting Madhu, I have also encountered this before, but looks like it's not hit very frequently.

I will investigate the root cause and get back.

PS: once the volume-healer patches are merged, we probably won't see this!

pkalever commented 3 years ago

Sent a PR to rbd-nbd: https://github.com/ceph/ceph/pull/42609

nixpanic commented 3 years ago

problem was hit in https://github.com/ceph/ceph-csi/pull/2341#issuecomment-892565234 too

Madhu-1 commented 3 years ago

problem hit again https://github.com/ceph/ceph-csi/pull/2322#issuecomment-893155681

pkalever commented 3 years ago

problem hit again #2322 (comment)

[Copying comment from the other thread, for context:]

/test ci/centos/mini-e2e/k8s-1.21

Aug 4 20:23:22.683: INFO: >>> kubeConfig: /root/.kube/config
Aug 4 20:23:23.708: INFO: stdErr occurred: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system

Aug 4 20:23:23.708: FAIL: failed to write IO, err: command terminated with exit code 2, stdErr: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/rest/organizations/jenkins/pipelines/mini-e2e_k8s-1.21/runs/624/nodes/87/steps/90/log/?start=0

I see we are hitting Read-only even after sync is run successfully:

Aug  4 20:22:52.334: INFO: ExecWithOptions {Command:[/bin/sh -c sync /var/lib/www/html] Namespace:rbd-694 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug  4 20:22:52.515: INFO: ExecWithOptions {Command:[/bin/sh -c rbd ls --format=json --pool=replicapool] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-9dm44 ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug  4 20:22:53.344: INFO: LabelSelector for csi-rbdplugin daemonsets in namespace cephcsi-e2e-b3b077f5: app=csi-rbdplugin
Aug  4 20:22:53.344: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.165:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-b3b077f5 delete po -l app=csi-rbdplugin --ignore-not-found=false'
Aug  4 20:22:55.265: INFO: stderr: ""
Aug  4 20:22:55.265: INFO: stdout: "pod \"csi-rbdplugin-bwwjj\" deleted\n"
Aug  4 20:22:55.265: INFO: Waiting up to 10m0s for all daemonsets in namespace 'cephcsi-e2e-b3b077f5' to start
Aug  4 20:22:55.275: INFO: 0 / 1 pods ready in namespace 'cephcsi-e2e-b3b077f5' in daemonset 'csi-rbdplugin' (0 seconds elapsed)
Aug  4 20:22:57.280: INFO: 0 / 1 pods ready in namespace 'cephcsi-e2e-b3b077f5' in daemonset 'csi-rbdplugin' (2 seconds elapsed)
Aug  4 20:22:59.279: INFO: 1 / 1 pods ready in namespace 'cephcsi-e2e-b3b077f5' in daemonset 'csi-rbdplugin' (4 seconds elapsed)
Aug  4 20:22:59.283: INFO: ExecWithOptions {Command:[/bin/sh -c uname -a] Namespace:cephcsi-e2e-b3b077f5 PodName:csi-rbdplugin-dsnxq ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug  4 20:22:59.479: INFO: uname -a: Linux minikube 4.19.182 #1 SMP Fri Jul 2 00:45:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Aug  4 20:22:59.486: INFO: ExecWithOptions {Command:[/bin/sh -c rpm -qa | grep rbd-nbd] Namespace:cephcsi-e2e-b3b077f5 PodName:csi-rbdplugin-dsnxq ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug  4 20:23:00.247: INFO: rbd-nbd package version: rbd-nbd-16.2.5-0.el8.x86_64
Aug  4 20:23:00.458: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug  4 20:23:02.463: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-b3b077f5 PodName:csi-rbdplugin-dsnxq ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
[...]
Aug  4 20:23:22.463: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-b3b077f5 PodName:csi-rbdplugin-dsnxq ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug  4 20:23:22.677: INFO: attach command running after restart, runningAttachCmd: rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-667049010 attach replicapool/csi-vol-4fb43356-f559-11eb-8647-566113bf90a7 --device /dev/nbd0 --try-netlink --reattach-timeout=300
Aug  4 20:23:22.682: INFO: ExecWithOptions {Command:[/bin/sh -c echo 'Hello World' > /var/lib/www/html/test] Namespace:rbd-694 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug  4 20:23:23.708: INFO: stdErr occurred: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system
Aug  4 20:23:23.708: FAIL: failed to write IO, err: command terminated with exit code 2, stdErr: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system

@idryomov @trociny @nixpanic @lxbsz

I had suspected the image/connection closing in rbd-nbd as part of SIGTERM handling, but after my tests I can see that the image, the IO context, and the connection to RADOS are closed properly by the signal-handler/detach/unmap, in the order below:

image.close();     // close the librbd image handle
io_ctx.close();    // close the pool IO context
rados.shutdown();  // tear down the connection to the RADOS cluster

Would it be wise to disable rbd_cache like tcmu-runner does (see https://github.com/open-iscsi/tcmu-runner/blob/master/rbd.c#L589) and check whether that solves it for us? (We could just pass --rbd-cache=false to the rbd-nbd CLI as an option.)
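For illustration, a minimal sketch of such an invocation, assuming the only change is the added --rbd-cache=false and the remaining flags stay exactly as in the mapCmd captured in the log above:

# sketch: same map command as in the e2e log, with the librbd cache disabled
/usr/bin/rbd-nbd --id cephcsi-rbd-node \
  -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 \
  --keyfile=/tmp/csi/keys/keyfile-113061691 \
  --rbd-cache=false \
  map replicapool/csi-vol-a5bccd9c-d42b-11eb-991e-4664e1c21ce8 \
  --try-netlink --reattach-timeout=180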

Any other suggestions, thoughts?

trociny commented 3 years ago

@pkalever Actually I was pessimistic about your "sync" patch because it does not help much in the general case (you can't guarantee that an application or fs won't do some more IO after sync), and I actually don't see why we need to do this, and I don't understand what you meant by "rbd-nbd can sync data" in the comment.

Disabling the rbd cache does not look useful here either. When you run sync and it comes to rbd, it will properly flush the cache before returning the sync call.

Anyway, we still don't seem to have a clear understanding and are only guessing at what is going on here. The dmesg and rbd-nbd logs could be very useful here (especially if rbd-nbd is run with debug-rbd=20).

Looking at the log, I suppose the detach was run after sync, i.e. at 20:22:52:

Aug  4 20:22:52.334: INFO: ExecWithOptions {Command:[/bin/sh -c sync /var/lib/www/html] Namespace:rbd-694 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}

And then the attach was at about 20:23:22:

Aug  4 20:23:22.677: INFO: attach command running after restart, runningAttachCmd: rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-667049010 attach replicapool/csi-vol-4fb43356-f559-11eb-8647-566113bf90a7 --device /dev/nbd0 --try-netlink --reattach-timeout=300

I.e. it is exactly 30 sec, and this is exactly the default rbd-nbd --io-timeout. So if there was some pending IO (waiting for reattach) it could hit this timeout and generate the error. There doesn't seem to be much sense in having --reattach-timeout higher than --io-timeout, and I would recommend using the same value for both.

Note, as I said it is only a guess about what was going on there.
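A rough sketch of that recommendation (the timeout values below are illustrative, not prescriptive; the remaining flags mirror what cephcsi already passes in the logs above):

# sketch: keep the NBD request timeout at least as long as the reattach window
rbd-nbd map replicapool/<image> \
  --try-netlink \
  --reattach-timeout=300 \
  --io-timeout=300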

pkalever commented 3 years ago

Hello @trociny

Thanks for looking at this, much appreciate all the help.

@pkalever Actually I was pessimistic about your "sync" patch because it does not help much in the general case (you can't guarantee that an application or fs won't do some more IO after sync), and I actually don't see why we need to do this, and I don't understand what you meant by "rbd-nbd can sync data" in the comment.

The sync command was an e2e-test-only fix to temporarily avoid CI failures for a known issue. In this specific case, the application pod is running nginx, and we are quite sure there won't be any IO from the nginx web server pod after the sync command is run.

By "rbd-nbd can sync data" I'm probably referring to the changes like quiesce on SIGTERM in rbd-nbd itself.

Disabling the rbd cache does not look useful here either. When you run sync and it comes to rbd, it will properly flush the cache before returning the sync call.

Again, as mentioned, sync was added for this specific test case only. What about the general use cases: will disabling rbd_cache benefit data consistency in case of an ungraceful kill of the rbd-nbd process?

I have looked at the rbd-nbd code and I see that even if we disable rbd_cache, NBD_CMD_FLUSH can still perform the flush on the backend image (this must be intentional). But wait_inflight_io will not go through as part of the quiesce.

Anyway, we still don't seem to have a clear understanding and are only guessing at what is going on here. The dmesg and rbd-nbd logs could be very useful here (especially if rbd-nbd is run with debug-rbd=20).

It's very unfortunate that the ceph-csi CI doesn't capture the Ceph logs yet; this is something I'm planning to fix, and hopefully we will hit this again once the logging is in place.

Looking at the log, I suppose the detach was run after sync, i.e. at 20:22:52:

Aug  4 20:22:52.334: INFO: ExecWithOptions {Command:[/bin/sh -c sync /var/lib/www/html] Namespace:rbd-694 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}

And then the attach was at about 20:23:22:

Aug  4 20:23:22.677: INFO: attach command running after restart, runningAttachCmd: rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-667049010 attach replicapool/csi-vol-4fb43356-f559-11eb-8647-566113bf90a7 --device /dev/nbd0 --try-netlink --reattach-timeout=300

I.e. it is exactly 30 sec, and this is exactly the default rbd-nbd --io-timeout. So if there was some pending IO (waiting for reattach) it could hit this timeout and generate the error. There doesn't seem to be much sense in having --reattach-timeout higher than --io-timeout, and I would recommend using the same value for both.

Good catch! Yes, there is a possibility.

Maybe we can improve on this in rbd-nbd: if reattach-timeout is tweaked and io-timeout is not provided on the CLI, we could by default adjust io-timeout to be >= reattach-timeout, or at least emit a warning. I will send the changes later.

Note, as I said it is only a guess about what was going on there.

Yes, noted! But it makes a lot of sense to keep io-timeout >= reattach-timeout. There is nothing further we can debug, as the logs are missing now.

Thanks again!

idryomov commented 3 years ago

I.e. it is exactly 30 sec, and this is exactly the default rbd-nbd --io-timeout.

@trociny What sets it? We don't have a default value for --io-timeout option in rbd-nbd (i.e. not specifying --io-timeout is interpreted as no/infinite timeout) and the kernel doesn't seem to be setting it either. The kernel would default it to 30 seconds if rbd-nbd set NBD_ATTR_TIMEOUT to 0 but we don't do that. Am I missing something?

idryomov commented 3 years ago

Or has the kernel behaviour changed across releases?

trociny commented 3 years ago

I.e. it is exactly 30 sec, and this is exactly the default rbd-nbd --io-timeout.

@trociny What sets it? We don't have a default value for --io-timeout option in rbd-nbd (i.e. not specifying --io-timeout is interpreted as no/infinite timeout) and the kernel doesn't seem to be setting it either. The kernel would default it to 30 seconds if rbd-nbd set NBD_ATTR_TIMEOUT to 0 but we don't do that. Am I missing something?

Honestly, I don't know. This is what I recall from cases reported in the past: when the timeout was not explicitly set via the option (or when we did not have this option), the kernel used a 30 sec timeout. But my memory is vague here.

And currently we have this in the rbd-nbd man page [1]:

Override device timeout. Linux kernel will default to a 30 second request timeout. Allow the user to optionally specify an alternate timeout.

If this is wrong, it would be nice to update the man page.

[1] https://docs.ceph.com/en/latest/man/8/rbd-nbd/

trociny commented 3 years ago

I have looked at the rbd-nbd code and I see that even if we disable rbd_cache, NBD_CMD_FLUSH can still perform the flush on the backend image (this must be intentional).

Yes, because even without the cache you may still have inflight IO, and the flush just guarantees to the caller that all write IO issued before the flush is complete at the moment the flush returns.

pkalever commented 3 years ago

I.e. it is exactly 30 sec, and this is exactly the default rbd-nbd --io-timeout.

@trociny What sets it? We don't have a default value for --io-timeout option in rbd-nbd (i.e. not specifying --io-timeout is interpreted as no/infinite timeout) and the kernel doesn't seem to be setting it either. The kernel would default it to 30 seconds if rbd-nbd set NBD_ATTR_TIMEOUT to 0 but we don't do that. Am I missing something?

Hello @idryomov and @trociny

I'm on the latest master.

[root@linux-vm1 build]# uname -a
Linux linux-vm1 5.14.0-rc3+ #1 SMP Fri Jul 30 12:51:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[root@linux-vm1 build]# ./bin/rbd-nbd map rbd-pool/image0 --try-netlink
/dev/nbd0
[root@linux-vm1 build]# cat /sys/block/nbd0/queue/io_timeout
30000
[root@linux-vm1 build]# ./bin/rbd-nbd unmap /dev/nbd0

[root@linux-vm1 build]# ./bin/rbd-nbd map rbd-pool/image0 --try-netlink --io-timeout 60
/dev/nbd0
[root@linux-vm1 build]# cat /sys/block/nbd0/queue/io_timeout
60000
[root@linux-vm1 build]# ./bin/rbd-nbd unmap /dev/nbd0

Hence the default currently seems to be 30 seconds.

idryomov commented 3 years ago

The block layer indeed sets it to 30 seconds by default but I had a vague memory of nbd not following that, likely based on an email exchange with a user who had the entire dmesg filled with:

block nbd2: Possible stuck request 000000002f3b7bec: control (write@9158656,4096B). Runtime 210 seconds
block nbd2: Possible stuck request 000000005b8accc5: control (write@9007104,4096B). Runtime 210 seconds

messages. They didn't touch --io-timeout (I asked) but it looks like the kernel was Ubuntu 5.4, so quite old...

When I checked nbd.c yesterday, I got confused by the logic in nbd_set_cmd_timeout(), which repeats the block layer's 30-second setting but only when NBD_ATTR_TIMEOUT is specified. Apparently this is for handling a reconfigure with --io-timeout 0; in the general case the block layer default is now followed. Sorry for the noise!

pkalever commented 3 years ago

@idryomov

There is a possible bug in this area: once we alter io-timeout (say for nbd0), sysfs keeps the value persistent and uses it for future images mapped on that block device (nbd0). We might have hit it many times unknowingly: https://lkml.org/lkml/2021/8/6/713
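A quick way to observe the described persistence, reusing the same sysfs check as above (a sketch; whether the stale value sticks depends on the kernel in question):

# map with an explicit timeout, then unmap
./bin/rbd-nbd map rbd-pool/image0 --try-netlink --io-timeout 60
cat /sys/block/nbd0/queue/io_timeout   # 60000
./bin/rbd-nbd unmap /dev/nbd0

# map again without --io-timeout; on affected kernels the old value sticks
./bin/rbd-nbd map rbd-pool/image0 --try-netlink
cat /sys/block/nbd0/queue/io_timeout   # still 60000 instead of the 30000 default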

cc: @trociny @lxbsz @nixpanic

idryomov commented 3 years ago

I think most people aren't even aware of --io-timeout so the fact that it happens to be persisted isn't a big deal. Those who need a particular timeout would always specify it, overwriting the old value. But yeah, looks worth fixing to me (although I might have opted for always explicitly setting the timeout on connect rather than clearing on disconnect).

pkalever commented 3 years ago
STEP: perform IO on rbd-nbd volume after nodeplugin restart
Aug 10 07:52:11.087: INFO: waiting for kubectl (delete -f [] args %!s(MISSING)) to finish
Aug 10 07:52:11.087: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.12:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-7a13656f delete -f -'
E0810 07:52:11.087500   79057 util.go:245] kernel 4.18.0-305.10.2.el8_4.x86_64 does not support required features
Aug 10 07:52:11.223: INFO: stderr: "warning: deleting cluster-scoped resources, not scoped to the provided namespace\n"
Aug 10 07:52:11.223: INFO: stdout: "storageclass.storage.k8s.io \"csi-rbd-sc\" deleted\n"
Aug 10 07:52:11.230: INFO: ExecWithOptions {Command:[/bin/sh -c ceph fsid] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-c67p2 ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:11.230: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:13.356: INFO: Waiting up to &PersistentVolumeClaim{ObjectMeta:{rbd-pvc  rbd-8081    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1073741824 0} {<nil>} 1Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*csi-rbd-sc,VolumeMode:nil,DataSource:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},},} to be in Bound state
Aug 10 07:52:13.357: INFO: waiting for PVC rbd-pvc (0 seconds elapsed)
Aug 10 07:52:15.365: INFO: waiting for PVC rbd-pvc (2 seconds elapsed)
Aug 10 07:52:15.372: INFO: Waiting for PV pvc-500ee85c-8154-4c2c-81bf-6362dbc02eb3 to bind to PVC rbd-pvc
Aug 10 07:52:15.372: INFO: Waiting up to timeout=10m0s for PersistentVolumeClaims [rbd-pvc] to have phase Bound
Aug 10 07:52:15.375: INFO: PersistentVolumeClaim rbd-pvc found and phase=Bound (2.699166ms)
Aug 10 07:52:15.375: INFO: Waiting up to 10m0s for PersistentVolume pvc-500ee85c-8154-4c2c-81bf-6362dbc02eb3 to have phase Bound
Aug 10 07:52:15.377: INFO: PersistentVolume pvc-500ee85c-8154-4c2c-81bf-6362dbc02eb3 found and phase=Bound (2.469261ms)
Aug 10 07:52:15.390: INFO: Waiting up to csi-rbd-demo-pod to be in Running state
Aug 10 07:52:27.399: INFO: ExecWithOptions {Command:[/bin/sh -c sync /var/lib/www/html] Namespace:rbd-8081 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:27.399: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:27.610: INFO: ExecWithOptions {Command:[/bin/sh -c rbd ls --format=json --pool=replicapool] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-c67p2 ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:27.610: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:28.520: INFO: LabelSelector for csi-rbdplugin daemonsets in namespace cephcsi-e2e-7a13656f: app=csi-rbdplugin
Aug 10 07:52:28.520: INFO: waiting for kubectl ([delete po -l app=csi-rbdplugin --ignore-not-found=false] args) to finish
Aug 10 07:52:28.520: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.12:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-7a13656f delete po -l app=csi-rbdplugin --ignore-not-found=false'
Aug 10 07:52:30.259: INFO: stderr: ""
Aug 10 07:52:30.259: INFO: stdout: "pod \"csi-rbdplugin-26vkl\" deleted\n"
Aug 10 07:52:30.259: INFO: Waiting up to 10m0s for all daemonsets in namespace 'cephcsi-e2e-7a13656f' to start
Aug 10 07:52:30.262: INFO: 0 / 1 pods ready in namespace 'cephcsi-e2e-7a13656f' in daemonset 'csi-rbdplugin' (0 seconds elapsed)
Aug 10 07:52:32.267: INFO: 0 / 1 pods ready in namespace 'cephcsi-e2e-7a13656f' in daemonset 'csi-rbdplugin' (2 seconds elapsed)
Aug 10 07:52:34.266: INFO: 1 / 1 pods ready in namespace 'cephcsi-e2e-7a13656f' in daemonset 'csi-rbdplugin' (4 seconds elapsed)
Aug 10 07:52:34.270: INFO: ExecWithOptions {Command:[/bin/sh -c uname -a] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:34.270: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:34.454: INFO: uname -a: Linux minikube 4.19.182 #1 SMP Fri Jul 2 00:45:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Aug 10 07:52:34.461: INFO: ExecWithOptions {Command:[/bin/sh -c rpm -qa | grep rbd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:34.461: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:35.183: INFO: rbd-nbd package version: rbd-nbd-16.2.5-0.el8.x86_64

Aug 10 07:52:35.189: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:35.189: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:35.395: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:37.400: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:37.400: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:37.620: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:39.399: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:39.399: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:39.632: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:41.401: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:41.401: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:41.608: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:43.401: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:43.401: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:43.603: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:45.400: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:45.400: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:45.630: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:47.399: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:47.399: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:47.605: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:49.400: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:49.400: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:49.604: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:51.400: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:51.400: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:51.605: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:53.400: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:53.400: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:53.598: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:55.401: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:55.401: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:55.618: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Aug 10 07:52:57.401: INFO: ExecWithOptions {Command:[/bin/sh -c ps -eo 'cmd' | grep [r]bd-nbd] Namespace:cephcsi-e2e-7a13656f PodName:csi-rbdplugin-dk84f ContainerName:csi-rbdplugin Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:57.401: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:57.617: INFO: attach command running after restart, runningAttachCmd: rbd-nbd --id cephcsi-rbd-node -m rook-ceph-mon-a.rook-ceph.svc.cluster.local:6789 --keyfile=/tmp/csi/keys/keyfile-084204234 attach replicapool/csi-vol-7e003901-f9a7-11eb-ae90-ee515005cb5b --device /dev/nbd0 --try-netlink --reattach-timeout=300

Aug 10 07:52:57.621: INFO: ExecWithOptions {Command:[/bin/sh -c echo 'Hello World' > /var/lib/www/html/test] Namespace:rbd-8081 PodName:csi-rbd-demo-pod ContainerName:web-server Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Aug 10 07:52:57.621: INFO: >>> kubeConfig: /root/.kube/config
Aug 10 07:52:58.973: INFO: stdErr occurred: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system

Aug 10 07:52:58.973: FAIL: failed to write IO, err: command terminated with exit code 2, stdErr: /bin/sh: 1: cannot create /var/lib/www/html/test: Read-only file system

Hit count +1

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/rest/organizations/jenkins/pipelines/mini-e2e_k8s-1.21/runs/709/nodes/95/log/?start=0

idryomov commented 3 years ago

Hit count +1

Is there a dmesg for this (or any other) occurrence? I'm still not sure where the timeout is coming from.

Rakshith-R commented 3 years ago

Hit count +1

Is there a dmesg for this (or any other) occurrence? I'm still not sure where the timeout is coming from.

@idryomov, you can find dmesg logs for one of the runs where this issue occurred here: https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/rest/organizations/jenkins/pipelines/mini-e2e_k8s-1.20/runs/2053/nodes/104/log/?start=0

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e_k8s-1.20/detail/mini-e2e_k8s-1.20/2053/pipeline/104

It's part of the "log system status" step logs in the Jenkins pipeline.

idryomov commented 3 years ago

Thanks, Rakshith. So it indeed went through the "normal" io-timeout path:

Aug 10 10:31:48 minikube kernel: EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: discard
Aug 10 10:31:52 minikube kernel: block nbd0: Receive control failed (result -32)
Aug 10 10:32:03 minikube kernel: block nbd0: Attempted send on invalid socket
Aug 10 10:32:18 minikube kernel: block nbd0: reconnected socket
>>>>>> Aug 10 10:32:22 minikube kernel: block nbd0: Connection timed out <<<<<<
Aug 10 10:32:22 minikube kernel: block nbd0: shutting down sockets
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 0
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 1048608
Aug 10 10:32:22 minikube kernel: Aborting journal on device nbd0-8.
Aug 10 10:32:22 minikube kernel: EXT4-fs error (device nbd0) in __ext4_new_inode:985: Journal has aborted
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 1048576
Aug 10 10:32:22 minikube kernel: block nbd0: NBD_DISCONNECT
Aug 10 10:32:22 minikube kernel: block nbd0: Send disconnect failed -32
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 1048576
Aug 10 10:32:22 minikube kernel: Buffer I/O error on dev nbd0, logical block 131072, lost sync page write
Aug 10 10:32:22 minikube kernel: JBD2: Error -5 detected when updating journal superblock for nbd0-8.
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 0
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 0
Aug 10 10:32:22 minikube kernel: Buffer I/O error on dev nbd0, logical block 0, lost sync page write
Aug 10 10:32:22 minikube kernel: EXT4-fs (nbd0): I/O error while writing superblock
Aug 10 10:32:22 minikube kernel: EXT4-fs error (device nbd0) in ext4_create:2530: Journal has aborted
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 0
Aug 10 10:32:22 minikube kernel: print_req_error: I/O error, dev nbd0, sector 0
Aug 10 10:32:22 minikube kernel: Buffer I/O error on dev nbd0, logical block 0, lost sync page write
Aug 10 10:32:22 minikube kernel: EXT4-fs (nbd0): I/O error while writing superblock

But this same code seems to be responsible for resubmitting requests to the new rbd-nbd instance after a restart, so setting io-timeout to the same value as reattach-timeout (which is really a dead socket timeout) is probably not a good idea...

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

pkalever commented 3 years ago

To address this, we have already set io-timeout=0 by default, but as @idryomov rightly mentioned at https://github.com/ceph/ceph-csi/pull/2394#issuecomment-903636068

Kernel 4.19.182 doesn't handle io-timeout=0 correctly because it lacks torvalds/linux@2da22da. In general, the nbd driver in older kernels is quite broken so you may need to constrain nbd tests to e.g. CentOS 8.4.

I'm hoping that for kernels without torvalds/linux@2da22da we can use the quiesce-on-signal mechanism: https://github.com/ceph/ceph/pull/42609
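For reference, a rough sketch of what that default amounts to on kernels that handle it correctly (illustrative only; the non-timeout flags mirror the map/attach commands logged earlier):

# sketch: io-timeout=0 disables the kernel-side NBD request timeout
rbd-nbd map replicapool/<image> \
  --try-netlink \
  --reattach-timeout=300 \
  --io-timeout=0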

cc: @nixpanic @Madhu-1

pkalever commented 2 years ago

To address this, we have already set io-timeout=0 by default, but as @idryomov rightly mentioned at #2394 (comment)

Kernel 4.19.182 doesn't handle io-timeout=0 correctly because it lacks torvalds/linux@2da22da. In general, the nbd driver in older kernels is quite broken so you may need to constrain nbd tests to e.g. CentOS 8.4.

I'm hoping that for kernels without torvalds/linux@2da22da we can use the quiesce-on-signal mechanism: ceph/ceph#42609

cc: @nixpanic @Madhu-1

@nixpanic @Madhu-1 @humblec can we please take a call on this?

quentinleclerc commented 1 year ago

Hello,

I'm not sure I'm hitting the same case this issue is talking about, but I have some volumes mounted via nbd that end up read-only on some pods, even though they are mounted as ReadWriteOnce and without the read-only option. I'm running Ceph Octopus and csi-driver 3.7.2.

Does anyone know if I'm in this case, and if any workaround exists?

Thanks

Madhu-1 commented 1 year ago

@quentinleclerc please check whether you have clients blocklisted on the Ceph cluster, and also check dmesg on the node where you have this problem. The workaround for now is restarting the node.
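For reference, a couple of checks that can help here (a sketch; the blocklist command needs admin access, e.g. from the toolbox pod, and is spelled "blacklist" on pre-Pacific releases):

# list clients currently blocklisted on the cluster
ceph osd blocklist ls

# on the affected node, look for rbd/libceph errors and read-only remounts
dmesg | grep -iE 'libceph|rbd|read-only'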

quentinleclerc commented 1 year ago

Thanks @Madhu-1, indeed my node was spamming "libceph: auth method 'x' error -13" and other I/O errors. A restart did solve the problem (for now).

idryomov commented 1 year ago

spamming with "libceph: auth method 'x' error -13" and others I/O errors

This means that you were (perhaps at some point in the past?) using the krbd mounter, not the rbd-nbd mounter.

quentinleclerc commented 1 year ago

Yes, indeed: at first I had the storage class running with default options, but some of my pods had an incompatibility with krbd, so I activated the tryOtherMounters option to fall back to nbd.

parameters: 
  tryOtherMounters: "true"
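For context, that parameter sits alongside the usual RBD StorageClass settings; a minimal sketch (the cluster ID, pool, and other values here are placeholders, not taken from the report above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>   # placeholder
  pool: replicapool
  # let the plugin fall back to another mounter (e.g. rbd-nbd) when the default one cannot be used
  tryOtherMounters: "true"
reclaimPolicy: Delete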