ceph / ceph-csi

CSI driver for Ceph

Existing volumes are not usable after upgrading ceph-csi from 3.5.1 --> 3.6.1 #3687

Closed: pratik705 closed this issue 1 year ago

pratik705 commented 1 year ago

Describe the bug

Existing volumes are not usable after upgrading ceph-csi from 3.5.1 --> 3.6.1.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. Deploy ceph-csi 3.5.1.
  2. Create a PVC and attach it to a pod.
  3. Log in to the pod and write some data to the PVC.
  4. Upgrade ceph-csi to 3.6.1.
  5. Try accessing the data written in step 3.
  6. The process will hang.
  7. Try creating a new volume and attaching it to another pod.
  8. The operation will succeed and you can access the data (a command-level sketch of these steps follows the list).
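
For reference, a minimal command-level sketch of the steps above, assuming a Helm-based ceph-csi RBD deployment; the chart repo alias ceph-csi, release name ceph-csi-rbd, namespace ceph-csi, and the pvc/pod manifest names are placeholders rather than values from this report:

# 1. Deploy ceph-csi 3.5.1 (Helm is one option; values files omitted)
helm install ceph-csi-rbd ceph-csi/ceph-csi-rbd -n ceph-csi --version 3.5.1

# 2-3. Create a PVC, attach it to a pod, and write some data
kubectl apply -f pvc.yaml -f pod.yaml -n test-velero
kubectl exec -it test-nginx -n test-velero -- sh -c 'echo hello > /usr/share/nginx/html/abc'

# 4. Upgrade ceph-csi to 3.6.1
helm upgrade ceph-csi-rbd ceph-csi/ceph-csi-rbd -n ceph-csi --version 3.6.1

# 5-6. Reading the pre-upgrade data hangs
kubectl exec -it test-nginx -n test-velero -- ls /usr/share/nginx/html

# 7-8. A freshly created PVC/pod on the same node works fine
kubectl apply -f pvc-new.yaml -f pod-new.yaml -n test-velero
kubectl exec -it alpha-nginx -n test-velero -- ls /mnt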

Actual results

Access to data on volumes created before the upgrade hangs; the reading processes end up stuck in "D" state on the worker node.

Expected behavior

Existing volumes should remain usable after upgrading ceph-csi from 3.5.1 to 3.6.1.

Additional context


root@rpck-ir14:~# kubectl exec -it test-nginx -n test-velero -- bash
root@test-nginx:/# df -h
Filesystem                    Size  Used Avail Use% Mounted on
overlay                       435G   85G  350G  20% /
tmpfs                          64M     0   64M   0% /dev
tmpfs                          95G     0   95G   0% /sys/fs/cgroup
shm                            64M     0   64M   0% /dev/shm
/dev/mapper/vglocal00-root00  435G   85G  350G  20% /etc/hosts
/dev/rbd0                     6.8G   28K  6.8G   1% /usr/share/nginx/html
tmpfs                          95G   12K   95G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                          95G     0   95G   0% /proc/acpi
tmpfs                          95G     0   95G   0% /proc/scsi
tmpfs                          95G     0   95G   0% /sys/firmware
root@test-nginx:/# cd /usr/share/nginx/html
root@test-nginx:/usr/share/nginx/html# ls

^^ hung

On the worker node where the pod is running, the ls processes are stuck in "D" (uninterruptible sleep) state:

root     2011442  0.0  0.0   3444   672 ?  D+  Feb21  0:00 ls
root     2044814  0.0  0.0   3444   728 ?  D+  Feb21  0:00 ls
root     2055402  0.0  0.0   3524  2340 ?  D+  Feb21  0:00 ls -ltr /usr/share/nginx/html
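
As a general diagnostic sketch (generic commands, not output from this cluster), D-state processes and kernel-side RBD/libceph errors can be spotted on the node with:

# list processes stuck in uninterruptible sleep (usually blocked on I/O)
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

# check the kernel log for rbd/libceph connection errors
dmesg -T | grep -iE 'libceph|rbd'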

Madhu-1 commented 1 year ago

@pratik705 Error 101 looks like a connection issue between the mount/map and the Ceph cluster. Can you restart the node and see if that fixes the problem?

pratik705 commented 1 year ago

Thanks for the reply, @Madhu-1

I can try restarting the worker node, but I am able to create a new pod/volume on the same node from the same Ceph backend [1]. Also, from the same node I am able to connect to the mons/OSDs. All existing pods running on different nodes are stuck due to this issue. Do you still want me to restart the node?

[1]

root@rpck-ir14:/var/log/ceph# kubectl get pods -n test-velero -o wide
NAME          READY   STATUS    RESTARTS   AGE   IP           NODE           NOMINATED NODE   READINESS GATES
alpha-nginx   1/1     Running   0          77m   10.20.2.17   172.22.0.149   <none>           <none> <=== new pod(ceph-csi 3.6.1)
test-nginx    1/1     Running   0          37h   10.20.2.43   172.22.0.149   <none>           <none>  <=== existing pod(ceph-csi 3.5.1)

root@rpck-ir14:/var/log/ceph# kubectl exec -it alpha-nginx -n test-velero -- bash
root@alpha-nginx:/# df -h
Filesystem                    Size  Used Avail Use% Mounted on
overlay                       435G   85G  350G  20% /
tmpfs                          64M     0   64M   0% /dev
tmpfs                          95G     0   95G   0% /sys/fs/cgroup
/dev/rbd1                      11G   28K   11G   1% /mnt                <<===
/dev/mapper/vglocal00-root00  435G   85G  350G  20% /etc/hosts
shm                            64M     0   64M   0% /dev/shm
tmpfs                          95G   12K   95G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                          95G     0   95G   0% /proc/acpi
tmpfs                          95G     0   95G   0% /proc/scsi
tmpfs                          95G     0   95G   0% /sys/firmware
root@alpha-nginx:/# cd /mnt
root@alpha-nginx:/mnt# ls
abc  lost+found
root@alpha-nginx:/mnt# echo "this is new file with ceph-csi v3.6.1" >new-file.txt
root@alpha-nginx:/mnt# ls
abc  lost+found  new-file.txt
root@alpha-nginx:/mnt# cat new-file.txt
this is new file with ceph-csi v3.6.1

root@rpck-ir16:/var/log/ceph# nc -vz 172.22.0.148 6810
Connection to 172.22.0.148 6810 port [tcp/*] succeeded!
root@rpck-ir16:/var/log/ceph# nc -vz 172.22.0.149 6789
Connection to 172.22.0.149 6789 port [tcp/*] succeeded!

Madhu-1 commented 1 year ago

@pratik705 Yes, please restart the node where the application pod is running, or scale down all the applications, wait until they are fully down, and then scale them back up again.
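
A sketch of these two workarounds, assuming the affected node is 172.22.0.149 (as in the outputs above) and that the applications are managed by Deployments; the Deployment name test-nginx is a placeholder:

# Option 1: drain the node, reboot it, and let the pods reschedule
kubectl cordon 172.22.0.149
kubectl drain 172.22.0.149 --ignore-daemonsets --delete-emptydir-data
# reboot the node out of band, then bring it back
kubectl uncordon 172.22.0.149

# Option 2: scale the affected workloads to zero, wait for the pods to
# terminate, then scale them back up
kubectl scale deployment test-nginx -n test-velero --replicas=0
kubectl get pods -n test-velero -w
kubectl scale deployment test-nginx -n test-velero --replicas=1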

pratik705 commented 1 year ago

@Madhu-1 It helped. I am able to access the pods and the data. Thanks a lot for the workaround :-)

Is it a bug in the upgrade process?

Madhu-1 commented 1 year ago

It's not a bug, it's a connection problem; the clients might not have reconnected to the Ceph cluster. I have not seen this in any upgraded cluster.
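
For anyone debugging the same symptom, the kernel RBD client's connection state can be inspected through the Ceph debugfs entries on the worker node (the paths below show the usual layout and require debugfs to be mounted; they are not output from this cluster):

# each in-kernel Ceph client gets a directory named <fsid>.client<id>
ls /sys/kernel/debug/ceph/

# in-flight (possibly stuck) OSD requests for each client
cat /sys/kernel/debug/ceph/*/osdc

# monitor session state for each client
cat /sys/kernel/debug/ceph/*/monc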

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.