All PVCs work fine, except one pod.
K8s v1.16.2
describe pod:
From syslog:
Tried restarting the CSI pod, kubelet, and Docker - no luck.
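The restart attempts were roughly the following (the pod name is a placeholder):
# delete the CSI node plugin pod so the DaemonSet recreates it (name is a placeholder)
kubectl -n kube-system delete pod csi-do-node-xxxxx
# restart kubelet and Docker on the node
systemctl restart kubelet
systemctl restart docker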
👋
Which version of CSI are you running?
Also, are you able to reconstruct what kind of operations / actions led to this state? Specifically, did you possibly force-delete a Kubernetes object at any point?
A common cause is that the volume is still in use by another pod running on a different node. DigitalOcean Block Storage supports ReadWriteOnce (RWO) access only, so any pod referencing a PVC that is already attached to a different node will hang indefinitely.
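To check whether that's the case here (resource names below are placeholders), you can inspect the attachment state:
# list CSI volume attachments and the nodes they are bound to
kubectl get volumeattachments
# see which pods reference the PVC in question
kubectl describe pvc my-pvc -n my-namespace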
I attached two volumes to the pod (to move the database to a bigger volume), and after copying the content I updated the deployment to use only the new PVC. The pod then took too long to terminate, so I force-deleted it.
My cluster has only one node.
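For reference, the force-delete was the usual grace-period override, roughly:
# bypass graceful termination; this skips kubelet's cleanup, which may be relevant here
kubectl delete pod db-pod --grace-period=0 --force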
Images: quay.io/k8scsi/csi-node-driver-registrar:v1.2.0 and digitalocean/do-csi-plugin:v1.1.2
I tried attaching the volume manually to the VM via the DO dashboard, and it worked.
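The manual check on the droplet was along these lines (device path is a placeholder):
# confirm the block device shows up after attaching
lsblk
# confirm it can be mounted by hand (device path and mount point are placeholders)
mount /dev/disk/by-id/scsi-0DO_Volume_myvolume /mnt/test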
Having the same issue again; I cannot terminate the pod.
CSI driver logs:
time="2019-12-14T17:55:29Z" level=info msg="target path is already unmounted" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:55:29Z" level=info msg="unmounting volume is finished" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:57:31Z" level=info msg="node unpublish volume called" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:57:31Z" level=info msg="checking if target is mounted" args="[-o TARGET,PROPAGATION,FSTYPE,OPTIONS -M /var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount -J]" cmd=findmnt node_id=170430134 region=sgp1 version=v1.1.2
time="2019-12-14T17:57:31Z" level=info msg="target path is already unmounted" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:57:31Z" level=info msg="unmounting volume is finished" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:59:33Z" level=info msg="node unpublish volume called" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:59:33Z" level=info msg="checking if target is mounted" args="[-o TARGET,PROPAGATION,FSTYPE,OPTIONS -M /var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount -J]" cmd=findmnt node_id=170430134 region=sgp1 version=v1.1.2
time="2019-12-14T17:59:33Z" level=info msg="target path is already unmounted" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
time="2019-12-14T17:59:33Z" level=info msg="unmounting volume is finished" method=node_unpublish_volume node_id=170430134 region=sgp1 target_path="/var/lib/kubelet/pods/ae47eda8-5c18-4572-ba2f-3a1bc8f2b277/volumes/kubernetes.io~csi/pvc-2864e37b-5029-48e1-a86e-ec3b7819104d/mount" version=v1.1.2 volume_id=17e306bd-1912-11ea-bc02-0a58ac14a1f9
describe pod:
Events:
  Type     Reason     Age  From                          Message
  ----     ------     ---  ----                          -------
  Normal   Killing    12m  kubelet, garuda-staging-l03r  Stopping container rabbitmq
  Warning  Unhealthy  12m  kubelet, garuda-staging-l03r  Readiness probe failed:
  Warning  Unhealthy  12m  kubelet, garuda-staging-l03r  Readiness probe failed: cannot exec in a stopped state: unknown
After restarting the CSI pod, it generates the same logs.
I'd guess the volume unmount request happens repeatedly as part of a larger control flow that's failing.
Could you share the logs from the sidecars as well as the driver controller part? (If this is a DOKS cluster, then you can also mail or Slack-DM me your cluster ID.)
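Assuming the standard manifests we ship (pod and container names below may differ per release), something like the following should capture them:
# controller plugin and its sidecars (container names are assumptions)
kubectl -n kube-system logs csi-do-controller-0 -c csi-do-plugin
kubectl -n kube-system logs csi-do-controller-0 -c csi-attacher
kubectl -n kube-system logs csi-do-controller-0 -c csi-provisioner
# node plugin sidecar (pod name is a placeholder)
kubectl -n kube-system logs csi-do-node-xxxxx -c csi-node-driver-registrar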
@Paxa I discovered that our CSI driver fails to return a proper error code in one case, which I could also observe in your issue. A fixing PR has been filed and will ship with the next release of our driver.
Not entirely sure though if that's the sole reason for your issues; let's continue to investigate.
We have a similar error: if we move pods to a different node, we often see "Unable to attach or mount volumes".
@tobinski would you mind sharing the same information as https://github.com/digitalocean/csi-digitalocean/issues/242#issuecomment-565755964? Thanks.
Do you have images built from master? I could try them and see if the problem is solved.
@Paxa unfortunately, what I thought to be the issue turned out to be an ambiguous interpretation of what the CSI specification expects: the presumed fix should not be related to what was observed on your cluster. (For posterity: #246 reverted my change again, as it's not clearly needed.)
If you could respond to the email I sent you a couple of days ago, that'd be great as it'd allow us to further investigate the issue in your case.
@tobinski's issue seems slightly different: it looks like the problem that #221 addressed, which is going to be included in our next CSI driver release.
Issues were resolved externally. Closing.