Deleted node is not able to attach to existing PVC once it is created again

connde commented 4 years ago

What did you do? (required. The issue will be closed when not provided.)

Deleted a random node in my Rancher cluster to see how Percona Xtradb cluster behaved

What did you expect to happen?

Node to be recreated and Percona node attached to existing PVC

Configuration (MUST fill this out):

system logs:

Please provide the following logs:


kubectl cluster-info dump > kubernetes-dump.log

This will output everything from your cluster. Please use a private gist via https://gist.github.com/ to share this dump with us.

Not able to create a gist; the site is generating an error, but I'm happy to send the dump by email if needed.

manifests, such as pvc, deployments, etc.. you used to reproduce: Deployed Percona using OperatorHub.io cr.zip

Please provide the total set of manifests that are needed to reproduce the issue. Just providing the pvc is not helpful. If you cannot provide it due privacy concerns, please try creating a reproducible case.

CSI Version: https://github.com/digitalocean/csi-digitalocean/tree/master/deploy/kubernetes/releases/csi-digitalocean-latest
Kubernetes Version: 1.18.3
Cloud provider/framework version, if applicable (such as Rancher): RancherOS 2.4.5 -> DigitalOcean -> 3 nodes. Not using DOKS.

Normal Scheduled 76s default-scheduler Successfully assigned my-percona-xtradb-cluster-operator/cluster-01-pxc-2 to worker-pool2
Normal SuccessfulAttachVolume 76s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992"
Warning FailedMount 28s (x7 over 60s) kubelet, worker-pool2 MountVolume.MountDevice failed for volume "pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992" : rpc error: code = Internal desc = formatting disk failed: exit status 1 cmd: 'mkfs.ext4 -F /dev/disk/by-id/scsi-0DO_Volume_pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992' output: "mke2fs 1.45.5 (07-Jan-2020)\nThe file /dev/disk/by-id/scsi-0DO_Volume_pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992 does not exist and no size was specified.\n"

Hi, to reproduce: create a Percona operator, then a CR with 3 nodes. After the cluster is running, delete a node manually and wait for it to be recreated. The volume will not bind correctly.

If I manually attach the volume in the DO dashboard and terminate the pod, the new pod gets created correctly.

Any help is appreciated.
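
The FailedMount event above shows mkfs.ext4 failing because the by-id device path does not exist on the node, which suggests the volume was never actually attached to the droplet even though the attach step was reported as successful. A minimal sketch for checking this directly on the affected node follows. The helper names are hypothetical, and the path prefix is copied from the error message and only assumed to hold for other volumes.

```python
import os

# Prefix copied from the mkfs.ext4 error in the event log; assumed (not
# confirmed) to be the same for every DigitalOcean block storage volume.
DEVICE_PREFIX = "/dev/disk/by-id/scsi-0DO_Volume_"


def device_path_for(volume_name: str) -> str:
    """Return the by-id device path the node is expected to expose."""
    return DEVICE_PREFIX + volume_name


def is_attached_locally(volume_name: str) -> bool:
    """Return True if the device node for the volume exists on this machine."""
    return os.path.exists(device_path_for(volume_name))


if __name__ == "__main__":
    # Volume name taken from the FailedMount event.
    name = "pvc-e801f45f-3ac1-4d5e-8ce5-dc2a79191992"
    state = "present" if is_attached_locally(name) else "missing"
    print(f"{device_path_for(name)}: {state}")
```

Run on the node where the pod was scheduled (worker-pool2 in the event log), a "missing" result would match the mkfs.ext4 error.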

timoreimann commented 4 years ago

I just tested this on DOKS by directly deleting the droplet hosting a PVC-using pod (managed by a StatefulSet). After the node removal was detected (by our cloud-controller-manager component), the Node object was removed and the workload transferred to a different node, along with the PVC.

To clarify: did you delete the droplet or just the Node object in the cluster?

connde commented 4 years ago

Hi @timoreimann, I'm NOT using DOKS. I'm using RancherOS and deploying the nodes to droplets.

I deleted the node from the Rancher UI. It was deleted and recreated correctly as expected, but the PVC did not get attached.

timoreimann commented 4 years ago

@connde thanks. Understood you're not on DOKS -- the behavior should be identical though: as soon as the control plane detects that a node is gone, the workload should be moved elsewhere, including volumes.

To troubleshoot this further, we'll need the logs from your Controller and Node services. Could you share those?

connde commented 4 years ago

@timoreimann I have a cluster dump. Is that enough?

kubernetes-dump.zip

timoreimann commented 4 years ago

That's perfect, thank you @connde. I'll need a bit of time to work through it, will report back once I'm done.

connde commented 4 years ago

It's ok, no problem if it takes some time.

I don't expect that the node will restart; I was just testing what would happen if a node failed.

On Fri, 10 Jul 2020 at 13:39, Timo Reimann notifications@github.com wrote:

That's perfect, thank you @connde https://github.com/connde. I'll need a bit of time to work through it, will report back once I'm done.

View it on GitHub: https://github.com/digitalocean/csi-digitalocean/issues/334#issuecomment-656771062

dlebee commented 3 years ago

The issue also occurs when upgrading the cluster. It's really frustrating that Kubernetes takes so long to notice that a volume is no longer attached.

Has anything changed on this issue?

timoreimann commented 3 years ago

@dlebee do you experience the issue when you upgrade using DOKS or a self-hosted Kubernetes?

dlebee commented 3 years ago

DOKS. I have multiple clusters and it always occurs: the PVC stays attached to an old node, and on each Kubernetes upgrade I have to manually unmount the volume and then wait quite some time, which takes the systems down.

timoreimann commented 3 years ago

@dlebee that's definitely not expected. What kind of workload do you use to reference the PVCs? Is it StatefulSets?

Regular Deployments bear the risk of getting to a situation where two replicas are trying to come up, which cannot work when volumes are associated. Just double-checking this isn't the case for you here.

dlebee commented 3 years ago

Yes, they are StatefulSets (such as MongoDB and MariaDB) created by Helm charts.

timoreimann commented 3 years ago

@dlebee got it. Is it a particular set of DOKS/Kubernetes versions where you saw this happening, or across the board? How old was the oldest version?

dlebee commented 3 years ago

Not really. I've had this issue while migrating; via the web UI you can only go up one version at a time, and it happened every single time I upgraded a version.

I can give you the details of the cluster. Maybe upgrading Kubernetes does not update the CSI driver?

davidlebee@Davids-MacBook-Pro ~ % kubectl get CSIDriver -o yaml


apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: CSIDriver
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"storage.k8s.io/v1beta1","kind":"CSIDriver","metadata":{"annotations":{},"name":"dobs.csi.digitalocean.com"},"spec":{"attachRequired":true,"podInfoOnMount":true}}
    creationTimestamp: "2021-02-19T16:26:43Z"
    name: dobs.csi.digitalocean.com
    resourceVersion: "316"
    uid: 9754dccf-b1df-4986-ad6c-a63c228261f8
  spec:
    attachRequired: true
    fsGroupPolicy: ReadWriteOnceWithFSType
    podInfoOnMount: true
    volumeLifecycleModes:
    - Persistent
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

timoreimann commented 3 years ago

@dlebee the CSI components should definitely get upgraded as well. It might be more of an issue of how the upgrade proceeds in your case.

I'll run some extra tests. Appreciate any additional details you may be able to provide (either here, via mail, or on the Kubernetes Slack).

dlebee commented 3 years ago

I have to upgrade a cluster soon; it is currently running 16.16.6-do-2. I’ll let you know how it goes.

If you have any questions, I’ll be following this thread.

dlebee commented 3 years ago

I upgraded a cluster today and did not have the same issue. Is the CSI driver updated automatically on upgrades, or is it a manual operation that needs to be done if the cluster is older?

timoreimann commented 3 years ago

@dlebee all components are always upgraded automatically, including the CSI driver. You don't have to upgrade or install any of the managed components yourself.

What's worth pointing out is that older CSI driver and Kubernetes versions still contained certain bugs that got addressed in more recent versions. Chances are you are now past the point where those affect you.

dlebee commented 3 years ago

I’ll upgrade that cluster specifically and let you know as soon as I can if the issue is still present.

dlebee commented 3 years ago

@timoreimann So the older cluster had pods stuck in Terminating for a long time, and they were not being moved.

I think the reason is that this cluster has fewer resources, so it takes longer. I got impatient and terminated the pods with --force, which probably did not alert the CSI driver that the PVC is no longer bound.

Is there a way to tell k8s to release a PVC faster when a pod is killed by force?

Also, another PVC that I did not force close was not reattached during the upgrade. It only attached after I manually released the volume on the website.

Thank you, David.

timoreimann commented 3 years ago

@dlebee if I had to guess, I wouldn't think that it's a resource problem: bringing pods down should happen fairly quickly. How long was "long time" for you?

If anything, --force should speed up the detachment process: the CSI driver cannot detach volumes if pods using them are still up (including the terminating state). By removing the pod, the CSI-/volume-related controllers should notice that the volume user has gone away and move forward with detaching.
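
One way to see whether detachment has actually gone through is to compare the VolumeAttachment objects against the nodes that still exist. The following is a minimal sketch, not part of the CSI driver: stale_attachments is a hypothetical helper that takes the parsed JSON of kubectl get volumeattachment -o json and kubectl get nodes -o json and reports attachments that still point at a node no longer in the cluster.

```python
import json
import sys


def stale_attachments(volume_attachments: dict, nodes: dict) -> list:
    """Return (attachment, PV, node) tuples for attachments whose node is gone.

    Both arguments are the parsed "kind: List" objects that kubectl prints
    with -o json.
    """
    live_nodes = {n["metadata"]["name"] for n in nodes.get("items", [])}
    stale = []
    for va in volume_attachments.get("items", []):
        spec = va["spec"]
        if spec["nodeName"] not in live_nodes:
            stale.append((
                va["metadata"]["name"],
                spec["source"].get("persistentVolumeName"),
                spec["nodeName"],
            ))
    return stale


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names are placeholders): script.py attachments.json nodes.json
    with open(sys.argv[1]) as va_file, open(sys.argv[2]) as node_file:
        for name, pv, node in stale_attachments(json.load(va_file), json.load(node_file)):
            print(f"{name}: volume {pv} still attached to missing node {node}")
```

An empty result does not prove that everything is healthy; it only rules out attachments that reference nodes which have already been removed.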

What would be ideal to have if this happens again is all the events that occurred (kubectl -n <involved namespace> get events), the current node state (kubectl get nodes -o yaml), the involved PVCs / PVs (kubectl -n <involved namespace> get pvc -o yaml / kubectl get pv -o yaml), and the current volume attachments (kubectl get volumeattachment -o yaml); all at the time the pods are stuck.
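
The commands listed above can be gathered in one pass while the pods are still stuck. This is a rough sketch under the assumption that kubectl is on the PATH and already pointed at the affected cluster; diagnostic_commands and collect are hypothetical helper names, and the namespace is passed on the command line.

```python
import subprocess
import sys


def diagnostic_commands(namespace: str) -> list:
    """Build the kubectl invocations requested in the comment above."""
    return [
        ["kubectl", "-n", namespace, "get", "events"],
        ["kubectl", "get", "nodes", "-o", "yaml"],
        ["kubectl", "-n", namespace, "get", "pvc", "-o", "yaml"],
        ["kubectl", "get", "pv", "-o", "yaml"],
        ["kubectl", "get", "volumeattachment", "-o", "yaml"],
    ]


def collect(namespace: str) -> dict:
    """Run each command and return its output keyed by the command line."""
    results = {}
    for cmd in diagnostic_commands(namespace):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        # Keep stderr for failed commands so the failure itself is recorded.
        results[" ".join(cmd)] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    for command_line, output in collect(sys.argv[1]).items():
        print(f"### {command_line}\n{output}")
```

Redirecting the script's output to a file gives a single snapshot that can be attached to the issue.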
