digitalocean / csi-digitalocean

A Container Storage Interface (CSI) Driver for DigitalOcean Block Storage
Apache License 2.0

Files not persisted/written to block storage volume #41

Closed rubenmch closed 6 years ago

rubenmch commented 6 years ago

I have a Kubernetes cluster set up using Rancher 2.0 (https://rancher.com/).

I followed the install instructions and created the Persistent Volume Claim and pod. I can see in DigitalOcean the volume is created and I can access the mounted directory in the busybox container shell and write files to it.

However, when the pod is removed and a new pod is created using the same persistent volume claim, the volume is empty and all the previous files are lost.

Also, if I delete the pod and, instead of creating a new pod, attach the volume to a new droplet directly in DigitalOcean, I see that the volume is empty. The files created in the /data directory of the busybox pod are not on the block storage volume.

I tested the same setup in rancher using a different storage driver (rancher/longhorn) and the files were correctly persisted across different pods.

Is there a setting I'm missing to make the files get written to the volume? I followed the instructions step by step, copying the YAML as-is.

fatih commented 6 years ago

Hi @rubenmch, thanks a lot for the feedback. Would you mind sharing your example PVC and the commands you used? I'll try to reproduce it, but the best way for me is to get hands on the actual data. Thanks!

rubenmch commented 6 years ago

Hi @fatih, I used the examples from the readme:

  1. Secret
apiVersion: v1
kind: Secret
metadata:
  name: digitalocean
  namespace: kube-system
stringData:
  access-token: "a05dd2f26b9b9ac2asdas__REPLACE_ME____123cb5d1ec17513e06da"
  2. Install csi-digitalocean (tried with versions 0.1.0 and 0.1.1)

kubectl apply -f https://raw.githubusercontent.com/digitalocean/csi-digitalocean/master/deploy/kubernetes/releases/csi-digitalocean-v0.1.1.yaml

  3. Create the PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: do-block-storage
  4. Create the pod
kind: Pod
apiVersion: v1
metadata:
  name: my-csi-app
spec:
  containers:
    - name: my-frontend
      image: busybox
      volumeMounts:
      - mountPath: "/data"
        name: my-do-volume
      command: [ "sleep", "1000000" ]
  volumes:
    - name: my-do-volume
      persistentVolumeClaim:
        claimName: csi-pvc 
  5. Connect to the pod shell (I used the Rancher UI with the Execute Shell option on the pod) and run the following commands:
cd /data
touch hello-world
exit
  6. After that I tried different things, starting the process all over with each try (in a new cluster created from scratch each time).

All of this was tested on a new Rancher 2.0.4 installation with a DigitalOcean cluster of 3 etcd/control/worker nodes (4 GB RAM, 2 vCPU) in NYC1.

Are there any logs I can send you that would help?

rubenmch commented 6 years ago

These are the full steps I took, including installing Rancher:

  1. Created a digitalocean droplet for rancher
  2. Added a rancher.mydomain.com DNS entry with the droplet IP
  3. Connected to the droplet using SSH, installed docker and then rancher with the following command:
    docker run -d --restart=unless-stopped --name=rancher -p 80:80 -p 443:443 -v /var/lib/rancher:/var/lib/rancher rancher/rancher:latest --acme-domain rancher.mydomain.com
  4. Used the Rancher UI (https://rancher.mydomain.com) to create a new DigitalOcean cluster using my DigitalOcean API key.
  5. Once the cluster was running, I used the examples from the README with the Import YAML option available in the Rancher UI, inside the Default project of the cluster.
fatih commented 6 years ago

Thanks for the information @rubenmch. I tried it myself and I couldn't reproduce it unfortunately. These are the things I've done:

  1. Create the secret (just like you did above)
  2. Create the PVC (just like you did above)
  3. Create the POD (just like you did above)
  4. Execed into the pod and wrote something to /data/hello-world

After this step, I checked the logs and everything was fine. Then I deleted the pod via:

$ kubectl delete pods/my-csi-app

and then checked the logs again; you should see something like this:

from the node plugin (via kubectl logs -l app=csi-doplugin -c csi-doplugin -n kube-system):

Jul 25 15:15:12 |INFO| node unpublish volume called method=node_unpublish_volume volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f target_path=/var/lib/kubelet/pods/3284f9f8-9018-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount
Jul 25 15:15:12 |INFO| unmounting the target path
Jul 25 15:15:12 |INFO| unmounting volume is finished
Jul 25 15:15:12 |INFO| node get capabilities called method=node_get_capabilities node_capabilities=rpc:<type:STAGE_UNSTAGE_VOLUME >
Jul 25 15:15:12 |INFO| node unstage volume called method=node_unstage_volume volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/globalmount
Jul 25 15:15:12 |INFO| unmounting the staging target path
Jul 25 15:15:12 |INFO| unmounting stage volume is finished

You'll see that after a delete it successfully unmounts the volume first. After that, check the logs of the attacher (via kubectl logs -l app=csi-attacher-doplugin -c digitalocean-csi-plugin -n kube-system):

Jul 25 15:15:18 |INFO| controller unpublish volume called method=controller_unpublish_volume
Jul 25 15:15:21 |INFO| waiting until volume is detached
Jul 25 15:15:22 |INFO| action received node_id=637435 action_id=3091929 action_status=completed
Jul 25 15:15:22 |INFO| action completed
Jul 25 15:15:22 |INFO| volume is detached node_id=637436 droplet_id=637436 method=controller_unpublish_volume

You'll see that after the unmounting, the attacher now detaches the volume successfully.

Once I was sure it was unmounted and detached, I also checked the actual volume. It was not deleted and I could see it from the DigitalOcean UI. After that I created the exact same pod again; now you'll see the logs in the reverse order. Here is the attacher:

Jul 25 15:16:43 |INFO| controller publish volume called method=controller_publish_volume
Jul 25 15:16:46 |INFO| waiting until volume is attached
Jul 25 15:16:48 |INFO| action received node_id=637435 action_id=3091930 action_status=completed
Jul 25 15:16:48 |INFO| action completed
Jul 25 15:16:48 |INFO| volume is attached node_id=637436 droplet_id=637436 method=controller_publish_volume

You'll see it successfully attaches the volume. And then when we check the node logs:

Jul 25 15:16:51 |INFO| node get capabilities called method=node_get_capabilities node_capabilities=rpc:<type:STAGE_UNSTAGE_VOLUME >
Jul 25 15:16:51 |INFO| node stage volume called
Jul 25 15:16:51 |INFO| source device is already formatted fsType=ext4 mount_options=[] method=node_stage_volume volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f volume_name=pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/globalmount
Jul 25 15:16:51 |INFO| mounting the volume for staging
Jul 25 15:16:51 |INFO| formatting and mounting stage volume is finished
Jul 25 15:16:51 |INFO| node get capabilities called method=node_get_capabilities node_capabilities=rpc:<type:STAGE_UNSTAGE_VOLUME >
Jul 25 15:16:51 |INFO| node publish volume called
Jul 25 15:16:51 |INFO| mounting the volume fsType=ext4 mount_options=[bind] method=node_publish_volume volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/globalmount target=/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount
Jul 25 15:16:51 |INFO| bind mounting the volume is finished

You'll see that, because it was already formatted, it doesn't format it again; instead it goes through the usual mounting procedure.

I execed again into the pod and I could see that the data I wrote is still there:

$  kubectl exec -ti my-csi-app /bin/sh
/ # cat /data/hello-world
foo
bar

Can you please just delete the pod, wait a little bit, and create the pod again? Please don't use Rancher for now; let's try to limit the scope of the issue. As a reminder, here are the log commands that could be useful:

  kubectl logs -l app=csi-doplugin -c csi-doplugin -n kube-system
  kubectl logs -l app=csi-attacher-doplugin -c digitalocean-csi-plugin -n kube-system
  kubectl logs -l app=csi-provisioner-doplugin -c digitalocean-csi-plugin -n kube-system

Can you please do the exact same thing, in this order (see the command sketch after the list):

1. Create the pod, write something to the /data folder
2. Delete the pod
3. Run the log commands above to see what's going on
4. Create the pod again, run the log commands again to see what's going on, then exec into the pod and output the contents of /data
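
In other words, roughly this sequence (a rough sketch, assuming the PVC and pod manifests above are saved as pvc.yaml and pod.yaml):

# 1. Create the pod, wait until it is Running, then write a file to the volume
kubectl create -f pod.yaml
kubectl exec -ti my-csi-app -- /bin/sh -c 'echo hello > /data/hello-world'

# 2. Delete the pod and check the node plugin and attacher logs
kubectl delete pods/my-csi-app
kubectl logs -l app=csi-doplugin -c csi-doplugin -n kube-system
kubectl logs -l app=csi-attacher-doplugin -c digitalocean-csi-plugin -n kube-system

# 3. Create the pod again and check whether the file survived
kubectl create -f pod.yaml
kubectl exec -ti my-csi-app -- cat /data/hello-world
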
rubenmch commented 6 years ago

@fatih Thanks for the detailed instructions.

I did all the steps one by one, using kubectl instead of the Rancher UI. The logs look similar to yours, but the directory is again empty after creating the new pod. Rancher was only used to create the cluster.

In the output of ls -la I see that the timestamps of the . and .. entries in the /data directory differ between the first pod (17:55) and the second pod (18:02); is that expected?

According to the logs the digitalocean block storage volume id or name didn't change between the first and second pod.

Here are the commands I ran on a new Kubernetes cluster, using the YAML from the previous post.

PS C:\Users\rmontelongo\.kube\yaml> kubectl create -f secret.yaml
secret "digitalocean" created
PS C:\Users\rmontelongo\.kube\yaml> kubectl apply -f https://raw.githubusercontent.com/digitalocean/csi-digitalocean/master/deploy/kubernetes/releases/csi-digitalocean-v0.1.1.yaml
storageclass "do-block-storage" created
serviceaccount "csi-attacher" created
clusterrole "external-attacher-runner" created
clusterrolebinding "csi-attacher-role" created
service "csi-attacher-doplugin" created
statefulset "csi-attacher-doplugin" created
serviceaccount "csi-provisioner" created
clusterrole "external-provisioner-runner" created
clusterrolebinding "csi-provisioner-role" created
service "csi-provisioner-doplugin" created
statefulset "csi-provisioner-doplugin" created
serviceaccount "csi-doplugin" created
clusterrole "csi-doplugin" created
clusterrolebinding "csi-doplugin" created
daemonset "csi-doplugin" created
PS C:\Users\rmontelongo\.kube\yaml> kubectl create -f pvc.yaml
persistentvolumeclaim "csi-pvc" created
PS C:\Users\rmontelongo\.kube\yaml> kubectl create -f pod.yaml
pod "my-csi-app" created
PS C:\Users\rmontelongo\.kube\yaml> kubectl exec -it my-csi-app -- /bin/ash
/ # cd /data
/data # echo 'hello' > helloworld
/data # ls
helloworld
/data # ls -la
total 12
drwxr-x---    2 root     root          4096 Jul 25 17:55 .
drwxr-xr-x   21 root     root          4096 Jul 25 17:54 ..
-rw-r--r--    1 root     root             6 Jul 25 17:55 helloworld
/data # exit

Plugin logs after deleting the pod:

time="2018-07-25T17:51:12Z" level=info msg="removing socket" node_id=103332682 region=nyc1 socket=/csi/csi.sock
time="2018-07-25T17:51:12Z" level=info msg="server started" addr=/csi/csi.sock node_id=103332682 region=nyc1
time="2018-07-25T17:51:12Z" level=info msg="get plugin info called" method=get_plugin_info node_id=103332682 region=nyc1 response="name:\"com.digitalocean.csi.dobs\" vendor_version:\"0.1.1\" "
time="2018-07-25T17:51:12Z" level=info msg="node get id called" method=node_get_id node_id=103332682 region=nyc1
time="2018-07-25T17:51:08Z" level=info msg="removing socket" node_id=103332678 region=nyc1 socket=/csi/csi.sock
time="2018-07-25T17:51:08Z" level=info msg="server started" addr=/csi/csi.sock node_id=103332678 region=nyc1
time="2018-07-25T17:51:09Z" level=info msg="get plugin info called" method=get_plugin_info node_id=103332678 region=nyc1 response="name:\"com.digitalocean.csi.dobs\" vendor_version:\"0.1.1\" "
time="2018-07-25T17:51:09Z" level=info msg="node get id called" method=node_get_id node_id=103332678 region=nyc1
time="2018-07-25T17:52:28Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103332678 region=nyc1
time="2018-07-25T17:52:28Z" level=info msg="node stage volume called" node_id=103332678 region=nyc1
time="2018-07-25T17:52:28Z" level=info msg="formatting the volume for staging" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103332678 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-64c8224c-9033-11e8-9c45-aadb180dee3a staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f volume_name=pvc-64c8224c-9033-11e8-9c45-aadb180dee3a
time="2018-07-25T17:52:29Z" level=info msg="mounting the volume for staging" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103332678 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-64c8224c-9033-11e8-9c45-aadb180dee3a staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f volume_name=pvc-64c8224c-9033-11e8-9c45-aadb180dee3a
time="2018-07-25T17:52:29Z" level=info msg="formatting and mounting stage volume is finished" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103332678 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-64c8224c-9033-11e8-9c45-aadb180dee3a staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f volume_name=pvc-64c8224c-9033-11e8-9c45-aadb180dee3a
time="2018-07-25T17:52:29Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103332678 region=nyc1
time="2018-07-25T17:52:29Z" level=info msg="node publish volume called" node_id=103332678 region=nyc1
time="2018-07-25T17:52:29Z" level=info msg="mounting the volume" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103332678 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount target="/var/lib/kubelet/pods/7a1d1683-9033-11e8-9c45-aadb180dee3a/volumes/kubernetes.io~csi/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/mount" volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:52:29Z" level=info msg="bind mounting the volume is finished" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103332678 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount target="/var/lib/kubelet/pods/7a1d1683-9033-11e8-9c45-aadb180dee3a/volumes/kubernetes.io~csi/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/mount" volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:22Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103332678 region=nyc1
time="2018-07-25T17:56:22Z" level=info msg="node unstage volume called" method=node_unstage_volume node_id=103332678 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:22Z" level=info msg="unmounting the staging target path" method=node_unstage_volume node_id=103332678 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:22Z" level=info msg="unmounting stage volume is finished" method=node_unstage_volume node_id=103332678 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f

Attacher logs after deleting the pod:

time="2018-07-25T17:51:08Z" level=info msg="removing socket" node_id=103332682 region=nyc1 socket=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2018-07-25T17:51:08Z" level=info msg="server started" addr=/var/lib/csi/sockets/pluginproxy/csi.sock node_id=103332682 region=nyc1
time="2018-07-25T17:51:09Z" level=info msg="get plugin info called" method=get_plugin_info node_id=103332682 region=nyc1 response="name:\"com.digitalocean.csi.dobs\" vendor_version:\"0.1.1\" "
time="2018-07-25T17:51:09Z" level=info msg="probe called" method=prove node_id=103332682 region=nyc1
time="2018-07-25T17:51:09Z" level=info msg="get plugin capabitilies called" method=get_plugin_capabilities node_id=103332682 region=nyc1 response="capabilities:<service:<type:CONTROLLER_SERVICE > > "
time="2018-07-25T17:51:09Z" level=info msg="controller get capabilities called" method=controller_get_capabilities node_id=103332682 region=nyc1 response="capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > > capabilities:<rpc:<type:LIST_VOLUMES > > "
time="2018-07-25T17:52:20Z" level=info msg="controller publish volume called" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:52:21Z" level=info msg="waiting until volume is attached" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:52:23Z" level=info msg="action received" action_id=479209041 action_status=completed node_id=103332682 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:52:23Z" level=info msg="action completed" action_id=479209041 node_id=103332682 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:52:23Z" level=info msg="volume is attached" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:52:23Z" level=info msg="controller publish volume called" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:27Z" level=info msg="controller unpublish volume called" droplet_id=103332678 method=controller_unpublish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:27Z" level=info msg="waiting until volume is detached" droplet_id=103332678 method=controller_unpublish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:29Z" level=info msg="action received" action_id=479210646 action_status=completed node_id=103332682 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:29Z" level=info msg="action completed" action_id=479210646 node_id=103332682 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:29Z" level=info msg="volume is detached" droplet_id=103332678 method=controller_unpublish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f

After that I created the pod again with:

PS C:\Users\rmontelongo\.kube\yaml> kubectl create -f pod.yaml
pod "my-csi-app" created

Attacher logs after creating the pod again:

time="2018-07-25T18:02:50Z" level=info msg="controller publish volume called" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T18:02:51Z" level=info msg="waiting until volume is attached" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T18:02:52Z" level=info msg="action received" action_id=479214007 action_status=completed node_id=103332682 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T18:02:52Z" level=info msg="action completed" action_id=479214007 node_id=103332682 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T18:02:52Z" level=info msg="volume is attached" droplet_id=103332678 method=controller_publish_volume node_id=103332678 region=nyc1 volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f

Plugin logs after creating the pod again:

time="2018-07-25T18:02:58Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103332678 region=nyc1
time="2018-07-25T18:02:58Z" level=info msg="node stage volume called" node_id=103332678 region=nyc1
time="2018-07-25T18:02:58Z" level=info msg="source device is already formatted" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103332678 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-64c8224c-9033-11e8-9c45-aadb180dee3a staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f volume_name=pvc-64c8224c-9033-11e8-9c45-aadb180dee3a
time="2018-07-25T18:02:58Z" level=info msg="mounting the volume for staging" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103332678 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-64c8224c-9033-11e8-9c45-aadb180dee3a staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f volume_name=pvc-64c8224c-9033-11e8-9c45-aadb180dee3a
time="2018-07-25T18:02:58Z" level=info msg="formatting and mounting stage volume is finished" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103332678 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-64c8224c-9033-11e8-9c45-aadb180dee3a staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f volume_name=pvc-64c8224c-9033-11e8-9c45-aadb180dee3a
time="2018-07-25T18:02:58Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103332678 region=nyc1
time="2018-07-25T18:02:58Z" level=info msg="node publish volume called" node_id=103332678 region=nyc1
time="2018-07-25T18:02:58Z" level=info msg="mounting the volume" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103332678 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount target="/var/lib/kubelet/pods/f1f8c88f-9034-11e8-9c45-aadb180dee3a/volumes/kubernetes.io~csi/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/mount" volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T18:02:58Z" level=info msg="bind mounting the volume is finished" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103332678 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount target="/var/lib/kubelet/pods/f1f8c88f-9034-11e8-9c45-aadb180dee3a/volumes/kubernetes.io~csi/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/mount" volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f

/data directory contents in the new pod:

PS C:\Users\rmontelongo\.kube\yaml> kubectl exec -it my-csi-app -- /bin/ash
/ # cd /data
/data # ls
/data # ls -la
total 8
drwxr-x---    2 root     root          4096 Jul 25 18:02 .
drwxr-xr-x   21 root     root          4096 Jul 25 18:04 ..
/data #

Is there anything else I can check?

Once again thanks for your help.

fatih commented 6 years ago

@rubenmch is it possible to get the logs from the provisioner as well? Here is the command for it:

  kubectl logs -l app=csi-provisioner-doplugin -c digitalocean-csi-plugin -n kube-system

The provisioner is responsible for creating and deleting volumes, whereas the attacher is responsible for attaching and detaching. One of my coworkers indicated this could be a shadowing problem, but I haven't found a root cause yet. Still investigating.
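
As a quick sanity check that all three components are actually running, you can list their pods (same labels as the log commands above; adjust if your deployment differs):

kubectl get pods -n kube-system -l app=csi-provisioner-doplugin
kubectl get pods -n kube-system -l app=csi-attacher-doplugin
kubectl get pods -n kube-system -l app=csi-doplugin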

fatih commented 6 years ago

Plugin logs after deleting the pod:

Something is wrong here: from your logs, I see that the node plugin doesn't unmount the target path. I only see these:

time="2018-07-25T17:56:22Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103332678 region=nyc1
time="2018-07-25T17:56:22Z" level=info msg="node unstage volume called" method=node_unstage_volume node_id=103332678 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:22Z" level=info msg="unmounting the staging target path" method=node_unstage_volume node_id=103332678 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f
time="2018-07-25T17:56:22Z" level=info msg="unmounting stage volume is finished" method=node_unstage_volume node_id=103332678 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-64c8224c-9033-11e8-9c45-aadb180dee3a/globalmount volume_id=6c5f440f-9033-11e8-a4f6-0242ac11380f

Note that all of these start at 17:56:22. Any logs before these were created during the creation of the pod. Here are the logs from my cluster when I delete a pod:

time="2018-07-26T09:44:05Z" level=info msg="node unpublish volume called" method=node_unpublish_volume node_id=637436 region=s2r1 target_path="/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount" volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="unmounting the target path" method=node_unpublish_volume node_id=637436 region=s2r1 target_path="/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount" volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="unmounting volume is finished" method=node_unpublish_volume node_id=637436 region=s2r1 target_path="/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount" volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=637436 region=s2r1
time="2018-07-26T09:44:05Z" level=info msg="node unstage volume called" method=node_unstage_volume node_id=637436 region=s2r1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/globalmount volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="unmounting the staging target path" method=node_unstage_volume node_id=637436 region=s2r1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/globalmount volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="unmounting stage volume is finished" method=node_unstage_volume node_id=637436 region=s2r1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/globalmount volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f

If you look carefully, there are additional log lines present that are absent from your logs:

time="2018-07-26T09:44:05Z" level=info msg="node unpublish volume called" method=node_unpublish_volume node_id=637436 region=s2r1 target_path="/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount" volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="unmounting the target path" method=node_unpublish_volume node_id=637436 region=s2r1 target_path="/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount" volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f
time="2018-07-26T09:44:05Z" level=info msg="unmounting volume is finished" method=node_unpublish_volume node_id=637436 region=s2r1 target_path="/var/lib/kubelet/pods/bcff7782-901d-11e8-9f29-8e6ad6a71316/volumes/kubernetes.io~csi/pvc-35d362a0-9017-11e8-9f29-8e6ad6a71316/mount" volume_id=3ec0973e-9017-11e8-8a15-0242ac11212f

This is the part where it unmounts the volume from the target path. All of these are called inside the node_unpublish_volume method.

Could you create & delete a pod again and check the node plugin logs once more? Do you see only 4 new log lines, or 7 new log lines that also include the unpublish_volume method?

I'll continue looking into why this is the case for you. Maybe it's a Kubernetes version issue where it doesn't call unpublish_volume.

fatih commented 6 years ago

@rubenmch could you also share your Kubernetes version? You can get it via kubectl version.

fatih commented 6 years ago

@rubenmch is it possible to share your kubelet logs during this process as well? You can upload them to a gist if they're too long.

rubenmch commented 6 years ago

@fatih Here is the kubernetes version:

Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.2", GitCommit:"5fa2db2bd46ac79e5e00a4e6ed24191080aa463b", GitTreeState:"clean", BuildDate:"2018-01-18T10:09:24Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"windows/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}

I'll get the other logs and add them.

rubenmch commented 6 years ago

@fatih Here are the logs after recreating everything in a new cluster. Only 4 lines are added to the plugin log after deleting the pod.

plugin log

time="2018-07-26T14:14:56Z" level=info msg="removing socket" node_id=103450863 region=nyc1 socket=/csi/csi.sock
time="2018-07-26T14:14:56Z" level=info msg="server started" addr=/csi/csi.sock node_id=103450863 region=nyc1
time="2018-07-26T14:14:56Z" level=info msg="get plugin info called" method=get_plugin_info node_id=103450863 region=nyc1 response="name:\"com.digitalocean.csi.dobs\" vendor_version:\"0.1.1\" "
time="2018-07-26T14:14:56Z" level=info msg="node get id called" method=node_get_id node_id=103450863 region=nyc1
time="2018-07-26T14:15:36Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103450863 region=nyc1
time="2018-07-26T14:15:36Z" level=info msg="node stage volume called" node_id=103450863 region=nyc1
time="2018-07-26T14:15:37Z" level=info msg="formatting the volume for staging" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103450863 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:15:37Z" level=info msg="mounting the volume for staging" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103450863 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:15:37Z" level=info msg="formatting and mounting stage volume is finished" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103450863 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:15:37Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103450863 region=nyc1
time="2018-07-26T14:15:37Z" level=info msg="node publish volume called" node_id=103450863 region=nyc1
time="2018-07-26T14:15:37Z" level=info msg="mounting the volume" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103450863 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount target="/var/lib/kubelet/pods/59460d30-90de-11e8-b4ef-becb0cc56e85/volumes/kubernetes.io~csi/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/mount" volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:15:37Z" level=info msg="bind mounting the volume is finished" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103450863 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount target="/var/lib/kubelet/pods/59460d30-90de-11e8-b4ef-becb0cc56e85/volumes/kubernetes.io~csi/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/mount" volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:54Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103450863 region=nyc1
time="2018-07-26T14:18:54Z" level=info msg="node unstage volume called" method=node_unstage_volume node_id=103450863 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:54Z" level=info msg="unmounting the staging target path" method=node_unstage_volume node_id=103450863 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:54Z" level=info msg="unmounting stage volume is finished" method=node_unstage_volume node_id=103450863 region=nyc1 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:11Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103450863 region=nyc1
time="2018-07-26T14:28:11Z" level=info msg="node stage volume called" node_id=103450863 region=nyc1
time="2018-07-26T14:28:12Z" level=info msg="source device is already formatted" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103450863 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:28:12Z" level=info msg="mounting the volume for staging" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103450863 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:28:12Z" level=info msg="formatting and mounting stage volume is finished" fsType=ext4 method=node_stage_volume mount_options="[]" node_id=103450863 region=nyc1 source=/dev/disk/by-id/scsi-0DO_Volume_pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:28:12Z" level=info msg="node get capabilities called" method=node_get_capabilities node_capabilities="rpc:<type:STAGE_UNSTAGE_VOLUME > " node_id=103450863 region=nyc1
time="2018-07-26T14:28:12Z" level=info msg="node publish volume called" node_id=103450863 region=nyc1
time="2018-07-26T14:28:12Z" level=info msg="mounting the volume" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103450863 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount target="/var/lib/kubelet/pods/1b501ca6-90e0-11e8-b4ef-becb0cc56e85/volumes/kubernetes.io~csi/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/mount" volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:12Z" level=info msg="bind mounting the volume is finished" fsType=ext4 method=node_publish_volume mount_options="[bind]" node_id=103450863 region=nyc1 source=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/globalmount target="/var/lib/kubelet/pods/1b501ca6-90e0-11e8-b4ef-becb0cc56e85/volumes/kubernetes.io~csi/pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85/mount" volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505

attacher log

time="2018-07-26T14:14:55Z" level=info msg="removing socket" node_id=103450863 region=nyc1 socket=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2018-07-26T14:14:55Z" level=info msg="server started" addr=/var/lib/csi/sockets/pluginproxy/csi.sock node_id=103450863 region=nyc1
time="2018-07-26T14:14:55Z" level=info msg="get plugin info called" method=get_plugin_info node_id=103450863 region=nyc1 response="name:\"com.digitalocean.csi.dobs\" vendor_version:\"0.1.1\" "
time="2018-07-26T14:14:55Z" level=info msg="probe called" method=prove node_id=103450863 region=nyc1
time="2018-07-26T14:14:55Z" level=info msg="get plugin capabitilies called" method=get_plugin_capabilities node_id=103450863 region=nyc1 response="capabilities:<service:<type:CONTROLLER_SERVICE > > "
time="2018-07-26T14:14:55Z" level=info msg="controller get capabilities called" method=controller_get_capabilities node_id=103450863 region=nyc1 response="capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > > capabilities:<rpc:<type:LIST_VOLUMES > > "
time="2018-07-26T14:15:29Z" level=info msg="controller publish volume called" droplet_id=103450863 method=controller_publish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:15:29Z" level=info msg="waiting until volume is attached" droplet_id=103450863 method=controller_publish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:15:31Z" level=info msg="action received" action_id=479753939 action_status=completed node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:15:31Z" level=info msg="action completed" action_id=479753939 node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:15:31Z" level=info msg="volume is attached" droplet_id=103450863 method=controller_publish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:56Z" level=info msg="controller unpublish volume called" droplet_id=103450863 method=controller_unpublish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:57Z" level=info msg="waiting until volume is detached" droplet_id=103450863 method=controller_unpublish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:58Z" level=info msg="action received" action_id=479755778 action_status=completed node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:58Z" level=info msg="action completed" action_id=479755778 node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:18:58Z" level=info msg="volume is detached" droplet_id=103450863 method=controller_unpublish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:04Z" level=info msg="controller publish volume called" droplet_id=103450863 method=controller_publish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:04Z" level=info msg="waiting until volume is attached" droplet_id=103450863 method=controller_publish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:06Z" level=info msg="action received" action_id=479760402 action_status=completed node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:06Z" level=info msg="action completed" action_id=479760402 node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505
time="2018-07-26T14:28:06Z" level=info msg="volume is attached" droplet_id=103450863 method=controller_publish_volume node_id=103450863 region=nyc1 volume_id=51dd5d4f-90de-11e8-bf8e-0242ac114505

provisioner log

time="2018-07-26T14:14:55Z" level=info msg="removing socket" node_id=103450863 region=nyc1 socket=/var/lib/csi/sockets/pluginproxy/csi.sock
time="2018-07-26T14:14:55Z" level=info msg="server started" addr=/var/lib/csi/sockets/pluginproxy/csi.sock node_id=103450863 region=nyc1
time="2018-07-26T14:15:16Z" level=info msg="get plugin capabitilies called" method=get_plugin_capabilities node_id=103450863 region=nyc1 response="capabilities:<service:<type:CONTROLLER_SERVICE > > "
time="2018-07-26T14:15:16Z" level=info msg="controller get capabilities called" method=controller_get_capabilities node_id=103450863 region=nyc1 response="capabilities:<rpc:<type:CREATE_DELETE_VOLUME > > capabilities:<rpc:<type:PUBLISH_UNPUBLISH_VOLUME > > capabilities:<rpc:<type:LIST_VOLUMES > > "
time="2018-07-26T14:15:16Z" level=info msg="get plugin info called" method=get_plugin_info node_id=103450863 region=nyc1 response="name:\"com.digitalocean.csi.dobs\" vendor_version:\"0.1.1\" "
time="2018-07-26T14:15:16Z" level=info msg="create volume called" method=create_volume node_id=103450863 region=nyc1 storage_size_giga_bytes=5 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
time="2018-07-26T14:15:16Z" level=info msg="creating volume" method=create_volume node_id=103450863 region=nyc1 storage_size_giga_bytes=5 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 volume_req="&{nyc1 pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85 Created by DigitalOcean CSI driver 5 }"
time="2018-07-26T14:15:16Z" level=info msg="volume created" method=create_volume node_id=103450863 region=nyc1 response="volume:<capacity_bytes:5368709120 id:\"51dd5d4f-90de-11e8-bf8e-0242ac114505\" > " storage_size_giga_bytes=5 volume_name=pvc-4ff20e4f-90de-11e8-b4ef-becb0cc56e85
rubenmch commented 6 years ago

@fatih Here are the kubelet logs. The node was restarted at 14:38 to reset the root password, so the relevant logs are before 14:38. Around 14:14 the first pod was created, around 14:18 the first pod was deleted, and around 14:28 the second pod was created.

https://gist.github.com/rubenmch/8fdabef2ace03ebc6fbed9f77a5edb9b

rubenmch commented 6 years ago

@fatih I created a new Kubernetes cluster using Kontena Pharos and the csi-digitalocean driver worked as expected. The working Pharos-created cluster runs Kubernetes v1.10.5.

In the Rancher FAQ they mention that they provide their own Kubernetes distribution, so maybe it's an issue specific to that distribution.

https://rancher.com/docs/rancher/v2.x/en/faq/

adi961 commented 6 years ago

Hi, I have the same problem. I use Rancher 2.0.6 and Kubernetes 1.10.5. I also tried creating the PVC manually and deploying the test pod. No luck. Is there by any chance a different solution for PVCs on DigitalOcean that I could use temporarily? I really need to get my cluster up and running. Best, Adrian

praveenperera commented 6 years ago

I'm also having this problem with Rancher 2.0.6 and 2.0.4. I also tried creating the workloads entirely with kubectl; same thing. As a side note, I have another cluster on AWS using EBS volumes that works fine, and I noticed that Postgres container upgrades are much faster on AWS compared to DO with this driver. Creating the new container takes a while on DO, and when it is created all data is lost.

I don't know if that gives you a possible hint. Let me know if I can do something to help debug. I've been wanting to move my cluster and spin up new ones on DO instead of AWS; this is the last thing standing in my way.

fatih commented 6 years ago

Thanks @adi961 @praveenperera, I'll try to set up a cluster via Rancher and do my testing there. Unfortunately I was not able to reproduce the issue with our own upcoming Kubernetes at DigitalOcean product. I'm on it; in the meantime, anything suspicious, such as logs (see the commands above), would still be helpful.

fatih commented 6 years ago

@adi961 @rubenmch @praveenperera Thanks all for the comments and feedback. After hours of debugging I was finally able to reproduce the problem, find the underlying cause, and come up with a temporary fix for you folks:

TL;DR: mount propagation is disabled by default in Rancher

Long story:

Mount propagation needs to be enabled for CSI plugins so the plugin can mount and bind-mount the block storage into the pod from the host. Because mount propagation is not enabled, anything you write to the pod's FS is never propagated to the mounted block device, i.e. the DO block storage volume.
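
For context, the propagation setting lives on the node plugin's mount of the kubelet directory. The snippet below is only an illustrative sketch of that part of a CSI node plugin DaemonSet; the names and paths are assumptions, not the exact csi-digitalocean release manifest:

# Illustrative excerpt, not the actual csi-digitalocean manifest
containers:
  - name: csi-doplugin
    volumeMounts:
      - name: kubelet-dir
        mountPath: /var/lib/kubelet
        # Requires the MountPropagation feature gate on this Kubernetes version;
        # without it, mounts made inside this container never reach the pod.
        mountPropagation: Bidirectional
volumes:
  - name: kubelet-dir
    hostPath:
      path: /var/lib/kubelet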

The issue here is that Rancher explicitly disables mount propagation by default, see here: https://github.com/rancher/rke/issues/765#issuecomment-404582919

To enable it again, you need to add the following lines to the config.yaml file (Cluster Options > Edit as YAML). I've tested this on an already provisioned cluster, and Rancher was able to apply the new setting without issues:

services:
  kube-api:
    extra_args:
      feature-gates: MountPropagation=true

  kubelet:
    extra_args:
      feature-gates: MountPropagation=true
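
Once Rancher has finished updating the cluster, a rough way to verify that the kubelet actually picked up the flag is to check its process arguments on a worker node (just a sketch; how kubelet runs depends on the Rancher/RKE version):

# Run on a worker node over SSH; should print feature-gates=MountPropagation=true
ps aux | grep kubelet | grep -o 'feature-gates=[^ ]*'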

I've also asked what the latest recommended Rancher version is that fixes this issue: https://github.com/rancher/rke/issues/765#issuecomment-409628294

So for now, just add the settings above and you're good to go. I'll update our README.md to make sure Rancher users know that this issue exists with the latest Rancher version (v2.0.6).

Feel free to open a new issue if you find something different, but for now there is not much we can do. However, I plan to add a check on our side to see whether the mounts are propagated; at least the CSI plugin would then be able to return an error instead of silently failing.

beshkenadze commented 5 years ago

@fatih for version v2.2.0 we still get the error AttachVolume.Attach failed for volume "pvc-{ID}" : node "{NODE_NAME}" has no NodeID annotation, even when adding MountPropagation=true. Any suggestions?