IBM / power-openstack-k8s-volume-driver

Apache License 2.0

Could not create file system on attached volume. #14

Open eosplane opened 4 years ago

eosplane commented 4 years ago

I am using this driver with OCP 4 on Power. The worker node fails to create a file system on the attached volume.

Here is what I did:

  1. Ran 'oc apply -f csi_examples/dynamic-pvc.yaml'. PowerVC created a 1 GB volume successfully.

  2. Ran 'oc apply -f csi_examples/dynamic-pod.yaml'. The pod was scheduled on worker-1 and the volume was attached to worker-1 successfully, but creating a file system on the attached volume failed.

'oc describe pod example-pod' showed:

Events:
Type Reason Age From Message

Normal Scheduled 78m default-scheduler Successfully assigned powervccsi/example-pod to worker-1

Normal SuccessfulAttachVolume 77m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-55ab1ee2-e6c0-46f2-8777-240e30993347"

Warning FailedMount 42m (x8 over 73m) kubelet, worker-1 Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[default-token-zsp98 mypvc]: timed out waiting for the condition

Warning FailedMount 5m55s (x23 over 76m) kubelet, worker-1 Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-zsp98]: timed out waiting for the condition

Warning FailedMount 16s (x26 over 75m) kubelet, worker-1 MountVolume.MountDevice failed for volume "pvc-55ab1ee2-e6c0-46f2-8777-240e30993347" : rpc error: code = InvalidArgument desc = Could not create file system on attached volume directory /dev/sdd. Error is exit status 127

'oc logs -f ibm-powervc-csi-plugin-lnhjn ibm-powervc-csi' shows:

2020/09/04 05:54:02 Running command /usr/bin/sudo [/usr/sbin/udevadm settle]
2020/09/04 05:54:02 Command output /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
2020/09/04 05:54:02 Error running [/usr/sbin/udevadm settle]
I0904 05:54:02.281662 1 nodeserver.go:184] 1 : There was error while at udevd. Error is exit status 127
2020/09/04 05:54:06 Device path is /dev/sdd
I0904 05:54:06.282072 1 nodeserver.go:193] 1 : Found directory of attached volume /dev/sdd
2020/09/04 05:54:06 Running command /usr/bin/sudo [/bin/lsblk /dev/sdd --noheadings -o FSTYPE -f]
2020/09/04 05:54:06 Command output /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
2020/09/04 05:54:06 Error running [/bin/lsblk /dev/sdd --noheadings -o FSTYPE -f]
2020/09/04 05:54:06 /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
2020/09/04 05:54:06 Running command /usr/bin/sudo [/usr/sbin/mkfs.ext4 /dev/sdd -F]
2020/09/04 05:54:06 Command output /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
E0904 05:54:06.285924 1 utils.go:48] GRPC error: rpc error: code = InvalidArgument desc = Could not create file system on attached volume directory /dev/sdd. Error is exit status 127

It looks like the root cause is "undefined symbol: sudo_term_eof", raised while running /usr/bin/sudo /usr/sbin/mkfs.ext4 /dev/sdd -F.

But when I ran the above command manually on worker-1, it succeeded.

[core@worker-1 ~]$ /usr/bin/sudo /usr/sbin/mkfs.ext4 /dev/sdd -F
mke2fs 1.45.4 (23-Sep-2019)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: c8580528-0539-4643-81e8-e3a793a54d51
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
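The plugin log shows the same "symbol lookup error" for udevadm, lsblk and mkfs.ext4 alike. That suggests sudo inside the plugin container fails to load before it runs any command, so mkfs.ext4 itself is probably not at fault. As a minimal sketch, assuming the log format shown above, the hypothetical shell helper `sudo_failures` lists every command whose sudo wrapper died with that error:

```shell
# sudo_failures: read plugin log lines on stdin and print each command whose
# sudo wrapper failed with "symbol lookup error" (one command per line).
# Assumes the log format seen in this issue:
#   <date> <time> Running command /usr/bin/sudo [<command>]
#   <date> <time> Command output <output of that command>
sudo_failures() {
  awk '
    /Running command / {
      start = index($0, "[")
      if (start == 0) { cmd = ""; next }
      # Keep the text between the first "[" and the "]" that ends the line.
      cmd = substr($0, start + 1, length($0) - start - 1)
      next
    }
    /Command output .*symbol lookup error/ && cmd != "" {
      print cmd
      cmd = ""
    }
  '
}
```

Usage, with the pod and container names from this issue: `oc logs ibm-powervc-csi-plugin-lnhjn ibm-powervc-csi | sudo_failures`. Against the log above it would print the udevadm, lsblk and mkfs.ext4 commands, i.e. every sudo call failed, not only the mkfs step.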

patpot44 commented 3 years ago

This issue occurs consistently in my configuration with OCP 4.5 and PowerVC 1.4.4.2. In the Powervc-csi-plugin driver pod, 'oc logs' shows that the command /usr/sbin/mkfs.ext4 /dev/sdd -F fails. When I run it manually on the affected worker node, it succeeds. Can anything be done about this? Is there a workaround?

jwcroppe commented 3 years ago

@gautpras Any thoughts on this? ^^^

patpot44 commented 3 years ago

Additional information: with filesystem type xfs we get the same kind of error. 'oc logs ibm-powervc-csi-plugin-pfqxt ibm-powervc-csi' reports:

2020/11/20 17:54:50 Running command /usr/bin/sudo [/usr/sbin/mkfs.xfs /dev/dm-2]
2020/11/20 17:54:50 Command output  mkfs.xfs: /dev/dm-2 appears to contain an existing filesystem (xfs).
mkfs.xfs: Use the -f option to force overwrite.
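Here mkfs.xfs refuses to run because /dev/dm-2 already holds an xfs filesystem, possibly left by an earlier attempt. The plugin log in the first report shows the driver querying the device's FSTYPE with lsblk before calling mkfs, so a "format only if blank" check is the natural guard. The hypothetical `mkfs_plan` helper below sketches that decision in shell; it is not the driver's own code (the driver is written in Go, per the nodeserver.go and utils.go log entries), and it deliberately refuses a type mismatch instead of forcing an overwrite with -f or -F, which would destroy data.

```shell
# mkfs_plan EXISTING WANTED DEVICE   (hypothetical helper, for illustration)
#   EXISTING  current FSTYPE of DEVICE, empty if it has no filesystem,
#             e.g. the output of: lsblk -d -n -o FSTYPE DEVICE
#   WANTED    filesystem type requested for the volume (ext4, xfs, ...)
# Prints the mkfs command to run if DEVICE is blank, prints nothing if it
# already holds WANTED, and returns 1 for any other filesystem type.
mkfs_plan() {
  existing=$(printf '%s' "$1" | tr -d '[:space:]')   # strip any whitespace
  wanted=$2
  dev=$3
  if [ -z "$existing" ]; then
    echo "mkfs.$wanted $dev"
  elif [ "$existing" = "$wanted" ]; then
    :   # already formatted with the requested type: just mount it
  else
    echo "refusing to format $dev: it already contains $existing, not $wanted" >&2
    return 1
  fi
}
```

For example, `mkfs_plan "$(lsblk -d -n -o FSTYPE /dev/dm-2)" xfs /dev/dm-2` prints nothing when the volume is already xfs, so a caller would skip straight to mounting.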

We noticed that in this situation the following workaround can be used:

  1. On the PV bound to the affected PVC, set persistentVolumeReclaimPolicy to Retain.

  2. Delete the pod that requests the PVC.

  3. Set persistentVolumeReclaimPolicy back to Delete.

  4. In some cases, the pod then continues to run automatically with a new PV.
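The workaround steps can be scripted with standard oc commands. This is a minimal sketch with hypothetical helper names, assuming the namespace, PVC name and pod name are passed in. Because persistentVolumeReclaimPolicy is a field of the PV, not the PVC, the script first looks up the PV bound to the PVC; the patch body is the one the Kubernetes documentation uses to change a PV's reclaim policy. Step 4 is an observation rather than an action, so it is not scripted, and this sketch does not explain why the workaround helps.

```shell
# reclaim_patch POLICY: patch body that sets a PV's reclaim policy.
reclaim_patch() {
  printf '{"spec":{"persistentVolumeReclaimPolicy":"%s"}}' "$1"
}

# pv_for_pvc NAMESPACE PVC: print the name of the PV bound to the PVC.
pv_for_pvc() {
  oc get pvc "$2" -n "$1" -o jsonpath='{.spec.volumeName}'
}

# workaround NAMESPACE PVC POD: steps 1-3 of the reported workaround.
# Stops at the first failing step so the pod is never deleted while the
# PV still has the Delete policy.
workaround() {
  ns=$1
  pvc=$2
  pod=$3
  pv=$(pv_for_pvc "$ns" "$pvc") || return 1
  [ -n "$pv" ] || return 1                                        # PVC not bound
  oc patch pv "$pv" -p "$(reclaim_patch Retain)" || return 1      # step 1
  oc delete pod "$pod" -n "$ns" || return 1                       # step 2
  oc patch pv "$pv" -p "$(reclaim_patch Delete)"                  # step 3
}
```

Example with the namespace and pod from the first report: `workaround powervccsi <pvc-name> example-pod`.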

sreenme1 commented 3 years ago

@patpot44

1) Can you please check whether your template file is the latest version?
   Ref: https://github.com/IBM/power-openstack-k8s-volume-driver/blob/master/template/ibm-powervc-csi-driver-template.yaml
   We saw a similar issue a few months back and made some changes to the template file.

2) Also, can you check whether "disable-rmc-check" is enabled in the environment?
   Steps: https://www.ibm.com/support/knowledgecenter/en/SSXK2N_1.4.4/com.ibm.powervc.standard.help.doc/powervc_csi_storage_install.html

   cat /etc/nova/nova.conf | grep force_disable

sreenme1 commented 3 years ago

Also, can you try increasing the client and server timeouts in /etc/haproxy/haproxy.cfg on the bastion node? When we increased them from 60 seconds to 4h in one of our environments, the problem went away.
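A minimal sketch of that change, assuming the timeouts are set with ordinary `timeout client` and `timeout server` lines in haproxy.cfg. The helper name is hypothetical, and 4h is simply the value reported to help above. Other timeout lines (connect, client-fin, server-fin, ...) are left untouched, and the original file is kept as a .bak copy.

```shell
# set_haproxy_timeouts FILE VALUE: rewrite every "timeout client" and
# "timeout server" line in FILE to use VALUE (e.g. 4h), keeping FILE.bak.
set_haproxy_timeouts() {
  file=$1
  value=$2
  cp "$file" "$file.bak" || return 1
  # Match only the exact keywords "client"/"server" followed by whitespace,
  # so "timeout client-fin" and "timeout server-fin" are not changed.
  sed -E "s/^([[:space:]]*timeout[[:space:]]+(client|server))[[:space:]]+[^[:space:]#]+/\1 $value/" \
    "$file.bak" > "$file"
}
```

After running `set_haproxy_timeouts /etc/haproxy/haproxy.cfg 4h` as root, validate the file with `haproxy -c -f /etc/haproxy/haproxy.cfg` and then restart the service with `systemctl restart haproxy`.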

patpot44 commented 3 years ago

Regarding the second point: we now have RSCT available in RHCOS.

patpot44 commented 3 years ago

Thanks. Regarding the second point: we now have RSCT available in RHCOS. I'll give it a try.