Open: eosplane opened this issue 4 years ago
This issue still occurs in my configuration with OCP 4.5 and PowerVC 1.4.4.2.
In the logs of the powervc-csi-plugin driver pod (via oc logs),
we can see that the command /usr/sbin/mkfs.ext4 /dev/sdd -F
fails.
When I run it manually on the affected worker node, it succeeds.
Can something be done about this? Is there a workaround?
@gautpras Any thoughts on this? ^^^
Additional information:
When using filesystem type xfs,
we get the same kind of error.
oc logs ibm-powervc-csi-plugin-pfqxt ibm-powervc-csi reports:
2020/11/20 17:54:50 Running command /usr/bin/sudo [/usr/sbin/mkfs.xfs /dev/dm-2]
2020/11/20 17:54:50 Command output mkfs.xfs: /dev/dm-2 appears to contain an existing filesystem (xfs).
mkfs.xfs: Use the -f option to force overwrite.
We noticed that in this situation the following workaround is possible:
1. On the affected PVC/PV, set persistentVolumeReclaimPolicy to Retain.
2. Delete the pod that requests the PVC.
3. Set persistentVolumeReclaimPolicy back to Delete.
4. In some cases, the pod then automatically continues to run with a new PV.
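The first three steps of this workaround could be scripted roughly as follows. This is only a sketch: the PV, pod, and namespace names are hypothetical placeholders, the helper names are made up for illustration, and the script only builds and prints the oc invocations rather than running them against a cluster.

```python
import json
import shlex


def reclaim_policy_patch(policy):
    """Build the JSON patch body that sets a PV's reclaim policy."""
    if policy not in ("Retain", "Delete"):
        raise ValueError(f"unexpected reclaim policy: {policy}")
    return json.dumps({"spec": {"persistentVolumeReclaimPolicy": policy}})


def workaround_commands(pv_name, pod_name, namespace):
    """Return the oc invocations for steps 1 to 3, in order, without running them."""
    return [
        ["oc", "patch", "pv", pv_name, "-p", reclaim_policy_patch("Retain")],
        ["oc", "delete", "pod", pod_name, "-n", namespace],
        ["oc", "patch", "pv", pv_name, "-p", reclaim_policy_patch("Delete")],
    ]


if __name__ == "__main__":
    # Hypothetical names; substitute the PV, pod, and namespace from your cluster.
    for argv in workaround_commands("pvc-example", "example-pod", "example-ns"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before pasting them into a shell on the bastion node.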
@patpot44
1) Can you please check whether the template file is the latest version? Ref: https://github.com/IBM/power-openstack-k8s-volume-driver/blob/master/template/ibm-powervc-csi-driver-template.yaml We saw a similar issue a few months back and added some changes to the template file.
2) Also, can you check whether "disable-rmc-check" is enabled in the environment? Steps: https://www.ibm.com/support/knowledgecenter/en/SSXK2N_1.4.4/com.ibm.powervc.standard.help.doc/powervc_csi_storage_install.html
cat /etc/nova/nova.conf | grep force_disable
Also, can you try updating the client and server timeouts in /etc/haproxy/haproxy.cfg on the bastion node? When we increased them from 60 seconds to 4h in one of our environments, the problem went away.
For the second point, now we have RSCT available in RHCOS.
Thanks. For the second point, now we have RSCT available in RHCOS. I'll try.
I am using this driver with OCP4 on Power. The worker node failed to create a file system on the attached volume.
Here is what I did:
1. Ran 'oc apply -f csi_examples/dynamic-pvc.yaml'. PowerVC created a 1G volume successfully.
2. Ran 'oc apply -f csi_examples/dynamic-pod.yaml'. The pod was scheduled on worker-1 and the volume was attached to worker-1 successfully, but creating the file system on the attached volume failed.
'oc describe pod example-pod' showed:
Events:
Type Reason Age From Message
Normal Scheduled 78m default-scheduler Successfully assigned powervccsi/example-pod to worker-1
Normal SuccessfulAttachVolume 77m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-55ab1ee2-e6c0-46f2-8777-240e30993347"
Warning FailedMount 42m (x8 over 73m) kubelet, worker-1 Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[default-token-zsp98 mypvc]: timed out waiting for the condition
Warning FailedMount 5m55s (x23 over 76m) kubelet, worker-1 Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-zsp98]: timed out waiting for the condition
Warning FailedMount 16s (x26 over 75m) kubelet, worker-1 MountVolume.MountDevice failed for volume "pvc-55ab1ee2-e6c0-46f2-8777-240e30993347" : rpc error: code = InvalidArgument desc = Could not create file system on attached volume directory /dev/sdd. Error is exit status 127
'oc logs -f ibm-powervc-csi-plugin-lnhjn ibm-powervc-csi' shows:
2020/09/04 05:54:02 Running command /usr/bin/sudo [/usr/sbin/udevadm settle]
2020/09/04 05:54:02 Command output /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
2020/09/04 05:54:02 Error running [/usr/sbin/udevadm settle]
I0904 05:54:02.281662 1 nodeserver.go:184] 1 : There was error while at udevd. Error is exit status 127
2020/09/04 05:54:06 Device path is /dev/sdd
I0904 05:54:06.282072 1 nodeserver.go:193] 1 : Found directory of attached volume /dev/sdd
2020/09/04 05:54:06 Running command /usr/bin/sudo [/bin/lsblk /dev/sdd --noheadings -o FSTYPE -f]
2020/09/04 05:54:06 Command output /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
2020/09/04 05:54:06 Error running [/bin/lsblk /dev/sdd --noheadings -o FSTYPE -f]
2020/09/04 05:54:06 /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
2020/09/04 05:54:06 Running command /usr/bin/sudo [/usr/sbin/mkfs.ext4 /dev/sdd -F]
2020/09/04 05:54:06 Command output /usr/bin/sudo: symbol lookup error: /usr/bin/sudo: undefined symbol: sudo_term_eof
E0904 05:54:06.285924 1 utils.go:48] GRPC error: rpc error: code = InvalidArgument desc = Could not create file system on attached volume directory /dev/sdd. Error is exit status 127
It looks like the root cause is the undefined symbol sudo_term_eof reported when running /usr/bin/sudo /usr/sbin/mkfs.ext4 /dev/sdd -F
But when I ran the above command manually on worker-1, it succeeded.
[core@worker-1 ~]$ /usr/bin/sudo /usr/sbin/mkfs.ext4 /dev/sdd -F
mke2fs 1.45.4 (23-Sep-2019)
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: c8580528-0539-4643-81e8-e3a793a54d51
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
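One way to read the plugin log above is that every command wrapped in /usr/bin/sudo fails with the same loader error, not only mkfs.ext4. That would point at the sudo binary seen by the plugin container rather than at the volume, and might explain why the manual run on the node succeeds. The following Python sketch lists the wrapped commands that hit the error. It assumes the "Running command ... [...]" line format shown in the excerpt, and the function name is hypothetical.

```python
import re

# Matches lines such as: Running command /usr/bin/sudo [/usr/sbin/udevadm settle]
RUNNING = re.compile(r"Running command (\S+) \[([^\]]*)\]")
LOOKUP_ERROR = "symbol lookup error"


def failed_wrapped_commands(log_text):
    """Return (wrapper, command) pairs whose output contains a symbol lookup error."""
    failures = []
    matches = list(RUNNING.finditer(log_text))
    for i, match in enumerate(matches):
        # The output of one command runs until the next "Running command" entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        if LOOKUP_ERROR in log_text[match.end():end]:
            failures.append((match.group(1), match.group(2)))
    return failures


if __name__ == "__main__":
    import sys

    # Usage (pod name taken from the thread):
    #   oc logs ibm-powervc-csi-plugin-lnhjn ibm-powervc-csi | python3 this_script.py
    for wrapper, command in failed_wrapped_commands(sys.stdin.read()):
        print(f"{wrapper} failed before running: {command}")
```

If every listed command shares the same wrapper, the next thing to compare is the sudo binary and its libraries inside the container against those on the worker node.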