NetApp / trident

Storage orchestrator for containers
Apache License 2.0

"Unable to attach or mount volumes: unmounted volumes" in Openshift 4.13 #838

Closed · VfBfoerst closed this issue 1 year ago

VfBfoerst commented 1 year ago

Describe the bug
After configuring the Trident operator in OpenShift (4.13) and the ONTAP NetApps, we are unable to mount the volume in any pod. The PVC and PV are successfully created, but the pod description shows the following error messages (oc describe pod x):

  Normal   SuccessfulAttachVolume  11m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-249dbf83-410c-4421-929e-a8b3c93119db"
  Warning  FailedMount             3m21s (x4 over 9m32s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[pvc-249dbf83-410c-4421-929e-a8b3c93119db], unattached volumes=[pvc-249dbf83-410c-4421-929e-a8b3c93119db kube-api-access-8hm6s]: timed out waiting for the condition
  Warning  FailedMount             83s (x5 over 9m31s)    kubelet                  MountVolume.SetUp failed for volume "pvc-249dbf83-410c-4421-929e-a8b3c93119db" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             78s                    kubelet                  Unable to attach or mount volumes: unmounted volumes=[pvc-249dbf83-410c-4421-929e-a8b3c93119db], unattached volumes=[kube-api-access-8hm6s pvc-249dbf83-410c-4421-929e-a8b3c93119db]: timed out waiting for the condition

The NetApp storage is provided as an aggregate. Neither the NetApp nor the Trident pods show any error messages, which leaves the pods stuck indefinitely in status "ContainerCreating".

Environment
OpenShift 4.13, Trident operator, ontap-nas backend (configuration below).

To Reproduce
Install Trident as described in the documentation. Create the backend via:
tridentctl create backend -f backend.yaml -n trident
backend.yaml:

version: 1
storageDriverName: ontap-nas
backendName: ontap-nas-backend-yaml
managementLIF: 123.123.123.12
dataLIF: 123.123.123.12
svm: vs_a02_fisg_nfs01_ocp
username: xxx
password: xxx
autoExportCIDRs:
- 123.123.123.0/24
defaults:
  exportPolicy: XXX_POL1
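
As a sanity check (not part of the original report), the backend state can be verified after creation; it should be reported as online:
tridentctl get backend -n trident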

Create a storage class with oc create -f basic-storageclass.yaml. basic-storageclass.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: basic-csi
provisioner: csi.trident.netapp.io
parameters:
  backendType: "ontap-nas"
  fsType: "nfs"
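
To confirm the class exists and points at the Trident provisioner (a quick check, not from the original report):
oc get storageclass basic-csi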

Create a PVC (which dynamically provisions a PV) for a deployment:
oc set volume deployment/httpd-ex --add --mount-path=/tmp/test-pvc --claim-size=2Gi --claim-name=httpd-test-pvc-tmp-netapp-termin --claim-class=basic-csi --claim-mode=ReadWriteMany -t pvc --name=pv-trident-test
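
For reference, the oc set volume command above is roughly equivalent to creating this PVC by hand (a sketch; the command additionally wires the volume and mount into the httpd-ex deployment, which the manifest alone does not do):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: httpd-test-pvc-tmp-netapp-termin
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: basic-csi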

Expected behavior
The httpd-ex pod starts up successfully and mounts the volume at /tmp/test-pvc.

Additional context
Thank you very much 😸

wonderland commented 1 year ago

The "context deadline exceeded" is a timeout for the NFS mount. Can you please check if the worker nodes can reach the dataLIF IP (port 2049 for NFSv4, a range of ports for NFSv3)? Is the specified export policy configured correctly for these nodes to mount? Maybe connect to one of the worker nodes and try a manual NFS mount.

VfBfoerst commented 1 year ago

> Maybe connect to one of the worker nodes and try a manual NFS mount.

Hi @wonderland, thanks for your fast reply. We can't mount it on our worker nodes; the mount fails with this error message:
mount -vvv 123.123.123.12:/trident_pvc_2xfapakshenrqwreb /tmp/nfsmount2

mount.nfs: failed to apply fstab options

Both the dataLIF and the nodes are in the same subnet, so there is no firewall between them. Despite this, it seems the NetApp can't be reached:
curl -vv 123.123.123.12:2049

*   Trying 123.123.123.12:2049...
* connect to 123.123.123.12 port 2049 failed: Connection refused
* Failed to connect to 123.123.123.12 port 2049: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 123.123.123.12 port 2049: Connection refused

rpcinfo -p 123.123.123.12

123.123.123.12: RPC: Unable to receive

NFS is activated on the NetApp side, though.

VfBfoerst commented 1 year ago

After further research with our storage team, we found out that NFS was not enabled on the SVM. After enabling NFS, the volume could be mounted. 😸
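
For reference, checking and enabling the NFS server on the SVM from the ONTAP CLI looks roughly like this (a sketch; exact options depend on the ONTAP version and on which NFS versions you need):

# Check whether an NFS server is enabled on the SVM
vserver nfs status -vserver vs_a02_fisg_nfs01_ocp
# Create/enable the NFS server, here with NFSv3 and NFSv4.1
vserver nfs create -vserver vs_a02_fisg_nfs01_ocp -v3 enabled -v4.1 enabled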

VfBfoerst commented 1 year ago

For anyone running into the same issue, start a debug pod on your node:
oc debug node/xxx
Change root into the host filesystem:
chroot /host
Check connectivity:
rpcinfo -p [DataLIF]
The result should be something like:

   program vers proto   port  service
[...]
    100003    3   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
[...]

If this is not the case, activate NFS on the SVM, or at least check its activation state; that is likely the problem. 😸
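
If rpcinfo looks fine but mounts still fail, the export policy is the next thing to check, e.g. from the ONTAP CLI (policy name taken from the backend config above):
vserver export-policy rule show -vserver vs_a02_fisg_nfs01_ocp -policyname XXX_POL1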