NearNodeFlash / NearNodeFlash.github.io

View this document https://nearnodeflash.github.io/
Apache License 2.0

lustre-csi-driver mount.lustre error: is already mounted #149

Closed: roehrich-hpe closed this issue 2 months ago

roehrich-hpe commented 2 months ago

The lustre-csi-driver should recognize when a mount has already completed and return without an error.

Here is the 'kubectl describe' event:

Events:
  Type     Reason       Age                  From     Message
  ----     ------       ----                 ----     -------
  Warning  FailedMount  3m9s (x44 over 90m)  kubelet  MountVolume.SetUp failed for volume "lustre4-nnf-dm-system-readwritemany-pv" : rpc error: code = Internal desc = NodePublishVolume - Mount Failed: Error mount failed: exit status 17
Mounting command: mount
Mounting arguments: -t lustre 2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount
Output: mount.lustre: according to /etc/mtab 2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 is already mounted on /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount

And the node's mount output:

# mount | grep lustre
2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 on /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount type lustre (rw,checksum,flock,nouser_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,encrypt)

Here's the pod and node summaries:

$ kubectl get pod -n nnf-dm-system nnf-dm-worker-9bhxv -o wide
NAME                  READY   STATUS              RESTARTS   AGE    IP       NODE       NOMINATED NODE   READINESS GATES
nnf-dm-worker-9bhxv   0/2     ContainerCreating   0          103m   <none>   elcap317   <none>           <none>

$ kubectl get node elcap317
NAME       STATUS   ROLES    AGE    VERSION
elcap317   Ready    <none>   102m   v1.29.3

Here's the CSI driver:

$ kubectl get pods -n lustre-csi-system -o wide | grep elcap317
lustre-csi-node-dms74   2/2     Running   0          107m    10.85.148.130   elcap317   <none>           <none>

The CSI driver log shows one successful mount followed by many more mount attempts:

time="2024-04-11T17:03:01Z" level=info msg=Mounted source="2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4" target="/var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount"

[the rest of the log repeats the following continuously...]

time="2024-04-11T17:05:13Z" level=debug msg="/csi.v1.Node/NodePublishVolume: REQ 0011: VolumeId=2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4, TargetPath=/var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount, VolumeCapability=mount:<fs_type:\"lustre\" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > , Readonly=false, XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
Mounting arguments: -t lustre 2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount
Output: mount.lustre: according to /etc/mtab 2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 is already mounted on /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount
time="2024-04-11T17:05:13Z" level=debug msg="Mounting arguments: -t lustre 2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount"
time="2024-04-11T17:05:13Z" level=debug msg="Output: mount.lustre: according to /etc/mtab 2056@kfi4:2120@kfi4:2184@kfi4:2248@kfi4:/lustre4 is already mounted on /var/lib/kubelet/pods/c1ab40dc-3864-4938-b66a-7490abd6524c/volumes/kubernetes.io~csi/lustre4-nnf-dm-system-readwritemany-pv/mount"

roehrich-hpe commented 2 months ago

Fixed by https://github.com/HewlettPackard/lustre-csi-driver/pull/71