Closed BlaineEXE closed 11 months ago
I think something changed in the Ceph image. Just changing the nfs-ganesha image led to a successful mount.
https://github.com/rook/rook/issues/13151#issuecomment-1798215739
I was told that the Ganesha 5.7 version present in the most recent v18 images fixes that issue. I will investigate further.
[Update] You're right. The image running in CI still has 5.6
[root@rook-ceph-my-nfs /]# ganesha.nfsd -v
NFS-Ganesha Release = V5.6
It is using the quay.io/ceph/ceph:v18 image; however, I realized today that there are differences between the ganesha packages present in the v18 arm image and the v18 x86 image.
ARM
[root@f011d07eaf05 /]# ganesha.nfsd -v
NFS-Ganesha Release = V5.7
x86
[root@180aaa0a8afe /]# ganesha.nfsd -v
NFS-Ganesha Release = V5.6
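For anyone who wants to script this comparison, here is a minimal sketch that parses the `ganesha.nfsd -v` output shown above and flags which image is older than the release said to contain the fix. The function names and the `reported` dict are made up for illustration, and the 5.7 threshold is only what was reported in this thread, not something verified independently.

```python
import re


def parse_ganesha_version(output):
    """Extract the release as a tuple of ints from `ganesha.nfsd -v` output."""
    match = re.search(r"NFS-Ganesha Release\s*=\s*V(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"no NFS-Ganesha release line in: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


# Version lines copied from the two v18 images in this thread.
reported = {
    "arm": "NFS-Ganesha Release = V5.7",
    "x86": "NFS-Ganesha Release = V5.6",
}

# Assumption taken from the thread: 5.7 is the release said to fix the mount issue.
FIXED_IN = (5, 7)


def images_missing_fix(outputs, fixed_in=FIXED_IN):
    """Return the architectures whose Ganesha release is older than `fixed_in`."""
    return sorted(
        arch
        for arch, out in outputs.items()
        if parse_ganesha_version(out) < fixed_in
    )


if __name__ == "__main__":
    print(images_missing_fix(reported))  # prints ['x86']
```

Comparing tuples of ints rather than raw strings keeps the ordering correct if a later release has a two-digit minor version.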
Closing this as solved, and I'll figure out how to get this fixed in ceph-container. Thanks, Rakshith!
Describe the bug
I need some help trying to determine why NFS mounts began failing in Rook CI on October 27th. From what I have looked at so far, this doesn't correspond to any changes in Rook or Ceph-CSI. I'm hoping someone here might have a better idea of where to look. @Rakshith-R or @nixpanic perhaps?
Environment details
Linux fv-az619-734 5.15.0-1050-azure #57~20.04.1-Ubuntu SMP Wed Oct 4 17:09:16 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Mounter (fuse or kernel; for rbd, krbd or rbd-nbd): NFS
Steps to reproduce
Steps to reproduce the behavior:
This is failing in all of Rook's CI runs starting October 27th. E.g., https://github.com/rook/rook/actions/runs/6786937018/job/18495134986
Actual results
The best indication of what is going wrong is the No such file or directory error given by CSI when mounting to the test pod, described below.
Expected behavior
Users should be able to mount NFS volumes in their pods.
Logs
NFS Provisioner logs:
NFS driver logs:
Dmesg logs: https://gist.github.com/BlaineEXE/df404fb8d79fef06258a07c780abb382
I looked at the location on the CI host to see if there is a host problem, and there may be, but the behavior is strange to me. I can sudo ls the host mount location and see the dir that is supposed to exist, but sudo ls of that dir gives No such file or directory.
I tried listing again afterwards, with more list options, and it appears to be gone, like it is being created and deleted by something seemingly at random: