Closed. ygg-drop closed this issue 1 year ago.
AFAIK it should not be an issue; we are using cephcsi with Quincy, and we don't have this issue reported from anyone. @ygg-drop, a couple of questions:
-t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/
Have you tried specifying the monitor port?

Someone has to make the first report :wink: Maybe this use-case is not very popular?
Is your ceph cluster healthy?
Yes.
Have you retried running the mount command manually from the cephfsplugin container?
Yes, I get the same error:
$ docker exec -ti ceac87b3d9fe bash
[root@ceac87b3d9fe /]# echo 'AQDKpPtiDr30NRAAsqtMLh0WHUqZ0L4f2S/ouA==' > /tmp/csi/keys/admin.key
[root@ceac87b3d9fe /]# mount -t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ /mnt -o 'name=admin,secretfile=/tmp/csi/keys/admin.key,_netdev,fs=nomadfs'
unable to get monitor info from DNS SRV with service name: ceph-mon
2022-08-17T08:12:46.934+0000 7fceff0def40 -1 failed for service _ceph-mon._tcp
[root@ceac87b3d9fe /]#
-t ceph 192.168.1.10,192.168.1.11,192.168.1.12:/ Have you tried specifying the monitor port?
Yes, same error:
[root@ceac87b3d9fe /]# mount -t ceph 192.168.1.10:6789,192.168.1.11:6789,192.168.1.12:6789:/ /mnt -o 'name=admin,secretfile=/tmp/csi/keys/admin.key,_netdev,fs=nomadfs'
unable to get monitor info from DNS SRV with service name: ceph-mon
2022-08-17T08:12:46.934+0000 7fceff0def40 -1 failed for service _ceph-mon._tcp
[root@ceac87b3d9fe /]#
When I try to mount using the Quincy mount.ceph syntax, it works:
[root@ceac87b3d9fe /]# mount -t ceph admin@67b72852-d1b8-45ad-b1f8-edb8c150ff9b.nomadfs=/ /mnt -o 'secretfile=/tmp/csi/keys/admin.key,_netdev,mon_addr=192.168.1.10/192.168.1.11/192.168.1.12'
[root@ceac87b3d9fe /]# ls -la /mnt
total 4
drwxr-xr-x 3 root root 1 Aug 12 08:41 .
drwxr-xr-x 1 root root 4096 Aug 17 08:07 ..
drwxr-xr-x 3 root root 2 Aug 12 08:41 volumes
[root@ceac87b3d9fe /]#
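For reference, the two device-string formats can be contrasted like this. This is only a sketch assembled from the values in this report (the fsid, mon addresses, and filesystem name are the ones shown above); it just prints the two mount invocations rather than running them:

```shell
# Old (pre-Quincy) device syntax: monitor addresses are baked into the
# device string, and the filesystem is selected via the fs= option.
MONS="192.168.1.10,192.168.1.11,192.168.1.12"
echo "mount -t ceph ${MONS}:/ /mnt -o name=admin,secretfile=/tmp/csi/keys/admin.key,fs=nomadfs"

# New (Quincy) device syntax: user@fsid.fsname=/path, with monitors passed
# separately via the mon_addr option (slash-separated).
FSID="67b72852-d1b8-45ad-b1f8-edb8c150ff9b"
MON_ADDR="192.168.1.10/192.168.1.11/192.168.1.12"
echo "mount -t ceph admin@${FSID}.nomadfs=/ /mnt -o secretfile=/tmp/csi/keys/admin.key,mon_addr=${MON_ADDR}"
```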
EDIT: I just tested with quay.io/cephcsi/cephcsi:v3.5.1 (which is based on Pacific), and the mount commands which failed previously do work there.
I have exactly the same problem, with a basic k8s installation (1.25) and ceph installation (17.2.3), where cephcsi:v3.5.1 works fine and cephcsi:v3.7.1 fails.
@mchangir can you please help here? I am not sure why mounting fails: cephcsi uses ceph 17.2 as the base image, but the mount still looks to be failing on the 17.2.3 cluster.
Note: we have not seen this issue in Rook ceph clusters.
@Informize can you please provide the dmesg output from the node?
@Informize can you also run the mount command in verbose mode?
@ygg-drop I am wondering whether a compatibility issue between the userspace packages (e.g. ceph-common) providing the binaries and the kernel (5.15.41-0-lts) causes the problem here. With that assumption, are the versions the same on all your cluster nodes? And does the mount fail on all the nodes, or only on specific ones?
The mount syntax changes have been kept backward compatible. The old syntax should work with newer kernels.
@ygg-drop dmesg and running mount helper with verbose flag would help debug what's going on.
@Madhu-1
OS on ceph nodes and k8s control/worker nodes are all:
Linux control-11 5.4.0-125-generic #141-Ubuntu SMP Wed Aug 10 13:42:03 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Version of cephcsi that works:
kubectl exec -it csi-cephfsplugin-4pkct -c csi-cephfsplugin bash
# /usr/local/bin/cephcsi --version
Cephcsi Version: v3.5-canary
Git Commit: c374edcbaa2a5dd364a9d526728e1629cd666a82
Go Version: go1.17.5
Compiler: gc
Platform: linux/amd64
Kernel: 5.4.0-125-generic
and mount command:
[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs" will pass to kernel.
[root@worker-12 /]# df -h /mnt
Filesystem Size Used Avail Use% Mounted on
192.168.21.11,192.168.21.12,192.168.21.13:/ 373G 0 373G 0% /mnt
With ceph-csi 3.7
[root@worker-12 /]# /usr/local/bin/cephcsi --version
Cephcsi Version: v3.7-canary
Git Commit: 468c73d2b61a955503bd82e083b209f73e62a12e
Go Version: go1.18.5
Compiler: gc
Platform: linux/amd64
Kernel: 5.4.0-125-generic
And mount command in verbose mode:
[root@worker-12 /]# mount -v -t ceph 192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt -o 'name=admin,secretfile=/tmp/auth.key,_netdev,fs=cephfs'
parsing options: rw,name=admin,secretfile=/tmp/auth.key,fs=cephfs,_netdev
mount.ceph: options "name=admin,mds_namespace=cephfs".
invalid new device string format
unable to get monitor info from DNS SRV with service name: ceph-mon
keyring.get_secret failed
2022-09-15T09:00:11.472+0000 7f0946866f40 -1 failed for service _ceph-mon._tcp
mount.ceph: resolved to: "192.168.21.11,192.168.21.12,192.168.21.13"
mount.ceph: trying mount with old device syntax: 192.168.21.11,192.168.21.12,192.168.21.13:/
mount.ceph: options "name=admin,mds_namespace=cephfs,key=admin,fsid=00000000-0000-0000-0000-000000000000" will pass to kernel
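The verbose output above suggests the helper first treats the device string as the new format ("invalid new device string format"), fails the DNS SRV lookup, and only then falls back to the old syntax with an all-zeros fsid. A hypothetical, simplified sketch of that classification step, based only on the log lines above (not on the actual mount.ceph source):

```shell
# Hypothetical sketch of how the v2 mount helper appears to classify the
# device string (simplified; the real logic lives in mount.ceph).
classify_device() {
  case "$1" in
    *@*.*=*) echo "new syntax" ;;  # user@fsid.fsname=/path
    *)       echo "old syntax" ;;  # mon1,mon2,mon3:/path
  esac
}

classify_device '192.168.21.11,192.168.21.12,192.168.21.13:/'          # old syntax
classify_device 'admin@67b72852-d1b8-45ad-b1f8-edb8c150ff9b.nomadfs=/' # new syntax
```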
Do you see no output after this or does the command hang? And does the mount go through (grep ceph /proc/mounts)?
Anything in dmesg? The all-zeros fsid might be the issue here.
@vshankar It hangs for a bit and then returns to the prompt. Relevant dmesg messages:
[691243.945358] libceph: mon1 (1)192.168.21.12:6789 session established
[691243.945750] libceph: mon1 (1)192.168.21.12:6789 socket closed (con state OPEN)
[691243.945764] libceph: mon1 (1)192.168.21.12:6789 session lost, hunting for new mon
[691243.950361] libceph: mon2 (1)192.168.21.13:6789 session established
[691243.951107] libceph: client64579 fsid b5426b62-1ecd-11ed-90ab-8f774f76a3a8
and cat /proc/mounts | grep ceph:
192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt ceph rw,relatime,name=admin,secret=<hidden>,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=cephfs 0 0
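The fsid recorded for the mount can be pulled out of that /proc/mounts line directly; a small sketch using the line above (all zeros there means the helper never resolved the cluster's real fsid, which could be compared against the output of ceph fsid on the cluster):

```shell
# Extract the fsid option from the /proc/mounts entry quoted above.
MOUNT_LINE='192.168.21.11,192.168.21.12,192.168.21.13:/ /mnt ceph rw,relatime,name=admin,secret=<hidden>,fsid=00000000-0000-0000-0000-000000000000,acl,mds_namespace=cephfs 0 0'
FSID=$(printf '%s\n' "$MOUNT_LINE" | sed -n 's/.*fsid=\([0-9a-f-]*\).*/\1/p')
echo "$FSID"  # prints 00000000-0000-0000-0000-000000000000
```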
pvc/pv's:
root@control-11:~# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed 1Gi RWX Delete Bound default/csi-cephfs-pvc csi-cephfs-sc 25m
root@control-11:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-cephfs-pvc Bound pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed 1Gi RWX csi-cephfs-sc 27m
pod:
root@control-11:~# kubectl get pods csi-cephfs-demo-pod
NAME READY STATUS RESTARTS AGE
csi-cephfs-demo-pod 0/1 ContainerCreating 0 21m
root@control-11:~# kubectl describe pod csi-cephfs-demo-pod
Name: csi-cephfs-demo-pod
Namespace: default
Priority: 0
Service Account: default
Node: worker-11/192.168.31.21
Start Time: Thu, 15 Sep 2022 13:29:37 +0200
Labels: <none>
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Containers:
web-server:
Container ID:
Image: docker.io/library/nginx:latest
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/www from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-v8qv7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-cephfs-pvc
ReadOnly: false
kube-api-access-v8qv7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 22m default-scheduler Successfully assigned default/csi-cephfs-demo-pod to worker-11
Warning FailedMount 10m (x2 over 17m) kubelet Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[kube-api-access-v8qv7 mypvc]: timed out waiting for the condition
Warning FailedAttachVolume 113s (x9 over 20m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-fe7981e4-148e-4f31-bf09-205ad2ba36ed" : timed out waiting for external-attacher of cephfs.csi.ceph.com CSI driver to attach volume 0001-0024-b5426b62-1ecd-11ed-90ab-8f774f76a3a8-0000000000000001-3def7d0a-34e9-11ed-9616-9274cb35d772
Warning FailedMount 107s (x7 over 19m) kubelet Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-v8qv7]: timed out waiting for the condition
That's the cephfs mount then, isn't it? What does stat /mnt show?
@vshankar yes indeed, with the mount command I can manually mount it on the worker node.
OK. Is the same command being run by the ceph-csi plugin? Can you enable mount helper debugging when it's being run by ceph-csi?
I wonder what the cat /proc/mounts | grep ceph output is where the mount succeeds at the command line.
mount helper debugging when its being run by ceph-csi?
Sorry, can you point out where to look for how to enable debugging in ceph-csi?
I have no idea. @Madhu-1 @humblec might know.
I wonder what the cat /proc/mounts | grep ceph output is where the mount succeeds at the command line.
/proc/mounts has the record to the cephfs mount as mentioned in this comment https://github.com/ceph/ceph-csi/issues/3309#issuecomment-1247980121
I presume that this comment lists the contents of /proc/mounts when the command doesn't succeed as desired. What I meant to ask for was the output of /proc/mounts when the command succeeds immediately, to verify the fsid listed in the output against the failed/delayed version.
There is no option for that; you can exec into the csi-cephfsplugin container and run the mount command manually with debug flags.
It mounts manually, but not from a pod. Debug flags are set, as you can see in this comment: https://github.com/ceph/ceph-csi/issues/3309#issuecomment-1247834262
@Informize Without debug logs from a failed mount instance, it's hard to tell what's going on.
@vshankar
This is all the debug information that I have. Is there a way to get extra debug information? I see there is also another GitHub issue related to this one: https://github.com/ceph/ceph-csi/issues/3390
Do you have dmesg logs when the mount fails from the pod?
@Informize I don't see any mount failure in your case. Do you think the mount is failing? Can you please provide the cephfsplugin container logs?
Here is the mount failing:
csi-cephfsplugin-q4wrr csi-cephfsplugin E1015 07:14:58.049484 1 nodeserver.go:273] ID: 98 Req-ID: 0001-0009-rook-ceph-0000000000000001-e305d04a-4c56-11ed-b1c9-bad2e6b34b46 failed to mount volume 0001-0009-rook-ceph-0000000000000001-e305d04a-4c56-11ed-b1c9-bad2e6b34b46: an error (exit status 32) occurred while running mount args: [-t ceph 172.16.31.1:6789,172.16.31.2:6789,172.16.31.3:6789:/volumes/csi/csi-vol-e305d04a-4c56-11ed-b1c9-bad2e6b34b46/1de9ce81-0c1b-40f1-9a06-7bd9147b17fd /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/4d6a495ec8fb8ebee362db7492c7cfe27425c92c4ab6a98daf91f3b69f37227c/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-1117944320,mds_namespace=ceph-filesystem,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
csi-cephfsplugin-q4wrr csi-cephfsplugin 2022-10-15T07:11:53.545+0000 7fd361a91f40 -1 failed for service _ceph-mon._tcp
csi-cephfsplugin-q4wrr csi-cephfsplugin mount error 110 = Connection timed out
csi-cephfsplugin-q4wrr csi-cephfsplugin Check dmesg logs if required.
This happens on Rook 1.10.2, with external Ceph installed via cephadm in version 16.2.10; the Kubernetes nodes are on Arch Linux, kernel 5.19.5-arch1-1.
No really relevant dmesg errors:
# dmesg | grep ceph
[123793.160045] libceph: mon1 (1)172.16.31.2:6789 session established
[123793.166842] libceph: client88753 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124041.551742] libceph: mon1 (1)172.16.31.2:6789 session established
[124041.553692] libceph: client88993 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124202.424541] libceph: mon1 (1)172.16.31.2:6789 session established
[124202.427140] libceph: client89176 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124450.617950] libceph: mon0 (1)172.16.31.1:6789 session established
[124450.619957] libceph: client78993 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124692.657906] libceph: mon2 (1)172.16.31.3:6789 session established
[124692.661706] libceph: client89738 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[124934.731612] libceph: mon2 (1)172.16.31.3:6789 session established
[124934.733545] libceph: client89969 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[125176.752300] libceph: mon0 (1)172.16.31.1:6789 session established
[125176.757338] libceph: client79680 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
[125418.846206] libceph: mon2 (1)172.16.31.3:6789 session established
[125418.848306] libceph: client90452 fsid f7238ede-4bab-11ed-b520-0008a20c73ec
Same error when trying to mount manually (in the container; Arch Linux removed the ceph library from its repositories a few days ago), and the volume doesn't get mounted:
# crictl exec -ti 97e8c71b2ca56 bash
# mount -t ceph 172.16.31.1:6789,172.16.31.2:6789,172.16.31.3:6789:/volumes/csi/csi-vol-e305d04a-4c56-11ed-b1c9-bad2e6b34b46/1de9ce81-0c1b-40f1-9a06-7bd9147b17fd /var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.cephfs.csi.ceph.com/4d6a495ec8fb8ebee362db7492c7cfe27425c92c4ab6a98daf91f3b69f37227c/globalmount -o name=csi-cephfs-node,secretfile=/tmp/csi/keys/keyfile-3681000612,mds_namespace=ceph-filesystem,_netdev
unable to get monitor info from DNS SRV with service name: ceph-mon
2022-10-15T07:29:47.176+0000 7f6117edef40 -1 failed for service _ceph-mon._tcp
mount error 110 = Connection timed out
# mount | grep ceph
/dev/sda3 on /etc/ceph-csi-config type ext4 (ro,relatime,data=ordered)
tmpfs on /var/lib/kubelet/pods/1b34caf0-80de-4557-9fff-02ef96c04947/volumes/kubernetes.io~projected/ceph-csi-configs type tmpfs (rw,relatime,size=2097152k,inode64)
tmpfs on /var/lib/kubelet/pods/35e5886b-4939-4312-91ce-a628dd5979bf/volumes/kubernetes.io~projected/ceph-csi-configs type tmpfs (rw,relatime,size=1310720k,inode64)
tmpfs on /var/lib/kubelet/pods/75c839ea-d1b9-4df6-8cc8-fb7f149954f3/volumes/kubernetes.io~secret/rook-ceph-mds-ceph-filesystem-a-keyring type tmpfs (rw,relatime,size=16252404k,inode64)
tmpfs on /var/lib/kubelet/pods/109e62d6-41e2-435b-b8c8-a4e8c943bc80/volumes/kubernetes.io~secret/rook-ceph-crash-collector-keyring type tmpfs (rw,relatime,size=61440k,inode64)
tmpfs on /var/lib/kubelet/pods/166d5b99-4f47-4d68-8bf3-be181b04d4bd/volumes/kubernetes.io~secret/rook-ceph-rgw-ceph-objectstore-a-keyring type tmpfs (rw,relatime,size=16252404k,inode64)
I confirm that running Rook 1.9.12 and forcing cephcsi:3.5.1 solves the issue (Rook 1.10.x supports cephcsi >= 3.6.0, which does NOT solve the issue).
Can someone provide the setup details / a reproducer? I would like to reproduce it locally and see what is wrong.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
This seems to be an active issue, where the only workaround is downgrading cephcsi. There is also an open PR for it. Should it be reopened?
There is a PR (https://github.com/ceph/ceph/pull/48873) in ceph to fix a mount issue. Are you referring to that or to some other fix?
Describe the bug
Apparently there was a significant change in the mount.ceph syntax between Ceph Pacific and Quincy. However, the Ceph-CSI code does not seem to have been updated to support the new syntax.
I use Nomad 1.3.1 and I am trying to use Ceph-CSI to provide CephFS-based volumes to Nomad jobs. I tried version 3.6.2 of Ceph-CSI (which is already based on Quincy) to mount a CephFS volume from a cluster running Ceph 17.2.0.
I use Nomad instead of Kubernetes, but I don't think this fact affects this bug.
Environment details
Mounter used for mounting PVC (for cephfs its fuse or kernel; for rbd its krbd or rbd-nbd): kernel

Steps to reproduce
Steps to reproduce the behavior:
1. Create the CephFS filesystem nomadfs and the admin user
2. nomad job run ceph-csi-plugin-controller.nomad
3. nomad job run ceph-csi-plugin-nodes.nomad
4. Register the volume sample-fs-volume.hcl by running: nomad volume register sample-fs-volume.hcl
5. Run mysql-fs.nomad, which tries to use the volume created in the previous step, using: nomad job run mysql-fs.nomad
6. Check the ceph-mysql-fs job allocation logs.

ceph-csi-plugin-controller.nomad:
ceph-csi-plugin-nodes.nomad:
sample-fs-volume.hcl:
mysql-fs.nomad:
Actual results
Ceph-CSI node plugin failed to mount CephFS.

Expected behavior
Ceph-CSI node plugin should successfully mount CephFS using the new mount.ceph syntax.

Logs
nomad alloc status events:

I suspect the "unable to get monitor info from DNS SRV" error happens because the mount.ceph helper in 17.x no longer recognizes monitor IPs passed this way and falls back to using DNS SRV records.

Additional context