ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

cephfs volume creation error: setfattr: Operation not supported on k8s node #99

Closed compilenix closed 5 years ago

compilenix commented 5 years ago

I'm hoping you can help me with this, though it does not seem to be caused directly by ceph-csi itself.

Extended attributes in general are supported and enabled by the filesystem (ext4) on the node (tested with setfattr -n user.foo -v bar foobar). I've tried the failing setfattr command both on the node and within the container; it does not work in either place.

The k8s node is running Ubuntu 18.04.1 LTS.

Container (kubectl exec csi-cephfsplugin-provisioner-0 -i -t -- sh -il):

# setfattr -n ceph.quota.max_bytes -v 5368709120 /var/lib/kubelet/plugins/csi-cephfsplugin/controller/volumes/root-csi-cephfs-378acacb-e5ed-11e8-9b8b-c60474a907dc
sh: setfattr: not found

Node:

# setfattr -n ceph.quota.max_bytes -v 5368709120 /var/lib/kubelet/plugins/csi-cephfsplugin/controller/volumes/root-csi-cephfs-378acacb-e5ed-11e8-9b8b-c60474a907dc
setfattr: root-csi-cephfs-378acacb-e5ed-11e8-9b8b-c60474a907dc: Operation not supported

Here are the logs: logs-from-csi-cephfsplugin-attacher-in-csi-cephfsplugin-attacher-0.txt logs-from-csi-provisioner-in-csi-cephfsplugin-provisioner-0.txt logs-from-driver-registrar-in-csi-cephfsplugin-t94m4.txt logs-from-csi-cephfsplugin-in-csi-cephfsplugin-t94m4.txt

Greetings and sorry for bothering you again :disappointed:

compilenix commented 5 years ago

Might this be related?

Quotas are implemented in the kernel client 4.17 and higher. Quotas are supported by the userspace client (libcephfs, ceph-fuse). Linux kernel clients >= 4.17 support CephFS quotas but only on mimic+ clusters. Kernel clients (even recent versions) will fail to handle quotas on older clusters, even if they may be able to set the quotas extended attributes.

I have Ceph on Debian 9 (12.2.8 -> luminous)

Source: http://docs.ceph.com/docs/mimic/cephfs/quota/

gman0 commented 5 years ago

I don't think that's related, and the plugin defaults to the FUSE driver anyway. Could you please attach /etc/ceph/ceph.conf from the Ceph cluster?

compilenix commented 5 years ago
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.10.1.0/24
     fsid = 0a83c65a-5016-41d6-b4cb-298ec32b4fa8
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 1
     osd pool default size = 2
     public network = 10.10.1.0/24

[client]
     rbd cache = true
     rbd cache max dirty = 536870912
     rbd cache max dirty age = 30.0
     rbd cache size = 4294967296
     rbd cache target dirty = 67108864
     rbd cache writethrough until flush = true
     rbd default order = 20
     rbd default stripe count = 6
     rbd default stripe unit = 65536
     rbd op threads = 8

[client.telegraf]
     keyring = /etc/ceph/client.telegraf.keyring

[osd]
     bluestore_cache_kv_max = 524288000
     bluestore_cache_size = 524288000
     bluestore_cache_size_hdd = 524288000
     bluestore_cache_size_ssd = 524288000
     debug filestore = 0
     debug osd = 0
     filestore max sync interval = 15
     filestore min sync interval = 0.01
     filestore op threads = 32
     filestore queue commiting max ops = 5000
     filestore queue max bytes = 10485760000
     filestore queue max ops = 25000
     keyring = /var/lib/ceph/osd/ceph-$id/keyring
     max open files = 65536
     osd client message size cap = 2147483648
     osd client op priority = 63
     osd crush update on start = false
     osd deep scrub interval = 2592000
     osd disk thread ioprio class = idle
     osd disk thread ioprio priority = 7
     osd disk threads = 2
     osd map cache bl size = 128
     osd map cache size = 1024
     osd map message max = 1024
     osd max backfills = 2
     osd max scrubs = 10
     osd max write size = 512
     osd op threads = 8
     osd recovery max active = 2
     osd recovery op priority = 10
     osd scrub begin hour = 2
     osd scrub end hour = 10
     osd scrub priority = 1
     osd scrub load threshold = 15
     osd snap trim priority = 5
     osd snap trim sleep = 0.1

[mds]
    mds data = /var/lib/ceph/mds/$cluster-$id
    keyring = /var/lib/ceph/mds/$cluster-$id/keyring

[mds.pve5]
    host = pve5

[mds.pve6]
    host = pve6

[mon.pve2]
     host = pve2
     mon addr = 10.10.1.16:6789

[mon.pve3]
     host = pve3
     mon addr = 10.10.1.21:6789

[mon.pve6]
     host = pve6
     mon addr = 10.10.1.36:6789
gman0 commented 5 years ago

@rootfs could you please have a look?

rootfs commented 5 years ago

@compilenix it looks like setfattr is missing in your container?

gman0 commented 5 years ago

@rootfs well, the provisioner doesn't need setfattr anyway, so I don't think that's the issue here; rather, the plugin is complaining about Operation not supported when setting the attributes, even though the mount with ceph-fuse was successful.

rootfs commented 5 years ago

setfattr is called here in the provisioner:
https://github.com/ceph/ceph-csi/blob/master/pkg/cephfs/volume.go#L84
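
In essence that call shells out to setfattr to set the CephFS quota on the newly created volume root. Here is a minimal illustrative sketch of such a call in Go (not the actual volume.go code; the function name is made up):

package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// setVolumeQuota sets ceph.quota.max_bytes on a CephFS directory by
// shelling out to setfattr, similar in spirit to what the provisioner does.
func setVolumeQuota(volRootPath string, quotaBytes int64) error {
	out, err := exec.Command("setfattr",
		"-n", "ceph.quota.max_bytes",
		"-v", strconv.FormatInt(quotaBytes, 10),
		volRootPath,
	).CombinedOutput()
	if err != nil {
		// This is the failure that surfaces in the log below as
		// "setfattr failed with following error ... Operation not supported".
		return fmt.Errorf("cephfs: setfattr failed: %v, output: %s", err, out)
	}
	return nil
}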

the error message is

E1111 20:14:02.125104       1 controllerserver.go:75] failed to create volume pvc-51f99106e5ee11e8: cephfs: setfattr failed with following error: exit status 1
cephfs: setfattr output: setfattr: /var/lib/kubelet/plugins/csi-cephfsplugin/controller/volumes/root-csi-cephfs-54a2ccde-e5ee-11e8-b5a7-c60474a907dc/csi-volumes/csi-cephfs-54a2ccde-e5ee-11e8-b5a7-c60474a907dc: Operation not supported

@compilenix what is your Ceph release? @batrick any idea why setfattr would fail with Operation not supported for the CephFS quota?

compilenix commented 5 years ago

I have Ceph on Debian 9 (12.2.8 -> luminous)

compilenix commented 5 years ago

Correct, setfattr is not installed in the provisioner container / image.

I've installed it manually (using apk add attr) to see if it makes any difference, but the behavior and logs don't look any different.

gman0 commented 5 years ago

external-provisioner sidecar-container does not need these utilities

rootfs commented 5 years ago

@compilenix can you turn on mds logging and post the mds log?

gman0 commented 5 years ago

@compilenix and also please could you try mounting the volume manually with ceph-fuse -d in the plugin container?

After you get that error, check the logs to find the mount command, then copy-paste it and add the -d flag, like so. Logs:

I1111 20:41:04.651845       1 util.go:41] cephfs: EXEC ceph-fuse [/var/lib/kubelet/plugins/csi-cephfsplugin/controller/volumes/root-csi-cephfs-1bd3d4bc-e5f2-11e8-b5a7-c60474a907dc -c /etc/ceph/ceph.share.csi-cephfs-1bd3d4bc-e5f2-11e8-b5a7-c60474a907dc.conf -n client.admin --keyring /etc/ceph/ceph.share.csi-cephfs-1bd3d4bc-e5f2-11e8-b5a7-c60474a907dc.client.admin.keyring -r / -o nonempty]

so the command would be:

$ ceph-fuse -d /mnt -c /etc/ceph/ceph.share.csi-cephfs-1bd3d4bc-e5f2-11e8-b5a7-c60474a907dc.conf -n client.admin --keyring /etc/ceph/ceph.share.csi-cephfs-1bd3d4bc-e5f2-11e8-b5a7-c60474a907dc.client.admin.keyring -r / -o nonempty

and also try ls-ing the mount point to check that ceph-fuse is not exiting silently. Thanks!

compilenix commented 5 years ago

The file /etc/ceph/ceph.share.csi-cephfs-1bd3d4bc-e5f2-11e8-b5a7-c60474a907dc.conf did not exist anymore, but there are plenty of similarly named files, so I used ceph.share.csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc.conf.

the ceph-fuse -d seems to work:

# kubectl exec csi-cephfsplugin-grb26 -c csi-cephfsplugin -i -t -- sh -il
# ceph-fuse -d /mnt -c /etc/ceph/ceph.share.csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc.conf -n client.admin --keyring /etc/ceph/ceph.share.csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc.client.admin.keyring -r / -o nonempty
2018-11-15 07:20:11.520 7efde07f3c00  0 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable), process ceph-fuse, pid 1200
2018-11-15 07:20:11.532 7efde07f3c00 -1 init, newargv = 0x28b5dc0 newargc=9
ceph-fuse[1200]: starting ceph client
ceph-fuse[1200]: starting fuse

[ process keeps running ]

the mds log level was set to 3 during this command:

2018-11-15 08:20:11.551125 7f6514d54700  0 -- 10.10.1.31:6824/554074929 >> 10.13.39.8:0/500610910 conn(0x55e7ebfbd800 :6824 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).handle_connect_msg: challenging authorizer
2018-11-15 08:20:11.552481 7f6512581700  3 mds.0.server handle_client_session client_session(request_open) v3 from client.174516258
2018-11-15 08:20:11.560212 7f6512581700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 1) v1 from client.174516258
2018-11-15 08:20:12.549042 7f6512581700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 2) v1 from client.174516258
2018-11-15 08:20:15.924434 7f650fd7c700  2 mds.0.cache check_memory_usage total 414680, rss 22280, heap 313916, baseline 313916, buffers 0, 1 / 108 inodes have caps, 1 caps, 0.00925926 caps per inode
2018-11-15 08:20:20.924715 7f650fd7c700  2 mds.0.cache check_memory_usage total 414680, rss 22280, heap 313916, baseline 313916, buffers 0, 1 / 108 inodes have caps, 1 caps, 0.00925926 caps per inode

Using mount in the plugin container in a separate shell, I can see that the mount seems to have worked:

[ truncated ]
tmpfs on /var/lib/kubelet/pods/d3c0739a-e5f6-11e8-b38a-0ab53650f6d5/volumes/kubernetes.io~secret/csi-nodeplugin-token-5vkbc type tmpfs (rw,relatime)
tmpfs on /run/secrets/kubernetes.io/serviceaccount type tmpfs (ro,relatime)
/dev/sda1 on /var/lib/kubelet/plugins/csi-cephfsplugin type ext4 (rw,relatime,errors=remount-ro,data=ordered)
ceph-fuse on /mnt type fuse.ceph-fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)

ls -la /mnt shows a single directory named csi-volumes.

This directory has many subfolders like this:

# ls -la /mnt/csi-volumes
total 49
drwxr-x--- 97 root root 0 Nov 14 19:53 .
drwxr-xr-x  3 root root 0 Nov 11 19:48 ..
drwxr-x---  2 root root 0 Nov 11 21:16 csi-cephfs-02105771-e5f7-11e8-91f0-c60474a907dc
drwxr-x---  2 root root 0 Nov 11 20:33 csi-cephfs-06a97aa2-e5f1-11e8-b5a7-c60474a907dc
drwxr-x---  2 root root 0 Nov 14 19:39 csi-cephfs-09bab242-e845-11e8-91f0-c60474a907dc
[ truncated ]

A directory named csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc did exist, too. I could create a simple text file within that directory (/mnt/csi-volumes/csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc/test.txt).

The new file contributed to the used space on the cephfs data pool (which was completely empty up until now); ceph df detail:

POOLS:
    NAME                QUOTA OBJECTS     QUOTA BYTES      USED     %USED     MAX AVAIL     OBJECTS     DIRTY     READ       WRITE       RAW USED
    cephfs              N/A               N/A                28         0          440G           1         1          0           1           56
    cephfs_metadata     N/A               N/A              613k         0          440G          21        21         94         617        1226k
rootfs commented 5 years ago

@compilenix can you setfattr on that directory?

setfattr -n ceph.quota.max_bytes -v 5000000 /mnt/csi-volumes/csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc/

compilenix commented 5 years ago

No.

# setfattr -n ceph.quota.max_bytes -v 5000000 /mnt/csi-volumes/csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc/
setfattr: /mnt/csi-volumes/csi-cephfs-ffb93536-e5f8-11e8-91f0-c60474a907dc/: Operation not supported

mds log at this time:

2018-11-15 15:20:11.430010 7f650fd7c700  2 mds.0.cache check_memory_usage total 420824, rss 29788, heap 313916, baseline 313916, buffers 0, 98 / 109 inodes have caps, 98 caps, 0.899083 caps per inode
2018-11-15 15:20:15.800946 7f6512581700  3 mds.0.server handle_client_session client_session(request_renewcaps seq 1262) v1 from client.174516258
2018-11-15 15:20:16.430177 7f650fd7c700  2 mds.0.cache check_memory_usage total 420824, rss 29788, heap 313916, baseline 313916, buffers 0, 98 / 109 inodes have caps, 98 caps, 0.899083 caps per inode
2018-11-15 15:20:21.429998 7f650fd7c700  2 mds.0.cache check_memory_usage total 420824, rss 29788, heap 313916, baseline 313916, buffers 0, 98 / 109 inodes have caps, 98 caps, 0.899083 caps per inode
2018-11-15 15:20:26.430372 7f650fd7c700  2 mds.0.cache check_memory_usage total 420824, rss 29788, heap 313916, baseline 313916, buffers 0, 98 / 109 inodes have caps, 98 caps, 0.899083 caps per inode
2018-11-15 15:20:31.430457 7f650fd7c700  2 mds.0.cache check_memory_usage total 420824, rss 29788, heap 313916, baseline 313916, buffers 0, 98 / 109 inodes have caps, 98 caps, 0.899083 caps per inode
rootfs commented 5 years ago

@jcsp @batrick can you take a look? thanks

batrick commented 5 years ago

So the problem only occurs with the kernel client? What version of the kernel is being used? Kernel quota management is not supported until Mimic and 4.17 kernel: http://docs.ceph.com/docs/master/cephfs/kernel-features/

compilenix commented 5 years ago

No, the problem occurs with the fuse client. I suspected this earlier, too.

The kernel is at version 4.15.0 (Ubuntu 18.04).

Is there an option not to define a quota? That would work just fine for me.

rootfs commented 5 years ago

@compilenix do you use the same cephcsi plugin container from quay.io? I am not sure if the ceph-fuse there is up to date, but I'll check.

A bit off topic: the attribute setting here and below appears to apply only to the new kernel mounter or ceph-fuse. I believe we need some if-else here: if it is a kernel mounter, we should avoid setting the attributes, since an old kernel mounter will fail to mount later. @gman0
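
Roughly the kind of guard being suggested, as a sketch with hypothetical names (mounterTypeKernel, and setVolumeQuota from the sketch earlier in this thread), not actual ceph-csi code:

// Hypothetical constant distinguishing the kernel mounter from ceph-fuse.
const mounterTypeKernel = "kernel"

// maybeSetVolumeQuota skips setting the quota xattr when the volume is
// mounted with a (potentially old) kernel client; ceph-fuse keeps the
// current behaviour of setting ceph.quota.max_bytes.
func maybeSetVolumeQuota(mounterType, volRootPath string, quotaBytes int64) error {
	if mounterType == mounterTypeKernel {
		// Old kernel clients cannot handle the quota xattrs and would
		// later fail to mount, so don't set them at all.
		return nil
	}
	return setVolumeQuota(volRootPath, quotaBytes)
}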

rootfs commented 5 years ago

let's see if #100 fixes this issue

rootfs commented 5 years ago

@compilenix I merged #100, please try the new cephfs plugin image

compilenix commented 5 years ago

@compilenix do you use the same cephcsi plugin container from quay.io? I am not sure if the ceph-fuse there is up to date, but I'll check.

I've used this image url: quay.io/cephcsi/cephfsplugin:v0.3.0

@compilenix I merged #100, please try the new cephfs plugin image

Sure, I've updated the yml files (see attached; I've excluded the RBAC config) to include imagePullPolicy: Always.

csi-cephfsplugin.txt csi-cephfsplugin-attacher.txt csi-cephfsplugin-provisioner.txt

It does not seem to make a difference. Here are the logs: logs-from-csi-cephfsplugin-attacher-in-csi-cephfsplugin-attacher-0.txt logs-from-csi-cephfsplugin-in-csi-cephfsplugin-mmrmz.txt logs-from-csi-provisioner-in-csi-cephfsplugin-provisioner-0.txt logs-from-driver-registrar-in-csi-cephfsplugin-mmrmz.txt

rootfs commented 5 years ago

Not sure if this PR is related, but can @ukernel check?

gman0 commented 5 years ago

@rootfs:

A bit off topic: the attribute setting here and below appears to apply only to the new kernel mounter or ceph-fuse. I believe we need some if-else here: if it is a kernel mounter, we should avoid setting the attributes, since an old kernel mounter will fail to mount later. @gman0

This was not an issue before; the kernel client would just ignore the quota.

gman0 commented 5 years ago

@compilenix @rootfs I tried to reproduce this issue with a Ceph Luminous cluster (it's 12.2.4, but regardless) and it does indeed fail with the aforementioned error message. There seems to be an incompatibility between a Ceph Luminous cluster and the Ceph Mimic FUSE client when setting the attributes. It's also worth noting that the kernel client does not exhibit this issue and works as expected.

compilenix commented 5 years ago

@gman0 which OS and kernel version did you use?

gman0 commented 5 years ago

@compilenix I've used ceph-container, tested on hosts:

$ uname -a
Linux ceph 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u1 (2018-08-03) x86_64 GNU/Linux

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

and

$ uname -a
Linux ceph 4.18.0-2-amd64 #1 SMP Debian 4.18.10-2 (2018-11-02) x86_64 GNU/Linux

$ cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux buster/sid"
NAME="Debian GNU/Linux"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

with results in both cases as I've described in the previous message

Madhu-1 commented 5 years ago

@gman0 can we close this issue, or is it still present?

jianglingxia commented 5 years ago

Hi all, I ran the command on both my k8s node and my Ceph node and it failed on both. Does that mean my Ceph cluster does not support the parameter? Thanks very much.

1/ using the kernel client:

mount -t ceph 178.178.178.189:1091,178.178.178.19:1091,178.178.178.188:1091:/ /home/jlx -o name=admin,secret=AQA0whldex5NJhAAnLkp5U9Iwh+69lz9zbMhMg==,mds_namespace=cephfs

mount error 22 = Invalid argument. I found that adding the mds_namespace parameter makes the mount fail.

2/ using ceph-fuse:

[root@node-7:/home]$ ceph-fuse /home/jlx -m 178.178.178.189:1091,178.178.178.19:1091,178.178.178.188:1091 -c /etc/ceph/ceph.conf -n client.admin --key=AQA0w>
2019-07-05 16:33:12.116323 7f002195e040 -1 asok(0x562c5a7961c0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
2019-07-05 16:33:12.117721 7f002195e040 -1 init, newargv = 0x562c5a7932d0 newargc=11
ceph-fuse[107904]: starting ceph client
ceph-fuse[107904]: starting fuse
[root@node-7:/home]$ setfattr -n ceph.quota.max_bytes -v 5368709120 /home/jlx
setfattr: /home/jlx: Invalid parameters

Madhu-1 commented 5 years ago

cc @ajarr

jianglingxia commented 5 years ago

Can anyone help me with this problem? When I run the setfattr command in the csi-cephfsplugin container it prints Operation not supported, but running the same command on my Ceph cluster node and on my k8s node works correctly. Why? Thanks all!

setfattr -n ceph.quota.max_bytes -v 2073741824 csi-vol-097f0e23-a221-11e9-8c5a-fa163e58264b-creating

[root@node-7:/usr/bin]$ docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED      STATUS      PORTS   NAMES
6be1813bad0d   e9251e1eaa69   "/usr/local/bin/cephc"   4 days ago   Up 4 days           k8s_csi-cephfsplugin_csi-cephfsplugin-provisioner-0_default_704d4169-9eeb-11e9-82a9-fa163e3dc6af_0

docker exec -it 6be1813bad0d /bin/sh
sh-4.2# setfattr -n ceph.quota.max_bytes -v 2073741824 csi-vol-097f0e23-a221-11e9-8c5a-fa163e58264b-creating
setfattr: csi-vol-097f0e23-a221-11e9-8c5a-fa163e58264b-creating: Operation not supported
sh-4.2# ps -ef |grep ceph-fuse
root 93071 0 0 08:04 pts/1 00:00:00 ceph-fuse /paasdata/docker/plugins/cephfs.csi.ceph.com/controller/volumes/root-csi-vol-fff4d95a-a11a-11e9-8c5a-fa163e58264b -m 178.178.178.19:1091,178.178.18.19:1091,178.18.178.188:1091 -c /etc/ceph/ceph.conf -n client.admin --key=AQA0whldex5NJhAAnLkp5U9Iwh+69lz9zbMhM== -r / -o nonempty --client_mds_namespace=cephfs

Madhu-1 commented 5 years ago

CC @poornimag can you help?

danielzhanghl commented 5 years ago

I made a private change to ignore errors from the "setfattr" operation, which is not supported in my kernel version, and volume create/mount is not impacted.
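
For illustration, a rough sketch of that kind of local workaround, reusing the hypothetical setVolumeQuota helper from the sketches earlier in this thread (this is not the upstream ceph-csi behaviour):

// (Same hypothetical package as the sketches above; additionally imports "log".)
// setVolumeQuotaBestEffort logs a failed quota setfattr instead of failing
// the volume creation; the quota simply isn't enforced in that case.
func setVolumeQuotaBestEffort(volRootPath string, quotaBytes int64) {
	if err := setVolumeQuota(volRootPath, quotaBytes); err != nil {
		log.Printf("warning: ignoring cephfs quota error on %s: %v", volRootPath, err)
	}
}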

jianglingxia commented 5 years ago

Yeah, today I also wanted to use the method you described. So the problem is that the ceph-fuse version in the CSI driver is not compatible with my Ceph cluster version? Why can this problem not be resolved and merged on GitHub? Thanks all.

In the CSI driver, the ceph-fuse version is:

sh-4.2# rpm -qa | grep ceph
ceph-base-14.2.1-0.el7.x86_64
ceph-mgr-rook-14.2.1-0.el7.noarch
ceph-osd-14.2.1-0.el7.x86_64
ceph-iscsi-config-2.6-2.6.el7.noarch
ceph-common-14.2.1-0.el7.x86_64
ceph-mgr-dashboard-14.2.1-0.el7.noarch
nfs-ganesha-ceph-2.7.3-0.1.el7.x86_64
ceph-fuse-14.2.1-0.el7.x86_64
ceph-radosgw-14.2.1-0.el7.x86_64

But my Ceph cluster has:

[root@node-7:/home/jlx]$ rpm -qa | grep ceph
python-cephfs-12.2.2-10.el7.x86_64
ceph-common-12.2.2-10.el7.x86_64
libcephfs2-12.2.2-10.el7.x86_64
ceph-fuse-12.2.2-10.el7.x86_64

jianglingxia commented 5 years ago

I have fixed the problem: the Ceph client version in the CSI driver container was not compatible with the cluster's Ceph version. I changed the Ceph client in the CSI driver container from v1.14 to v1.12; after that the PV and volume are created dynamically and the pod using the PVC claim is running. Thanks.

@Madhu-1: can you merge the PR into the master branch? Thanks.
