ceph / ceph-csi

CSI driver for Ceph
Apache License 2.0

create cephfs pvc with error 'Operation not permitted' #1818

Closed deadjoker closed 2 years ago

deadjoker commented 3 years ago

Describe the bug

I deployed ceph-csi in Kubernetes and use CephFS to provide PVCs. PVC creation fails when I use a normal Ceph user but succeeds when I use the admin Ceph user.

Environment details

Steps to reproduce

Steps to reproduce the behavior:

  1. create the Ceph user: ceph auth caps client.k8sfs mon 'allow r' mgr 'allow rw' mds 'allow rw' osd 'allow rw tag cephfs *=*'
  2. download the YAML manifests from https://github.com/ceph/ceph-csi/tree/release-v3.2/deploy/cephfs/kubernetes
  3. modify ceph information in csi-config-map.yaml
  4. add kms-config.yaml and create from it
    ---
    apiVersion: v1
    kind: ConfigMap
    data:
      config.json: |-
        {}
    metadata:
      name: ceph-csi-encryption-kms-config
  5. add secret.yaml and create from it

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: csi-cephfs-secret
      namespace: ceph-csi
    stringData:
      # Required for statically provisioned volumes
      # userID: <plaintext ID>
      # userKey: <Ceph auth key corresponding to ID above>

      # Required for dynamically provisioned volumes
      adminID: k8sfs
      adminKey: AQDuM+xfXz0zNRAAnxeJaWdmR2J5I/QxMR9gLQ==
  6. add storage class

    ---
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: csi-cephfs-sc
    provisioner: cephfs.csi.ceph.com
    parameters:
      # String representing a Ceph cluster to provision storage from.
      # Should be unique across all Ceph clusters in use for provisioning,
      # cannot be greater than 36 bytes in length, and should remain immutable for
      # the lifetime of the StorageClass in use.
      # Ensure to create an entry in the config map named ceph-csi-config, based on
      # csi-config-map-sample.yaml, to accompany the string chosen to
      # represent the Ceph cluster in clusterID below
      clusterID: d9693b9b-8988-44bb-8bf9-ccb2c2733eec

      # CephFS filesystem name into which the volume shall be created
      fsName: cephfs

      # (optional) Ceph pool into which volume data shall be stored
      # pool: cephfs_data

      # (optional) Comma separated string of Ceph-fuse mount options.
      # For eg:
      # fuseMountOptions: debug

      # (optional) Comma separated string of Cephfs kernel mount options.
      # Check man mount.ceph for mount options. For eg:
      # kernelMountOptions: readdir_max_bytes=1048576,norbytes

      # The secrets have to contain user and/or Ceph admin credentials.
      csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
      csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
      csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
      csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi
      csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
      csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi

      # (optional) The driver can use either ceph-fuse (fuse) or
      # ceph kernelclient (kernel).
      # If omitted, default volume mounter will be used - this is
      # determined by probing for ceph-fuse and mount.ceph
      mounter: kernel
    reclaimPolicy: Retain
    allowVolumeExpansion: true
    mountOptions:
      - debug
  7. create pvc
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: csi-cephfs-pvc
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
      storageClassName: csi-cephfs-sc

Actual results

# kubectl get sc
NAME            PROVISIONER           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
csi-cephfs-sc   cephfs.csi.ceph.com   Retain          Immediate           true                   7h1m

# kubectl get pvc
NAME             STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS    AGE
csi-cephfs-pvc   Pending                                      csi-cephfs-sc   57m

# kubectl get pv
No resources found

Expected behavior

PVC should be created successfully and bound to a PV.

Logs

If the issue is in PVC creation, deletion, or cloning, please attach complete logs of the containers below.
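For what it's worth, the provisioner-side logs can be pulled roughly like this (a sketch, assuming the default deployment name csi-cephfsplugin-provisioner and container names from the release-v3.2 manifests, deployed in the ceph-csi namespace):

# kubectl -n ceph-csi logs deploy/csi-cephfsplugin-provisioner -c csi-provisioner
# kubectl -n ceph-csi logs deploy/csi-cephfsplugin-provisioner -c csi-cephfsplugin

The 'Operation not permitted' error returned from CreateVolume should show up in the csi-cephfsplugin container log.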

Additional context

the ceph user 'k8sfs' caps:

client.k8sfs
    key: AQDuM+xfXz0zNRAAnxeJaWdmR2J5I/QxMR9gLQ==
    caps: [mds] allow rw
    caps: [mgr] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw tag cephfs *=*

This user is also able to create subvolumes and subvolume groups:

# ceph --id k8sfs fs subvolume create cephfs test 
# ceph --id k8sfs fs subvolume ls cephfs
[
    {
        "name": "test"
    }
]

# ceph --id k8sfs fs subvolumegroup ls cephfs
[
    {
        "name": "_nogroup"
    }, 
    {
        "name": "csi"
    }
]

# ceph --id k8sfs fs subvolumegroup create cephfs testgroup
# ceph --id k8sfs fs subvolumegroup ls cephfs
[
    {
        "name": "_nogroup"
    }, 
    {
        "name": "csi"
    }, 
    {
        "name": "testgroup"
    }
]

# ceph --id k8sfs fs subvolume create cephfs testsubvolume csi
# ceph --id k8sfs fs subvolume ls cephfs csi
[
    {
        "name": "testsubvolume"
    }, 
    {
        "name": "csi-vol-eac5a168-4a70-11eb-b23a-8e1756c5ca33"
    }
]

The 'csi' subvolumegroup was created when I used the admin keyring in ceph-csi.

sgissi commented 3 years ago

I came across the same issue. The CephFS user is able to create subvolumegroups and subvolumes, but provisioning fails in the provisioner. A user with full admin rights works without problems. I couldn't find where the RADOS call is made, so I can't tell which permission is missing or which action causes the problem.

humblec commented 3 years ago

@deadjoker @sgissi these are the capabilities we require for the user in a Ceph cluster for Ceph CSI to perform its actions: https://github.com/ceph/ceph-csi/blob/master/docs/capabilities.md. If you still face issues even after granting these permissions, please revert!

deadjoker commented 3 years ago

@humblec I followed that doc and still get this error. See my step 1.

humblec commented 3 years ago

Thanks @deadjoker for confirming the setup. @yati1998 are we missing any capabilities in the doc?

Yuggupta27 commented 3 years ago

Hi @deadjoker, as per the steps you mentioned, the user was created with the node plugin capabilities, and the CephFS provisioner capabilities seem to be missing. This might be the reason why you are unable to provision a volume via the cephfs-provisioner. Unlike rbd, cephfs has separate capability requirements for the node plugin and the provisioner, as mentioned here. To solve the issue, you can try creating separate cephfs-plugin and cephfs-provisioner secrets. Feel free to reach out if the issue still persists :)
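For illustration only, creating two such users might look roughly like this (a sketch based on the caps quoted elsewhere in this thread; the names client.k8sfs-provisioner and client.k8sfs-node are made up for the example):

# ceph auth get-or-create client.k8sfs-provisioner mon 'allow r' mgr 'allow rw' osd 'allow rw tag cephfs metadata=*'
# ceph auth get-or-create client.k8sfs-node mon 'allow r' mgr 'allow rw' mds 'allow rw' osd 'allow rw tag cephfs *=*'

The provisioner key would then go into the secret referenced by the provisioner and controller-expand secret parameters, and the node key into the node-stage secret.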

deadjoker commented 3 years ago

Hi @Yuggupta27, here are the secrets in my cluster environment.

kubectl get secret -n ceph-csi
NAME                                 TYPE                                  DATA   AGE
cephfs-csi-nodeplugin-token-sx9v2    kubernetes.io/service-account-token   3      97d
cephfs-csi-provisioner-token-xxnrd   kubernetes.io/service-account-token   3      97d
csi-cephfs-secret                    Opaque                                2      97d
default-token-ccmsh                  kubernetes.io/service-account-token   3      105d

Should I use a new Ceph ID with the capabilities

"mon", "allow r",
"mgr", "allow rw",
"osd", "allow rw tag cephfs metadata=*"

and create a csi-cephfs-provisioner-secret for the provisioner?
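i.e. something like the following sketch (the secret name csi-cephfs-provisioner-secret and the user k8sfs-provisioner are just placeholders):

    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: csi-cephfs-provisioner-secret
      namespace: ceph-csi
    stringData:
      # Required for dynamically provisioned volumes
      adminID: k8sfs-provisioner
      adminKey: <key of client.k8sfs-provisioner>

and then point the provisioner and expand secrets in the StorageClass at it, keeping the node-stage secret as before:

      csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-provisioner-secret
      csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
      csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-provisioner-secret
      csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi
      csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
      csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi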

alamsyahho commented 3 years ago

@deadjoker Did you manage to get the issue resolved? I ran into exactly the same error as well and am not sure yet how to resolve it.

deadjoker commented 3 years ago

@alamsyahho I have not resolved this issue yet. I'm using the admin account instead.

alamsyahho commented 3 years ago

Understood. Probably I will have to use the admin account for csi-cephfs as well then. Thanks for your reply.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

Raboo commented 2 years ago

This is still very valid. I have ceph-csi installed via Rook, and I'm using the Rook scripts to create the Ceph clients:

client.csi-cephfs-node
    caps: [mds] allow rw
    caps: [mgr] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw tag cephfs *=*
client.csi-cephfs-provisioner
    caps: [mgr] allow rw
    caps: [mon] allow r
    caps: [osd] allow rw tag cephfs metadata=*

Trying to provision a CephFS subvolumegroup doesn't work using csi-cephfs-provisioner. However, if I tell the StorageClass to use admin, it works, so either something is missing from these caps or the code does something different when admin is used.

Update: the csi-cephfs-provisioner is able to create subvolume groups

[root@kw-02000cccea2b /]# ceph -n client.csi-cephfs-provisioner --key xxx== -m v2:10.3.60.25:3300 fs subvolumegroup create cephfs test cephfs_data
[root@kw-02000cccea2b /]# ceph -n client.csi-cephfs-provisioner --key xxx== -m v2:10.3.60.25:3300 fs subvolumegroup ls cephfs
[
    {
        "name": "test"
    }
]
Raboo commented 2 years ago

Weirdly enough, this still fails if I give the csi-cephfs-provisioner client the same caps as admin, but it works if I use the admin client.

[client.csi-cephfs-provisioner]
    caps mds = "allow *"
    caps mgr = "allow *"
    caps mon = "allow *"
    caps osd = "allow *"

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

Raboo commented 2 years ago

I still wasn't able to solve the problem; I simply worked around it using client.admin like some other people here.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

sf98723 commented 2 years ago

@deadjoker, the Ceph capability requirements you provided, from the following link, have to be used in the userID section of the secret, for static provisioning only. The following example explains the meaning of the userID and adminID sections.

If you expect dynamic provisioning behaviour, you have to provide an admin user account, for reasons that are not well documented.

I've faced this issue in the past months: only the client.admin user worked. When I created another admin user, say "client.admin123", with the same capabilities, it didn't work. A few posts are related to this problem, this one for example.

In the last few days, users at work asked us to provide dynamic provisioning for our K8s/Ceph environments.

So, I tried again this evening with an up-to-date config:

I created again an alternative admin account with the same caps as client.admin, inserted these credentials at adminID, and it now works with an alternative admin user!

[Screenshot: 2022-03-02 at 23:59:33]

Here is the user definition and caps, for information:

client.admink8s
    key: AQBB4............jKSb9Kbjg==
    caps: [mds] allow *
    caps: [mgr] allow *
    caps: [mon] allow *
    caps: [osd] allow *
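For reference, a user like that can be created with a single ceph auth command (a sketch, matching the caps listed above):

# ceph auth get-or-create client.admink8s mds 'allow *' mgr 'allow *' mon 'allow *' osd 'allow *'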

Very insecure... We do not want to expose an admin token in the clear in Kubernetes, as we don't use protected secrets yet. At the very least, it would be appreciated not to require write capabilities on the monitors.

Can the development team clarify in the docs directory the minimal caps for an "admin" user for dynamic provisioning? Or explain why it has to be a full admin with write caps on the Ceph mons?

@humblec? I will also look at the code and the detailed Ceph caps in the next few days.

Thanks a lot,

drummerglen commented 2 years ago

Hi guys,

I encountered this problem too, but I have resolved it. The key point was that the adminID and adminKey in the Secret file must be admin (client.admin in the Ceph cluster). Once I re-applied the Secret YAML file, csi-cephfs-sc worked!!!

I found the doc in ceph-csi/docs/capabilities.md. It seems there is some issue with the user privileges configured via Ceph (using ceph auth client.xxx caps mon 'allow r' osd... mds... mgr...). It doesn't work!!!

Here is the change

---
apiVersion: v1
kind: Secret
metadata:
  name: csi-cephfs-secret
  namespace: ceph-csi
stringData:
  # Required for statically provisioned volumes
  # userID: <plaintext ID>
  # userKey: <Ceph auth key corresponding to ID above>

  # Required for dynamically provisioned volumes
  adminID: k8sfs   # <-- here should be admin (client.admin)
  adminKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx==

Raboo commented 2 years ago

@drummerglen It's not resolved. Your "solution" is exactly what everyone else did to work around the problem, nothing new. It's even written in the original post:

> ... but succeed if I use admin ceph user.

Running as admin/superuser/god-mode is not a solution or resolution; it's just a temporary work-around. Privilege separation is there for a reason, mainly to reduce the risk of malicious abuse or of errors made by code or humans.

drummerglen commented 2 years ago

@Raboo Oops, sorry, I didn't read every comment. May I ask if any version has resolved this issue?

Raboo commented 2 years ago

@drummerglen No, I don't think so. It seems very hard to figure out why this is happening, and it probably doesn't affect the majority of users.

drummerglen commented 2 years ago

@Raboo My Ceph cluster was deployed by cephadm running on Docker. I have no idea if that is the problem.

alfredogotamagmail commented 1 year ago

Hi, as of today the issue has still not been resolved. Is there any fix still pending, or does nobody really care about this issue? It is very concerning that we need to expose our Ceph superuser credentials to the ceph-csi client; a slight human or backend error might jeopardize the whole Ceph cluster.

alepiazza commented 1 year ago

Hi, I am unsure if the issue is the same, but you might want to look at https://github.com/ceph/ceph-csi/issues/2687. We faced similar issues in crafting the correct caps to let the Ceph provisioner use credentials with restricted access, like avoiding allow * in all caps or restricting the permissions to path, fs, and volumes. The caps suggested at the end of the above issue are working for us, but unfortunately the docs have not been updated yet.