IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.
Apache License 2.0

Enhance storage class to support uid/gid with extended chmod settings and/or add full support for fsGroup #401

Closed: gfschmidt closed this issue 1 year ago

gfschmidt commented 3 years ago

Is your feature request related to a problem? Please describe.

Large application deployments in OpenShift, such as IBM Cloud Pak for Data, only allow the admin to specify one storage class for the deployment of hundreds of pods. IBM Spectrum Scale CSI storage classes provide only a single set of uid/gid attributes as an additional option to set the permissions of the directory backing the persistent volume that is bound to a user's persistent volume claim. This reflects the needs of a file system admin who assigns file permissions and uid/gid settings to a directory in IBM Spectrum Scale. It does not, however, reflect the needs of an OpenShift admin who creates a general-purpose "storage class" for multiple OpenShift users who are not associated with a specific uid or necessarily a specific gid.

A "storage class" is created by an admin to allow dynamic provisioning of persistent volumes for multiple OpenShift users on demand, yet a "user" in OpenShift, and the applications and pods run by that user, are typically not associated with a fixed uid or gid. The uid and gid settings within a pod are determined by the Security Context Constraints (SCC) associated with the OpenShift user or by specific pod security context settings applied in the pod's YAML manifest. A regular OpenShift user, for example, runs under the "restricted" SCC: the uid of a pod is arbitrarily assigned from a predefined range (e.g. uid=1xxxxxxxxx, MustRunAsRange) while the gid defaults to root (0) if nothing else is defined. The latter (gid=0) typically grants access to dynamically provisioned volumes backed by directories in IBM Spectrum Scale that were created with the default uid/gid=0 (root) storage class settings, because the gid matches even though the assigned non-root uid does not.

However, multiple users may run applications using dynamically provisioned volumes from the same storage class, and each application may be built from hundreds of pods, some of them running under arbitrarily chosen non-root uid/gid combinations defined in the pod's security context. This can easily break dynamic provisioning through a common IBM Spectrum Scale storage class created by the admin with default uid/gid settings, and leave pods with bound persistent volumes but no read/write access, due to the non-root uid/gid attributes the pods run with. Defining a storage class with one specific uid/gid setting does not help the OpenShift admin here, as OpenShift users are typically not associated with a specific file system uid/gid and may even choose further arbitrary uid/gid settings for selected pods through the pod security settings in their YAML manifests.
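
For illustration, a minimal sketch of a pod admitted under the restricted SCC without any explicit securityContext (pod, image and claim names are hypothetical; the concrete uid range comes from the namespace annotations of the cluster at hand):

apiVersion: v1
kind: Pod
metadata:
  name: restricted-scc-demo                   # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest    # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: scale-csi-pvc                # hypothetical claim from a Scale storage class
# Under the restricted SCC, OpenShift injects runAsUser from the namespace's
# openshift.io/sa.scc.uid-range annotation (e.g. uid=1000750000) and the gid
# defaults to 0 (root), so such a pod typically reaches the volume via gid=0.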

For example, in Cloud Pak for Data v3.5.2 the "iis" sub-assembly, just one sub-component among many, sets an arbitrarily chosen pod security context of runAsUser=10032 and in the process acquires gid=1000. This setting causes the pod to lose access to a volume dynamically provisioned from a Spectrum Scale storage class with uid/gid=0, while all other pods of the assembly run fine. Creating a specific storage class with uid=10032 can serve as a work-around under these circumstances: the "iis" sub-component gains access through the specific uid while all other pods of the Cloud Pak for Data deployment gain access through the common gid=0 (with uid=1xxxxxxxxx). However, if just one more pod in such a large deployment ran with yet another arbitrarily chosen non-root uid and gid, we could no longer define one common storage class for the whole deployment, as that would require at least two different non-root uids or non-root gids in the storage class, which the current single uid/gid option does not cover.
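
A hedged sketch of such a work-around storage class, using the independent-fileset parameters shown later in this thread (storage class name, file system name and cluster ID are placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale-iis-sc    # hypothetical name for the iis-specific class
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "fs0"                # placeholder file system name
  clusterId: "215057217487177715"    # placeholder cluster ID
  uid: "10032"                       # the uid enforced by the iis pod security context
reclaimPolicy: Delete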

Describe the solution you'd like

uid/gid are terms describing the security needs of a file system administrator for a directory in IBM Spectrum Scale backing a persistent volume in OpenShift. However, the uid/gid settings in an IBM Spectrum Scale CSI driver "storage class" created by an OpenShift administrator do not properly reflect the needs of the admin and the users in OpenShift: the admin creates one storage class for many different users, and no user is bound to a fixed uid/gid. Under the circumstances described above, users may request persistent volumes from an available IBM Spectrum Scale storage class through persistent volume claims and end up without sufficient file permissions to read from and write to that volume.

In order to give an OpenShift administrator broader control over file system permissions with storage classes, we propose to consider supporting an additional chmod option in the storage class, so that access permissions on the directory backing the PV can be set more granularly (e.g. specific rwx permissions for uid and gid) or more broadly (e.g. chmod o+rwx) to ensure access for every uid/gid combination associated with a pod, i.e. to ensure that a pod requesting a persistent volume from the storage class indeed has full read/write access to the persistent volume bound to its persistent volume claim. This would enable admins to allow dynamic provisioning with one storage class also for pods that enforce different uids/gids via the pod's security context.
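
For illustration only, such an extended storage class might look as follows; the permissions parameter is hypothetical here and meant to show the proposed chmod-style option, not an existing driver feature:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale-shared-sc   # hypothetical name
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "fs0"                  # placeholder file system name
  clusterId: "215057217487177715"      # placeholder cluster ID
  permissions: "770"                   # hypothetical: chmod-style bits ("777" for the broader o+rwx case)
reclaimPolicy: Delete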

Another option to consider, handling file system permissions in a more Kubernetes-like manner, is for the CSI driver to support the fsGroup option instead (see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/). fsGroup defines a supplementary group so that Kubernetes/OpenShift can change the permissions and ownership of the volume to match the fsGroup requested in the pod's security context, e.g. as in

securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"

For further reference, please see https://kubernetes-csi.github.io/docs/support-fsgroup.html and https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/695-skip-permission-change .

gfschmidt commented 3 years ago

Please also refer to Configure volume permission and ownership change policy for Pods

matt-levan commented 3 years ago

Additionally, when using fsGroup, the setgid bit is set on the directory so that group ownership is inherited as new files are created within it.

A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:

1. The owning GID will be the FSGroup
2. The setgid bit is set (new files created in the volume will be owned by FSGroup)
3. The permission bits are OR'd with rw-rw----

If unset, the Kubelet will not modify the ownership and permissions of any volume.

Reference: https://github.com/kubernetes-client/go/blob/master/kubernetes/docs/V1PodSecurityContext.md
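
To make the quoted semantics concrete, a minimal sketch of a pod using fsGroup (pod, image and claim names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                          # hypothetical name
spec:
  securityContext:
    fsGroup: 2000                             # volume ownership is changed to gid 2000
  containers:
  - name: app
    image: registry.example.com/app:latest    # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-pvc                  # placeholder claim
# For a supporting volume type, the kubelet chowns the volume to group 2000,
# sets the setgid bit on directories (new files inherit gid 2000), and OR's
# the permission bits with rw-rw----.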

dunnevan commented 3 years ago

Another good link to keep in mind for this: CSI FSGroup support.

msfrucht commented 3 years ago

I'm seeing this same issue. Scale CSI PVCs are always mounted at a mount point with gid 1000 and uid 1000 in OpenShift, in violation of the "restricted" SCC rules. The securityContext.fsGroup field is also ignored.

Any non-root pod that uses a Scale CSI PVC needs to set securityContext.supplementalGroups: [1000], or the container will fail to read from or write to the PVC.
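
A hedged sketch of that work-around in a pod manifest (pod, image and claim names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: scale-pvc-consumer                    # hypothetical name
spec:
  securityContext:
    supplementalGroups: [1000]                # grants the containers membership in gid 1000
  containers:
  - name: app
    image: registry.example.com/app:latest    # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: scale-csi-pvc                # placeholder Scale CSI claim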

hseipp commented 3 years ago

@Jainbrt Please have a look at this topic, it circles back to the topic you raised to the AoT Storage team, doesn't it?

Jainbrt commented 3 years ago

Sure @hseipp, we are just around the corner from the CSI 2.2.0 release. We will discuss this within the team once the snapshot support is out. Does that sound OK?

gfschmidt commented 3 years ago

This issue is becoming more and more important as more IBM Cloud Pak for Data (CP4D) assemblies surface that enforce specific uid/gid settings and therefore require different IBM Spectrum Scale CSI storage classes with specific uid/gid settings.

This is not only related to the following IBM Cloud Pak for Data services:

but also to:

For now, we can simply define a specific IBM Spectrum Scale storage class with a specific uid for each component that enforces such a uid/gid pod security context, while still using the same backend IBM Spectrum Scale file system for all of them.

The Spectrum Scale storage class, here for independent filesets, offers the following parameters to apply specific uid/gid settings:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale3-sc
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "fs0"                 # backing Spectrum Scale file system
  clusterId: "215057217487177715"     # Spectrum Scale cluster ID
  uid: "1000750000"                   # owner uid applied to the PV's backing directory
  gid: "1000750000"                   # owner gid applied to the PV's backing directory
reclaimPolicy: Delete

Typically it will suffice to define only the uid: "<here_assembly_enforced_UID>" in the storage class and to omit the gid, which then defaults to 0 (gid=0/root). Pods that enforce the specific uid get full access to their PVs via the uid, while all other pods in the assembly, which may run as regular users (under the restricted SCC) with an arbitrary uid and the gid=0 automatically assigned by OpenShift, keep full access to their PVs via gid=0.

Specifically, for IBM Cloud Pak for Data these storage classes could even be predefined for all CP4D assemblies in the CP4D documentation similar to Creating Portworx storage classes.

deeghuge commented 2 years ago

@gfschmidt CSI 2.6.0 now supports fsGroup for ReadWriteOnce volumes. Please let us know if any requirement from this issue is still missing.
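
For anyone verifying this on a cluster: a CSI driver typically advertises its fsGroup handling through the fsGroupPolicy field of its CSIDriver object (a sketch; check the object actually deployed with the driver for the real value):

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: spectrumscale.csi.ibm.com
spec:
  # Possible values are None, File and ReadWriteOnceWithFSType;
  # ReadWriteOnceWithFSType matches the ReadWriteOnce scoping mentioned above.
  fsGroupPolicy: ReadWriteOnceWithFSType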

Jainbrt commented 1 year ago

Closing as this is implemented and released in CSI 2.6.0.