IBM / ibm-spectrum-scale-csi

The IBM Spectrum Scale Container Storage Interface (CSI) project enables container orchestrators, such as Kubernetes and OpenShift, to manage the life-cycle of persistent storage.

Full support of subPath option for storage class with non-root uid/gid #424

Closed: gfschmidt closed this issue 8 months ago

gfschmidt commented 3 years ago

Is your feature request related to a problem? Please describe.
The IBM Watson Machine Learning Accelerator (WMLA) component for IBM Cloud Pak for Data (CP4D) seems to make excessive use of subPath, mounting all the specified sub-directories of the PV holding the user's data individually to selected mount points in the pod as follows:

    volumeMounts:
    - mountPath: /gpfs/mygpfs
      name: mygpfs
      subPath: mygpfs
    - mountPath: /gpfs/myresultfs
      name: mygpfs
      subPath: myresultfs
    - mountPath: /gpfs/mydatafs
      name: mygpfs
      subPath: mydatafs
    - mountPath: /gpfs/dlim
      name: mygpfs
      subPath: dlim

instead of mounting the whole PV to a single mount point in the pod with all its sub-directories:

    volumeMounts:
    - mountPath: /gpfs/mygpfs
      name: mygpfs

The WMLA service of CP4D uses this concept to dynamically map the users' Jupyter notebook data into the pods using subPaths.

With CSI v2.1.0 it seems that when subPath is used, the correct uid/gid settings from the storage class are not applied, so a non-root user loses access to these directories in the pod.

From within the pod it looks as follows:

sh-4.2$ ls -al
total 25
drwxr-xr-x.   1 root       root          70 May  6 14:48 .
drwxr-xr-x.   1 root       root          70 May  6 14:48 ..
[...]
drwxr-xr-x.   6 root       root          66 Jan 21 01:05 gpfs            <<< subPath mounts without proper uid/gid
[...]
drwxrwx--x.   4 1000750000 1000750000  4096 May  6 03:32 wmla-logging        <<< regular mount with proper uid/gid

with

sh-4.2$ ls -al /gpfs/
total 2
drwxr-xr-x. 6 root       root   66 Jan 21 01:05 .
drwxr-xr-x. 1 root       root   70 May  6 14:48 ..
drwxrwx--x. 2 root       root 4096 May  6 03:31 dlim
drwxrwx--x. 2 root       root 4096 May  6 14:45 mydatafs
drwxrwx--x. 3 root       root 4096 May  6 15:19 mygpfs
drwxrwx--x. 7 root       root 4096 May  6 15:04 myresultfs

The directory in IBM Spectrum Scale backing the PVC (here light-weight provisioned from a storage class with uid: "1000750000" and gid: "1000750000") is:

# ls -al /gpfs/ess3000_1M/wmla/pvc-9c13e6ab-29c7-4e78-a360-0df6a0290749
total 3
drwxrwx--x.  6 1000750000 1000750000 4096 May  6 05:31 .
drwxrwxrwx. 15 root       root       4096 May  6 05:34 ..
drwxrwx--x.  2 root       root       4096 May  6 05:31 dlim
drwxrwx--x.  2 root       root       4096 May  6 05:31 mydatafs
drwxrwx--x.  2 root       root       4096 May  6 05:31 mygpfs
drwxrwx--x.  4 root       root       4096 May  6 05:34 myresultfs

and the uid/gid settings are not properly applied as specified in the storage class:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale-wmla-sc
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "fs1"
  volDirBasePath: "wmla"
  uid: "1000750000"
  gid: "1000750000"
reclaimPolicy: Delete

Furthermore, these individual "mounts" via subPath also do NOT appear in the df -h output from within the pod at all:

sh-4.2$ df -h
Filesystem                            Size  Used Avail Use% Mounted on
overlay                               446G   42G  404G  10% /
tmpfs                                  64M     0   64M   0% /dev
tmpfs                                  63G     0   63G   0% /sys/fs/cgroup
tmpfs                                  63G     0   63G   0% /dev/shm
tmpfs                                  63G   16M   63G   1% /etc/passwd
fs1                                    15T 1017G   14T   7% /wmla-logging        <<< this is another Scale PVC mounted at /wmla-logging (no subPath used)
tmpfs                                  63G   12K   63G   1% /etc/etcd
/dev/mapper/coreos-luks-root-nocrypt  446G   42G  404G  10% /etc/hosts
tmpfs                                  63G  4.0K   63G   1% /var/tmp/conf
tmpfs                                  63G   28K   63G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                  63G     0   63G   0% /proc/acpi
tmpfs                                  63G     0   63G   0% /proc/scsi
tmpfs                                  63G     0   63G   0% /sys/firmware

Another Scale PVC (wmla-logging) NOT using the subPath option appears in the df -h output as expected and also carries all the uid/gid settings from the storage class correctly:

sh-4.2$ ls -alR /wmla-logging/
/wmla-logging/:
total 2
drwxrwx--x. 4 1000750000 1000750000 4096 May  6 03:32 .
drwxr-xr-x. 1 root       root         70 May  6 03:31 ..
drwxr-xr-x. 2 1000750000 1000750000 4096 May  6 03:32 dli
drwxr-xr-x. 2 1000750000 1000750000 4096 May  6 03:32 notebook

It looks like the subPath option is not yet fully supported with CSI v2.1.0, as the uid/gid settings from the storage class do not seem to be applied properly when using subPath with a non-root user and group ID.
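
Side note on the df -h behavior above: as far as I understand, Kubernetes implements subPath as a bind mount of a sub-directory of the already-mounted volume rather than as a separate filesystem mount, which would explain why the subPath mounts do not show up as their own entries in df -h inside the pod. If that is correct, the bind mounts should still be visible on the worker node hosting the pod, for example as follows (assuming the default kubelet root directory; <pod-uid> is a placeholder):

# on the worker node hosting the pod: list bind mounts created for subPath volumes
findmnt | grep volume-subpaths
# the kubelet keeps the per-pod subPath bind-mount targets here:
ls /var/lib/kubelet/pods/<pod-uid>/volume-subpaths/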

Describe the solution you'd like
Full support of the subPath option with CSI, also applying the correct uid/gid settings as specified in the storage class for non-root uid/gid.

deeghuge commented 3 years ago

Hi @gfschmidt, do these services use the storageClass directly and create the PVCs themselves during install? Or can you create a PVC and pass it to the installer?

gfschmidt commented 3 years ago

Hi @deeghuge, these services only accept a storageClass as input and create the required PVCs on demand by themselves. As far as I know, the user does not have the chance to create a PVC and pass it to the installer. The installer only accepts a "storageClass" as a parameter on the command line and deploys all required components.

martinep1 commented 3 years ago

As context, the workaround below is related to failures the CPST team was seeing with DataStage.

Documenting the workaround for DataStage persistent storage:

Confirm issue exists on fresh install

Check permissions of subdirectories in the pvc on the storage cluster

[root@stg-node0 fs1]# ls -al /ibm/fs1/pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients/
total 2
drwxr-xr-x.  3 10032 stgadmin 4096 Jun  2 15:08 .
drwxr-xr-x. 19 10032 stgadmin 4096 Jun  2 14:45 ..
drwxr-x--x.  3 root  root     4096 Jun  2 15:08 db2_client

Check /home/dsadm is not persistent

Copy a file into the conductor pod

[root@arcx3650fxxnh ads]# oc cp colleges.csv -n zen-automated is-en-conductor-0:/home/dsadm
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
colleges.csv
ds_logs
imam_logs

Note that these files do not exist on the pvc

[root@stg-node0 fs1]# ls -al pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients/db2_client/dsadm/
total 1
drwxrwx--x. 2 root  root     4096 Jun  2 15:08 .
drwxr-x--x. 3 root  root     4096 Jun  2 15:08 ..
-rw-r--r--. 1 10032 stgadmin    0 Jun  2 15:08 .extractComplete

Restarting the pod removes the files
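
For example, restarting with the same command used in the verification step further below:

[root@arcx3650fxxnh ads]# oc delete pod is-en-conductor-0
pod "is-en-conductor-0" deleted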

[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm

Apply workaround

Workaround provided here

Edit the statefulset

[root@arcx3650fxxnh ads]# oc edit sts/is-en-conductor

Insert the following entry under volumeMounts

    volumeMounts:
    - mountPath: /home/dsadm
      name: engine-dedicated-volume
      subPath: is-en-conductor-0/EngineClients/db2_client/dsadm

Upon saving, the pod will restart

Copy the file back into the pod

[root@arcx3650fxxnh ads]# oc cp colleges.csv -n zen-automated is-en-conductor-0:/home/dsadm
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
colleges.csv

See the file show up on the storage node

[root@stg-node0 fs1]# ls -al pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients/db2_client/dsadm/
total 162
drwxrwx--x. 3 root  root       4096 Jun  2 17:02 .
drwxr-x--x. 3 root  root       4096 Jun  2 15:08 ..
-rw-r--r--. 1 10032 stgadmin 160691 Jun  2 17:02 colleges.csv
-rw-r--r--. 1 10032 stgadmin      0 Jun  2 15:08 .extractComplete
drwxrw----. 3 10032 stgadmin   4096 Jun  2 17:01 .pki

Restart the pod and check if files persisted

[root@arcx3650fxxnh ads]# oc delete pod is-en-conductor-0
pod "is-en-conductor-0" deleted
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
colleges.csv
imam_logs

Note that the directory is still owned by root:root

[root@stg-node0 fs1]# ls -al pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients
total 2
drwxr-xr-x.  3 10032 stgadmin 4096 Jun  2 15:08 .
drwxr-xr-x. 19 10032 stgadmin 4096 Jun  2 14:45 ..
drwxr-x--x.  3 root  root     4096 Jun  2 15:08 db2_client

Ensure /home/dsadm files exist

Edit: an extra step that should be completed if the is-en-conductor-0 pod has been restarted after install but before the workaround has been applied, or if files in /home/dsadm are otherwise missing.

Files that should be in /home/dsadm

[root@arcx3650fxxnh ~]# oc exec -n zen-automated is-en-conductor-0 -- ls -al /home/dsadm
total 204
drwxrwx--x. 5 root  root     4096 Jun  3 17:19 .
drwxr-xr-x. 1 root  root       51 May  5 04:06 ..
-rw-------. 1 dsadm dstage     55 Jun  3 16:21 .bash_history
-rwxr-xr-x. 1 dsadm dstage     18 Aug 21  2019 .bash_logout
-rwxr-xr-x. 1 dsadm dstage    193 Aug 21  2019 .bash_profile
-rwxr-xr-x. 1 dsadm dstage    344 May  5 04:06 .bashrc
drwxr-xr-x. 2 dsadm dstage   4096 May  5 03:47 ds_logs
-rw-r--r--. 1 dsadm dstage      0 Jun  3 17:19 .extractComplete
drwxr-xr-x. 2 dsadm dstage   4096 Jun  3 17:22 imam_logs
drwxrw----. 3 dsadm dstage   4096 Jun  3 00:01 .pki

If any are missing, delete .extractComplete, then restart the pod

[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- rm -f /home/dsadm/.extractComplete
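
Then restart the pod, as in the earlier verification step, so the contents of /home/dsadm are extracted again on startup:

[root@arcx3650fxxnh ads]# oc delete pod is-en-conductor-0
pod "is-en-conductor-0" deleted
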
deeghuge commented 3 years ago

We are investigating an alternative approach for subPath support.

deeghuge commented 2 years ago

For RWO volumes, fsGroup will help fix the subPath issue. For RWX volumes, shared=true in the storageClass is the option to fix the subPath issue.
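
For illustration, here is a minimal sketch of both options, assuming the driver's fsGroup support is in place; the pod name, image, PVC name, and the shared storage class name below are hypothetical examples, not taken from this issue:

# Option 1 (RWO): let Kubernetes apply the group via fsGroup
apiVersion: v1
kind: Pod
metadata:
  name: wmla-example-pod                      # hypothetical name
spec:
  securityContext:
    fsGroup: 1000750000                       # group applied to volumes that support fsGroup
  containers:
  - name: app
    image: registry.example.com/app:latest    # hypothetical image
    volumeMounts:
    - mountPath: /gpfs/mygpfs
      name: mygpfs
      subPath: mygpfs
  volumes:
  - name: mygpfs
    persistentVolumeClaim:
      claimName: mygpfs-pvc                   # hypothetical PVC name
---
# Option 2 (RWX): provision volumes with shared access permissions
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ibm-spectrum-scale-shared-sc          # hypothetical name
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "fs1"
  volDirBasePath: "wmla"
  shared: "true"                              # the option described in this thread
reclaimPolicy: Delete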

gfschmidt commented 2 years ago

Sidenote: This issue was related to subPath. Generally, for sharing access to existing data in IBM Spectrum Scale with static provisioning, we would need to be careful with fsGroup not to generally apply the SGID bit on the mounted directory, or even enforce recursively changing any existing file permissions. We would not generally want fsGroup SGID as the default when sharing access to data in IBM Spectrum Scale where we carefully craft uid/gid and file permissions across users (POSIX and OCP/K8s users). It may be a selectable option, though (not the default), to simulate the same behavior as with EmptyDir, where fsGroup sets the SGID bit and applies the group ID to the mounted dir and all files/dirs within. This supports a simpler version of flat data sharing across users in OpenShift without the effort of aligning uid/gids to POSIX users (outside of OCP/K8s).
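
Related to the concern about recursively changing existing file permissions: Kubernetes also offers fsGroupChangePolicy in the pod securityContext (for volumes where ownership changes are delegated to the kubelet); setting it to "OnRootMismatch" skips the recursive ownership/permission change when the root of the volume already matches the expected fsGroup. A minimal sketch with hypothetical names:

apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-policy-example               # hypothetical name
spec:
  securityContext:
    fsGroup: 1000750000
    # Only chown/chmod recursively if the volume root does not
    # already have the expected group ownership:
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
  - name: app
    image: registry.example.com/app:latest   # hypothetical image
    volumeMounts:
    - mountPath: /gpfs/mygpfs
      name: mygpfs
  volumes:
  - name: mygpfs
    persistentVolumeClaim:
      claimName: mygpfs-pvc                  # hypothetical PVC name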

deeghuge commented 8 months ago

Closing, as fsGroup support and the shared=true option are provided to resolve this issue. Also, there is no other improvement planned for this issue. Please reopen if there is still missing functionality.