Closed gfschmidt closed 8 months ago
Hi @gfschmidt, do these services use the storageClass directly and create the PVCs themselves during install, or can you create a PVC and pass it to the installer?
Hi @deeghuge , these services only accept a storageClass as input and create the required PVCs on demand by themselves. As far as I know the user does not have the chance to create a PVC and pass it to the installer. The installer only accepts a "storageClass" as parameter on the command line and deploys all required components.
As context, the work-around below is related to failures the CPST team was seeing with DataStage.
Documenting the workaround for DataStage persistent storage
Check permissions of subdirectories in the pvc on the storage cluster
[root@stg-node0 fs1]# ls -al /ibm/fs1/pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients/
total 2
drwxr-xr-x. 3 10032 stgadmin 4096 Jun 2 15:08 .
drwxr-xr-x. 19 10032 stgadmin 4096 Jun 2 14:45 ..
drwxr-x--x. 3 root root 4096 Jun 2 15:08 db2_client
Check /home/dsadm is not persistent
Copy a file into the conductor pod
[root@arcx3650fxxnh ads]# oc cp colleges.csv -n zen-automated is-en-conductor-0:/home/dsadm
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
colleges.csv
ds_logs
imam_logs
Note that these files do not exist on the pvc
[root@stg-node0 fs1]# ls -al pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients/db2_client/dsadm/
total 1
drwxrwx--x. 2 root root 4096 Jun 2 15:08 .
drwxr-x--x. 3 root root 4096 Jun 2 15:08 ..
-rw-r--r--. 1 10032 stgadmin 0 Jun 2 15:08 .extractComplete
Restarting the pod removes the files
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
Workaround provided here
Edit the statefulset
[root@arcx3650fxxnh ads]# oc edit sts/is-en-conductor
Insert the following entry under volumeMounts
volumeMounts:
- mountPath: /home/dsadm
  name: engine-dedicated-volume
  subPath: is-en-conductor-0/EngineClients/db2_client/dsadm
Upon saving, the pod will restart
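For repeatable installs, the same mount could be added without an interactive edit. The following is a minimal sketch, not the method used in this thread: `build_dsadm_patch` is a hypothetical helper that builds a JSON patch appending the entry above to an existing volumeMounts list. It assumes the conductor container is at index 0 of the pod template, which should be verified on the cluster first.

```shell
#!/usr/bin/env bash
# Sketch: build a JSON patch (RFC 6902) that appends the /home/dsadm mount
# to the volumeMounts list of one container in the StatefulSet pod template.
# ASSUMPTION: the container index defaults to 0; check the actual index with
#   oc get sts/is-en-conductor -n zen-automated -o yaml
build_dsadm_patch() {
  local container_index
  container_index="${1:-0}"
  # "/-" at the end of the path appends to the existing array.
  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/volumeMounts/-","value":{"mountPath":"/home/dsadm","name":"engine-dedicated-volume","subPath":"is-en-conductor-0/EngineClients/db2_client/dsadm"}}]\n' "$container_index"
}

# Usage (not executed here; namespace taken from the other commands in this thread):
#   oc patch sts/is-en-conductor -n zen-automated --type=json -p "$(build_dsadm_patch 0)"
```

As with the interactive edit, changing the pod template causes the pod to restart.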
Copy the file back into the pod
[root@arcx3650fxxnh ads]# oc cp colleges.csv -n zen-automated is-en-conductor-0:/home/dsadm
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
colleges.csv
See the file show up on the storage node
[root@stg-node0 fs1]# ls -al pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients/db2_client/dsadm/
total 162
drwxrwx--x. 3 root root 4096 Jun 2 17:02 .
drwxr-x--x. 3 root root 4096 Jun 2 15:08 ..
-rw-r--r--. 1 10032 stgadmin 160691 Jun 2 17:02 colleges.csv
-rw-r--r--. 1 10032 stgadmin 0 Jun 2 15:08 .extractComplete
drwxrw----. 3 10032 stgadmin 4096 Jun 2 17:01 .pki
Restart the pod and check if files persisted
[root@arcx3650fxxnh ads]# oc delete pod is-en-conductor-0
pod "is-en-conductor-0" deleted
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
colleges.csv
imam_logs
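Right after the pod is deleted, `oc exec` can fail until the StatefulSet has recreated the pod and its container is running. A small retry helper, sketched here, avoids guessing how long to wait. `retry_until` is a hypothetical helper name and not part of `oc`; the attempt count and delay in the usage line are arbitrary.

```shell
#!/usr/bin/env bash
# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX_ATTEMPTS times,
# sleeping DELAY_SECONDS between attempts. Returns 1 if it never succeeds.
retry_until() {
  local max_attempts
  local delay
  local attempt
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (not executed here):
#   retry_until 30 10 oc exec -n zen-automated is-en-conductor-0 -- ls /home/dsadm
```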
Note that the directory is still owned by root:root
[root@stg-node0 fs1]# ls -al pvc-55074487-34ca-4d71-868f-e0aee66ddf26/pvc-55074487-34ca-4d71-868f-e0aee66ddf26-data/is-en-conductor-0/EngineClients
total 2
drwxr-xr-x. 3 10032 stgadmin 4096 Jun 2 15:08 .
drwxr-xr-x. 19 10032 stgadmin 4096 Jun 2 14:45 ..
drwxr-x--x. 3 root root 4096 Jun 2 15:08 db2_client
Edit: an extra step that should be completed if the is-en-conductor-0 pod was restarted after install but before the workaround was applied, or if files in /home/dsadm are otherwise missing.
Files that should be in /home/dsadm
[root@arcx3650fxxnh ~]# oc exec -n zen-automated is-en-conductor-0 -- ls -al /home/dsadm
total 204
drwxrwx--x. 5 root root 4096 Jun 3 17:19 .
drwxr-xr-x. 1 root root 51 May 5 04:06 ..
-rw-------. 1 dsadm dstage 55 Jun 3 16:21 .bash_history
-rwxr-xr-x. 1 dsadm dstage 18 Aug 21 2019 .bash_logout
-rwxr-xr-x. 1 dsadm dstage 193 Aug 21 2019 .bash_profile
-rwxr-xr-x. 1 dsadm dstage 344 May 5 04:06 .bashrc
drwxr-xr-x. 2 dsadm dstage 4096 May 5 03:47 ds_logs
-rw-r--r--. 1 dsadm dstage 0 Jun 3 17:19 .extractComplete
drwxr-xr-x. 2 dsadm dstage 4096 Jun 3 17:22 imam_logs
drwxrw----. 3 dsadm dstage 4096 Jun 3 00:01 .pki
If any are missing, delete .extractComplete, then restart the pod
[root@arcx3650fxxnh ads]# oc exec -n zen-automated is-en-conductor-0 -- rm -f /home/dsadm/.extractComplete
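The "are any missing" check can be scripted. This is a rough sketch: `missing_dsadm_entries` is a hypothetical helper, and the expected list is an assumption derived from the listing above, leaving out `.bash_history` (created by shell use) and `.extractComplete` (the marker that gets deleted).

```shell
#!/usr/bin/env bash
# ASSUMPTION: expected entries are taken from the /home/dsadm listing in this
# thread, minus .bash_history and .extractComplete.
expected_dsadm_entries=".bash_logout .bash_profile .bashrc ds_logs imam_logs .pki"

# Reads file names (one per line) on stdin and prints every expected
# entry that does not appear in the input.
missing_dsadm_entries() {
  local listing
  local name
  listing="$(cat)"
  for name in $expected_dsadm_entries; do
    # -x: whole-line match, -F: literal string, so ".pki" is not a regex.
    printf '%s\n' "$listing" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}

# Usage (not executed here):
#   oc exec -n zen-automated is-en-conductor-0 -- ls -A /home/dsadm | missing_dsadm_entries
```

If the helper prints anything, the step above applies: delete `.extractComplete` and restart the pod.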
We are investigating an alternative approach for subPath support.
For RWO volumes, fsGroup will help fix the subPath issue. For RWX volumes, setting shared=true in the storageClass is the option to fix the subPath issue.
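As an illustration of the RWO case only, fsGroup is set in the pod-level securityContext of the workload. The sketch below builds a merge patch for a StatefulSet; `build_fsgroup_patch` is a hypothetical helper, and the group ID, workload name, and namespace are placeholders to be replaced with real values. The shared=true storageClass parameter is not shown, since the rest of the storageClass depends on the driver version and the file system in use.

```shell
#!/usr/bin/env bash
# Sketch: build a merge patch that sets spec.template.spec.securityContext.fsGroup.
# ASSUMPTION: the GID passed in is the group that should own the volume contents.
build_fsgroup_patch() {
  local gid
  gid="$1"
  # Reject empty or non-numeric input so the patch stays valid JSON.
  case "$gid" in
    ''|*[!0-9]*) echo "fsGroup must be a numeric GID" >&2; return 1 ;;
  esac
  printf '{"spec":{"template":{"spec":{"securityContext":{"fsGroup":%s}}}}}\n' "$gid"
}

# Usage (not executed here; <name>, <namespace> and <gid> are placeholders):
#   oc patch sts/<name> -n <namespace> --type=merge -p "$(build_fsgroup_patch <gid>)"
```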
Sidenote: This issue was related to subPath. More generally, when sharing access to existing data in IBM Spectrum Scale with static provisioning, we need to be careful that fsGroup does not apply the SGID bit to the mounted directory by default, or even recursively change existing file permissions. We would not want fsGroup SGID as the default when sharing access to data in IBM Spectrum Scale where uid/gid and file permissions are carefully crafted across users (POSIX and OCP/K8s users). It could be a selectable option, though (not the default), to simulate the same behavior as with emptyDir, where fsGroup sets the SGID bit and applies the group ID to the mounted directory and all files and directories within it. This supports a simpler, flat form of data sharing across users in OpenShift without the effort of aligning uid/gids with POSIX users (outside of OCP/K8s).
Closing, as fsGroup support and the shared=true option are provided to resolve this issue and no other improvement is planned for it. Please reopen if functionality is still missing.
Is your feature request related to a problem? Please describe.
The IBM Watson Machine Learning Accelerator (WMLA) component for IBM Cloud Pak for Data (CP4D) seems to make excessive use of subPath to mount each of the specified sub-directories of the PV holding the user's data individually to selected mount points in the pod as follows:
instead of mounting the whole PV to a single mount point in the pod with all its sub-directories:
The WMLA service of CP4D uses this concept to dynamically map the users' Jupyter notebook data into the pods using subPaths.
With CSI v2.1.0, it seems that when using subPath the correct uid/gid settings from the storage class are not applied, so a non-root user loses access to these directories in the pod.
From within the pod it looks as follows:
with
The directory in IBM Spectrum Scale backing the PVC (here light-weight provisioned from a storage class with uid: "1000750000" and gid: "1000750000") is:
and the uid/gid settings are not properly applied as specified in the storage class:
Furthermore, these individual "mounts" via subPath also do NOT appear in the df -h output from within the pod at all:
Another Scale PVC (wmla-logging) NOT using the subPath option appears in the df -h output as expected and also carries all the uid/gid settings from the storage class correctly:
It looks like the subPath option is not fully supported with CSI v2.1.0 yet, as the uid/gid settings from the storage class do not seem to be applied properly, especially when using subPath with a non-root user and group ID.
Describe the solution you'd like
Full support of the subPath option with CSI, also applying the correct uid/gid settings as specified in the storage class for non-root uid/gid.