natalytvinova closed this issue 8 months ago.
Thank you for reporting your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5373.
This message was autogenerated
This also happens on ReadOnlyMany volumes for notebooks
@natalytvinova does it happen for ReadWriteOnce PVCs/volumes for notebooks?
The problem above is that the Notebook's Pod sets .spec.securityContext.fsGroup = 100, which should ensure that the mounted PVCs are owned by the expected group. This is the case for RWO PVCs, which is why they are readable and writable, but it seems not to be the case for RWX PVCs.
From the K8s docs I see that .spec.securityContext.fsGroup controls the following:
A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:
1. The owning GID will be the FSGroup
2. The setgid bit is set (new files created in the volume will be owned by FSGroup)
3. The permission bits are OR'd with rw-rw----
If unset, the Kubelet will not modify the ownership and permissions of any volume. Note that this field cannot be set when spec.os.name is windows.
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context
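To make the three documented rules concrete, here is a small Python model of what they do to a single directory's owner and mode. It is a sketch of the documented behaviour only, not kubelet's actual code; applying setgid to directories only is an assumption, and the starting mode 755 is illustrative.

```python
import stat

# Bits the docs say are OR'd into the volume's permissions: rw-rw---- (0o660).
FSGROUP_MODE_MASK = stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP


def apply_fsgroup(mode: int, uid: int, fs_group: int, is_dir: bool):
    """Model the three documented fsGroup rules for one file or directory.

    Returns the (uid, gid, mode) the entry would end up with. The owning UID
    is left as-is; fsGroup only changes the group and the permission bits.
    """
    new_gid = fs_group                   # rule 1: owning GID becomes fsGroup
    new_mode = mode | FSGROUP_MODE_MASK  # rule 3: OR with rw-rw----
    if is_dir:
        # rule 2 (assumption: applied to directories only): setgid makes files
        # created inside the directory inherit its group, i.e. fsGroup.
        new_mode |= stat.S_ISGID
    return uid, new_gid, new_mode


# A root-owned directory with mode 755 (illustrative starting point).
uid, gid, mode = apply_fsgroup(stat.S_IFDIR | 0o755, 0, fs_group=100, is_dir=True)
print(uid, gid, stat.filemode(mode))  # 0 100 drwxrwsr-x
```

The owner stays root, but the group becomes 100 with rw access and setgid, which is what lets the non-root notebook user (a member of that group) write to an RWO volume.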
So this most probably has to do with the StorageClass: kubelet is unable to change the GID of the mounted RWX PVC. We'll need to better understand the storage provider, which in this case I understand is Cinder.
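One way to confirm this from inside the notebook is to stat the mount point and check each documented fsGroup effect. The helper below is a hypothetical sketch using only the Python standard library; it reports the owner, mode and whether the GID, setgid bit and group rw bits match what fsGroup should have produced.

```python
import os
import stat


def check_fsgroup(path: str, fs_group: int) -> dict:
    """Report whether a mounted directory shows the documented fsGroup effects."""
    st = os.stat(path)
    perms = stat.S_IMODE(st.st_mode)
    report = {
        "owner": f"{st.st_uid}:{st.st_gid}",
        "mode": stat.filemode(st.st_mode),
        "gid_is_fsgroup": st.st_gid == fs_group,      # rule 1
        "setgid": bool(perms & stat.S_ISGID),        # rule 2
        "group_rw": (perms & 0o060) == 0o060,        # rule 3, group part of rw-rw----
    }
    report["fsgroup_applied"] = all(
        report[k] for k in ("gid_is_fsgroup", "setgid", "group_rw")
    )
    return report
```

For example, calling check_fsgroup("/home/jovyan", 100) in the notebook (the mount path is an assumption about the default workspace location) on a volume mounted as root/root would report owner "0:0" and fsgroup_applied False.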
Hi @kimwnasptd, nope, ReadWriteOnce volumes are okay.
Yes, in this case it is Cinder. I checked the configs for the openstack-integrator, openstack-cloud-controller and cinder-csi charms, and nothing seems related to this.
Adding more context here after some exploration. Also thanks to @addyess for his help looking through the CSI code and issues.
First of all, Kubeflow Notebooks set .spec.securityContext.fsGroup = 100 in their PodSpec. This field tells Kubernetes (kubelet) which GID should own the volumes it mounts on the Pod:
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1
A special supplemental group that applies to all containers in a pod. Some volume types allow the Kubelet to change the ownership of that volume to be owned by the pod:
1. The owning GID will be the FSGroup
2. The setgid bit is set (new files created in the volume will be owned by FSGroup)
3. The permission bits are OR'd with rw-rw----
If unset, the Kubelet will not modify the ownership and permissions of any volume. Note that this field cannot be set when spec.os.name is windows.
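To reproduce the ownership problem outside of Kubeflow, a minimal Pod that mounts the same PVC with the same fsGroup should be enough. The sketch below builds such a manifest as JSON (which kubectl apply accepts). The Pod name fsgroup-test, the busybox image, the container name and the /data mount path are hypothetical choices; only securityContext.fsGroup and the PVC mount mirror the Notebook Pod.

```python
import json


def fsgroup_test_pod(pvc_name: str, namespace: str, fs_group: int = 100) -> dict:
    """Minimal Pod that mounts an existing PVC with the same fsGroup as a Notebook."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "fsgroup-test", "namespace": namespace},
        "spec": {
            # Pod-level field: asks kubelet to give the volume's group to fs_group,
            # for volume types that support it.
            "securityContext": {"fsGroup": fs_group},
            "containers": [
                {
                    "name": "check",
                    "image": "busybox",
                    # Print the mount's owner and mode, then keep the Pod alive.
                    "command": ["sh", "-c", "ls -ld /data && sleep 3600"],
                    "volumeMounts": [{"name": "workspace", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {"name": "workspace", "persistentVolumeClaim": {"claimName": pvc_name}}
            ],
        },
    }


if __name__ == "__main__":
    # PVC and namespace taken from the bug description in this issue.
    print(json.dumps(fsgroup_test_pod("uat-workspace", "admin"), indent=2))
```

Usage sketch: save the output to pod.json, run kubectl apply -f pod.json, then kubectl -n admin logs fsgroup-test to see the owner and mode of the mounted volume.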
The problem in this case is that the upstream Cinder CSI driver does not support fsGroup for RWX volumes (https://github.com/kubernetes/cloud-provider-openstack/issues/2075), which is why the RWX volume ends up being mounted as root/root.
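For context, Kubernetes lets each CSI driver declare an fsGroupPolicy on its CSIDriver object: "ReadWriteOnceWithFSType" (the default), "File" or "None". Per the Kubernetes docs, under the default policy kubelet only changes ownership when the volume has an fsType and its access modes include ReadWriteOnce. The function below is a simplified model of those documented rules. Whether the Cinder CSI deployment here actually uses the default policy is an assumption to verify (for example with kubectl get csidriver -o yaml, looking at spec.fsGroupPolicy); the upstream issue above remains the reference for the Cinder behaviour.

```python
from typing import Optional, Sequence


def kubelet_applies_fsgroup(
    fs_group: Optional[int],
    driver_policy: str,
    fs_type: str,
    access_modes: Sequence[str],
) -> bool:
    """Simplified model of whether kubelet changes a CSI volume's ownership.

    driver_policy is the CSIDriver spec.fsGroupPolicy value.
    """
    if fs_group is None or driver_policy == "None":
        return False
    if driver_policy == "File":
        # Driver opts in: fsGroup applies regardless of fsType or access mode.
        return True
    # Default "ReadWriteOnceWithFSType": needs an fsType and a RWO access mode.
    return bool(fs_type) and "ReadWriteOnce" in access_modes


print(kubelet_applies_fsgroup(100, "ReadWriteOnceWithFSType", "ext4", ["ReadWriteOnce"]))  # True
print(kubelet_applies_fsgroup(100, "ReadWriteOnceWithFSType", "ext4", ["ReadWriteMany"]))  # False
```

Under this model a ReadOnlyMany volume is skipped as well, which would be consistent with the ReadOnlyMany report earlier in the thread, although that is not confirmed here.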
Lastly, for reference, this is the code in our charms that creates the cinder-csi-default StorageClass:
https://github.com/canonical/cinder-csi-operator/blob/32c9361fcd3067c99ff4ba2a844d9dd12f2b7d36/src/storage_manifests.py#L106
So what happens in this case is:
1. The cinder-csi-operator charm is deployed and creates the cinder-csi-default storage class
2. The upstream Cinder CSI driver does not support fsGroup for RWX volumes
3. Kubelet therefore does not apply the fsGroup to those volumes
4. RWX PVCs created from the cinder-csi-default StorageClass don't get their permissions changed

Since this is more of a problem of the underlying storage infrastructure not respecting K8s constructs, which Kubeflow relies on, I'll go ahead and close the issue, since there's not much we can do from the Kubeflow side.
Bug Description
While creating a Jupyter Notebook, I created a volume with the "ReadWriteMany" option. The logs are at the end.
The volume is created in Kubernetes:
```
$ kubectl get pvc -A
NAMESPACE   NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
admin       uat-workspace   Bound    pvc-ae4e34b4-da07-4576-81b0-fd2a9c2a248a   20Gi       RWX            csi-cinder-default   13m

$ kubectl get pv -A
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS         REASON   AGE
pvc-ae4e34b4-da07-4576-81b0-fd2a9c2a248a   20Gi       RWX            Delete           Bound    admin/uat-workspace   csi-cinder-default            13m
```
To Reproduce
Environment
Kubeflow bundle 1.8
Juju 3.1.7
Charmed Kubernetes 1.28
Kubernetes is on top of OpenStack Yoga
Relevant Log Output
Additional Context
No response