Open dobharweim opened 2 years ago
I've found a workaround for this issue: I add a securityContext to run the initContainer as root, and it seems to detect this and run the chown step.
New Manifest
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    count: 1
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
          runAsUser: 1000
          runAsGroup: 0
        initContainers:
        - name: elastic-internal-init-filesystem
          securityContext:
            runAsUser: 0
            runAsGroup: 0
EOF
The set-default-security-context ECK parameter, which defaults to true, is responsible for automatically adding fsGroup: 1000 to the elasticsearch pod's securityContext, in order to make Kubernetes automatically change ownership on the data volume (see https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods).
Can you double check the value you are using?
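One way to check it (assuming the default install, where the operator runs as the elastic-operator StatefulSet in the elastic-system namespace; names may differ with a Helm install, where the setting can live in the operator ConfigMap instead) is to grep the operator manifest for the flag:

kubectl -n elastic-system get statefulset elastic-operator -o yaml | grep -A 1 set-default-security-context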
Next, DelegateFSGroupToCSIDriver is a Kubernetes feature gate which delegates the ownership change to the CSI driver. It was alpha / false up to Kubernetes 1.22 and has been beta / true since Kubernetes 1.23. You should validate that your CSI driver doesn't have any known issue regarding this feature (some do, from my personal experience). On Kubernetes 1.23+, you can still force this feature gate to false on the various Kubernetes components.
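As a minimal sketch, disabling the gate on the kubelet would look like the snippet below; how (and whether) you can apply this depends on how your cluster components are managed, and on managed services like IKS it may not be exposed at all:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # force the delegation to the CSI driver off, so the kubelet does the recursive chown itself
  DelegateFSGroupToCSIDriver: false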
Hi @jeanfabrice, my apologies for the delay in responding, and thank you for your input and direction.
ECK is installed with the chart's default settings, as I should have outlined in the original issue.
DelegateFSGroupToCSIDriver is enabled on my cluster, and I don't know of any known issues with the feature for my provider. Do you have any info on the usual types of issues, or what I could search for or try to reproduce in this area? Thanks.
Hey @dobharweim!
I would first check whether set-default-security-context is enabled or not from an ECK perspective. If it is, your elasticsearch pods should normally have securityContext.fsGroup: 1000 automatically configured.
To determine whether or not your CSI driver has an issue with DelegateFSGroupToCSIDriver, you can spin up a busybox pod with securityContext.fsGroup: 1000 plus a mounted PVC, then see whether the PVC content gets updated with group: 1000 ownership or not. If it does not, the delegation is at fault. If it does, it should work the same with elasticsearch pods.
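As an illustrative sketch of such a test (the names fsgroup-test and fsgroup-test-pvc are made up; the PVC should use the same storage class / CSI driver as the elasticsearch data volume):

# test pod: mounts an existing PVC with fsGroup: 1000 and prints the volume ownership
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    fsGroup: 1000          # same fsGroup value ECK would set
  containers:
  - name: busybox
    image: busybox
    # print the numeric ownership of the mount, then stay alive for inspection
    command: ["sh", "-c", "ls -ldn /data && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fsgroup-test-pvc   # an existing PVC on the storage class under test

If ls -ldn /data reports group 1000 on the mount, the fsGroup delegation works on your CSI driver.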
I expected the elasticsearch pod to start with an initContainer elastic-internal-init-filesystem which would prepare the mounted PVC for the data directory (elasticsearch-data) with the correct ownership and octal permissions.
Setting permissions requires the init container to run as root, which is not the case by default. As stated by the K8s documentation, setting an fsGroup in the Pod securityContext should set the expected permissions without running a container with runAsGroup: 0:
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted.
For example:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  # uncomment the lines below to copy the specified node labels as pod annotations and use it as an environment variable in the Pods
  #annotations:
  #  eck.k8s.elastic.co/downward-node-labels: "topology.kubernetes.io/zone"
  name: elasticsearch-sample
spec:
  version: 8.5.0
  nodeSets:
  - name: default
    config:
      node.store.allow_mmap: false
    podTemplate:
      spec:
        securityContext:
          runAsUser: 3000
          runAsGroup: 0
          fsGroup: 3000
The set-default-security-context ECK parameter, which defaults to true, is responsible for automatically adding fsGroup: 1000 to the elasticsearch pod's securityContext,
Good point, but I think the doc is not up to date and the default value has been auto-detect (detection mechanism here) since 2.5.0 (see https://github.com/elastic/cloud-on-k8s/pull/5150/files).
I have no idea how it behaves on IBM Kubernetes Service. Is it a "flavor" of OpenShift?
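As a rough indicator (not necessarily the exact detection logic), you can check whether the cluster exposes OpenShift's SecurityContextConstraints API, which is what would make it look like an OpenShift flavor:

kubectl api-resources | grep -i securitycontextconstraints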
Exact same issue with a local minikube cluster using ECK version 2.11.1; it can be easily reproduced with a PersistentVolume as follows:
pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: manual-pv-1
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/manual-pv-1"
elasticsearch.yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 8.12.0
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        # Uncomment to fix the issue
        #
        # securityContext:
        #   fsGroup: 1000
        #   runAsUser: 1000
        #   runAsGroup: 0
        # initContainers:
        # - name: elastic-internal-init-filesystem
        #   securityContext:
        #     runAsUser: 0
        #     runAsGroup: 0
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 2Gi
              cpu: 2
            limits:
              memory: 4Gi
              cpu: 8
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: manual
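To confirm the ownership problem with this repro, you can check the init container logs and the hostPath directory on the minikube node (names and paths taken from the manifests above):

kubectl logs quickstart-es-default-0 -c elastic-internal-init-filesystem
minikube ssh "ls -ldn /data/manual-pv-1"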
Bug Report
What did you do?
I installed the quickstart elasticsearch cluster from the docs to a namespace managed by Operator version 2.5.0.
What did you expect to see?
I expected the elasticsearch pod to start with an initContainer elastic-internal-init-filesystem which would prepare the mounted PVC for the data directory (elasticsearch-data) with the correct ownership and octal permissions.
What did you see instead? Under which circumstances?
Instead the elastic-internal-init-filesystem container does not update the volume mount and therefore it is unwritable. ES fails with the following error (logs from elastic-internal-init-filesystem below):
k logs quickstart-es-default-0
k logs quickstart-es-default-0 elastic-internal-init-filesystem
ECK version:
2.5.0
Kubernetes information:
IBM Kubernetes Service.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3", GitCommit:"434bfd82814af038ad94d62ebe59b133fcb50506", GitTreeState:"clean", BuildDate:"2022-10-12T10:47:25Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.3+IKS", GitCommit:"16b9651762237ff35f832b596fde9dd428d8150d", GitTreeState:"clean", BuildDate:"2022-10-14T06:25:49Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}
Resource definition: