cockroachdb / helm-charts

Helm charts for cockroachdb
Apache License 2.0

Helm chart installation fails due to insufficient permissions for security context #357

Open himanshu-cockroach opened 9 months ago

himanshu-cockroach commented 9 months ago

PROBLEM:

Currently, Helm chart installation fails with the following values enabled:

  1. tls.enabled = true
  2. tls.certs.selfSigner.enabled = false
  3. tls.certs.certManager = true
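For reproduction, the three dotted keys above correspond to a nested values structure. The sketch below expands them. The helper expand_dotted is hypothetical (it is not part of Helm) and handles only plain dotted keys, not Helm's --set escaping or list-index syntax. JSON output is accepted by YAML parsers in practice, so the printed result could be saved to a file and passed with helm install -f.

```python
import json


def expand_dotted(flat: dict) -> dict:
    """Expand Helm-style dotted keys (e.g. "tls.certs.certManager") into nested dicts.

    Hypothetical helper for illustration; assumes no key is both a leaf and a parent.
    """
    nested: dict = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate mappings on first use, reuse them afterwards.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# The three overrides listed in the issue.
overrides = {
    "tls.enabled": True,
    "tls.certs.selfSigner.enabled": False,
    "tls.certs.certManager": True,
}

values = expand_dotted(overrides)
print(json.dumps(values, indent=2))
```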

With these values, when we try to create an instance after the operator has been installed successfully, the StatefulSet is created but no pods are created or scheduled (Replicas: 3 desired | 0 total). Describing the StatefulSet gives the following output:

kubectl describe sts cockroachdb-sample -n openshift-operators                                              
Name:               cockroachdb-sample
Namespace:          openshift-operators
CreationTimestamp:  Wed, 04 Oct 2023 01:42:03 +0530
Selector:           app.kubernetes.io/component=cockroachdb,app.kubernetes.io/instance=cockroachdb-sample,app.kubernetes.io/name=cockroachdb
Labels:             app.kubernetes.io/component=cockroachdb
                    app.kubernetes.io/instance=cockroachdb-sample
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=cockroachdb
                    helm.sh/chart=cockroachdb-11.2.1
Annotations:        meta.helm.sh/release-name: cockroachdb-sample
                    meta.helm.sh/release-namespace: openshift-operators
Replicas:           3 desired | 0 total
Update Strategy:    RollingUpdate
Pods Status:        0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=cockroachdb
                    app.kubernetes.io/instance=cockroachdb-sample
                    app.kubernetes.io/name=cockroachdb
  Service Account:  cockroachdb-sample
  Init Containers:
   copy-certs:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/sh
      -c
      cp -f /certs/* /cockroach-certs/; chmod 0400 /cockroach-certs/*.key
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /certs/ from certs-secret (rw)
      /cockroach-certs/ from certs (rw)
  Containers:
   db:
    Image:       cockroachdb/cockroach:v23.1.11
    Ports:       26257/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      shell
      -ecx
      exec /cockroach/cockroach start --join=${STATEFULSET_NAME}-0.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-1.${STATEFULSET_FQDN}:26257,${STATEFULSET_NAME}-2.${STATEFULSET_FQDN}:26257 --advertise-host=$(hostname).${STATEFULSET_FQDN} --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --port=26257 --cache=25% --max-sql-memory=25% --logtostderr=INFO
    Liveness:   http-get https://:http/health delay=30s timeout=1s period=5s #success=1 #failure=3
    Readiness:  http-get https://:http/health%3Fready=1 delay=10s timeout=1s period=5s #success=1 #failure=2
    Environment:
      STATEFULSET_NAME:   cockroachdb-sample
      STATEFULSET_FQDN:   cockroachdb-sample.openshift-operators.svc.cluster.local
      COCKROACH_CHANNEL:  kubernetes-helm
    Mounts:
      /cockroach/cockroach-certs/ from certs (rw)
      /cockroach/cockroach-data/ from datadir (rw)
  Volumes:
   datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir
    ReadOnly:   false
   certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
   certs-secret:
    Type:                       Projected (a volume that contains injected data from multiple sources)
    SecretName:                 cockroachdb-node
    SecretOptionalName:         <nil>
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector app.kubernetes.io/component=cockroachdb,app.kubernetes.io/instance=cockroachdb-sample,app.kubernetes.io/name=cockroachdb
Volume Claims:
  Name:          datadir
  StorageClass:  standard-csi
  Labels:        app.kubernetes.io/instance=cockroachdb-sample
                 app.kubernetes.io/name=cockroachdb
  Annotations:   <none>
  Capacity:      2Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type     Reason        Age                From                    Message
  ----     ------        ----               ----                    -------
  Warning  FailedCreate  1s (x13 over 22s)  statefulset-controller  create Pod cockroachdb-sample-0 in StatefulSet cockroachdb-sample failed error: pods "cockroachdb-sample-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{1000}: 1000 is not an allowed group, provider restricted-v2: .initContainers[0].runAsUser: Invalid value: 1000: must be in the ranges: [1000400000, 1000409999], provider restricted-v2: .containers[0].runAsUser: Invalid value: 1000: must be in the ranges: [1000400000, 1000409999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, pod.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: Forbidden: seccomp may not be set, pod.metadata.annotations[container.seccomp.security.alpha.kubernetes.io/copy-certs]: Forbidden: seccomp may not be set, pod.metadata.annotations[container.seccomp.security.alpha.kubernetes.io/db]: Forbidden: seccomp may not be set, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

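This happens because the user and group set in the securityContext of the StatefulSet's pod template (runAsUser and fsGroup 1000) are not permitted by any SCC that the chart's service account can use. restricted-v2 only accepts UIDs in the namespace's range [1000400000, 1000409999], and every other SCC is reported as not usable by the service account, so the pods cannot be created.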
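The following sketch shows why UID 1000 is rejected. Under restricted-v2, OpenShift only admits UIDs from the block allocated to the namespace, which (to my understanding) is recorded in the namespace annotation openshift.io/sa.scc.uid-range in <start>/<size> form. The annotation value 1000400000/10000 used here is an assumption inferred from the bounds in the FailedCreate event, not something read from the cluster.

```python
def parse_uid_range(annotation: str) -> range:
    """Turn an OpenShift "<start>/<size>" UID-range annotation into a range of allowed IDs."""
    start_str, size_str = annotation.split("/", 1)
    start, size = int(start_str), int(size_str)
    return range(start, start + size)


# Assumed annotation value for openshift-operators; it reproduces the
# [1000400000, 1000409999] bounds quoted in the restricted-v2 error.
allowed = parse_uid_range("1000400000/10000")

chart_uid = 1000  # runAsUser rejected in the FailedCreate event
print(f"allowed: [{allowed.start}, {allowed.stop - 1}]")
print(f"uid {chart_uid} allowed? {chart_uid in allowed}")
```

The chart's fixed UID falls far outside any per-namespace block, which is why restricted-v2 cannot admit the pods unchanged.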

We cannot define an SCC in the ClusterServiceVersion (CSV) for the Helm chart operator, because an SCC grants access to a service account by name. That name can be set separately by the user in the Helm values, and the operator is created before those values are known.
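To make the naming dependency concrete: an SCC lists service accounts in its users field using the form system:serviceaccount:<namespace>:<name>. The sketch below builds that string. The second service account name, my-crdb, is hypothetical and stands in for any name a user might set in the Helm values.

```python
def scc_user(namespace: str, service_account: str) -> str:
    # SCCs refer to a service account through its full user name.
    return f"system:serviceaccount:{namespace}:{service_account}"


# From the describe output: release cockroachdb-sample runs as the
# service account "cockroachdb-sample" in openshift-operators.
print(scc_user("openshift-operators", "cockroachdb-sample"))

# A hypothetical install with a user-chosen service account needs a
# different entry, which an SCC shipped ahead of time cannot anticipate.
print(scc_user("openshift-operators", "my-crdb"))
```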

POSSIBLE SOLUTIONS:

  1. As a workaround, we can apply a SecurityContextConstraints (SCC) resource. OCP already ships a number of SCCs with predefined permissions, but none of them except the privileged SCC works for this service account. Using privileged is not recommended: it is the most permissive SCC and should be used only for cluster administration. (NOT RECOMMENDED)

  2. We can define a custom SCC as part of the chart templates that grants only the minimum permissions required to run the CockroachDB workloads, and apply it conditionally when the chart is installed on an OCP cluster.
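A rough sketch of what option 2 might contain follows, expressed as a Python dict and printed as JSON so it can be inspected or applied with kubectl apply -f. It is untested on OCP and rests on several assumptions: the chart's pods need only the UID/fsGroup 1000 seen in the event, the volume types visible in the describe output (PVC, emptyDir, projected secret), and the runtime/default seccomp profile. The function name cockroach_scc and the SCC name are hypothetical.

```python
import json


def cockroach_scc(name: str, namespace: str, service_account: str, uid: int = 1000) -> dict:
    """Hypothetical, narrowly scoped SCC for the chart's pods (sketch, not verified on OCP)."""
    return {
        "apiVersion": "security.openshift.io/v1",
        "kind": "SecurityContextConstraints",
        "metadata": {"name": name},
        # Grant only the chart's own service account.
        "users": [f"system:serviceaccount:{namespace}:{service_account}"],
        "groups": [],
        # No privileged mode or host access.
        "allowPrivilegedContainer": False,
        "allowPrivilegeEscalation": False,
        "allowHostDirVolumePlugin": False,
        "allowHostIPC": False,
        "allowHostNetwork": False,
        "allowHostPID": False,
        "allowHostPorts": False,
        "readOnlyRootFilesystem": False,
        "requiredDropCapabilities": ["ALL"],
        # Pin the UID and fsGroup that the chart sets (1000 in the FailedCreate event).
        "runAsUser": {"type": "MustRunAs", "uid": uid},
        "fsGroup": {"type": "MustRunAs", "ranges": [{"min": uid, "max": uid}]},
        "supplementalGroups": {"type": "RunAsAny"},
        "seLinuxContext": {"type": "MustRunAs"},
        # Assumption: pods request the runtime default seccomp profile.
        "seccompProfiles": ["runtime/default"],
        # Volume types used by the StatefulSet and the service account token.
        "volumes": ["emptyDir", "persistentVolumeClaim", "projected", "secret"],
    }


scc = cockroach_scc("cockroachdb-sample-scc", "openshift-operators", "cockroachdb-sample")
print(json.dumps(scc, indent=2))
```

Because the service account name is only known at install time, this resource would live in the chart's templates rather than in the CSV. In a Helm template, one plausible condition for rendering it only on OpenShift is .Capabilities.APIVersions.Has "security.openshift.io/v1", a real Helm built-in, though the issue does not specify which condition to use.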

harshn08 commented 9 months ago

Just checking: was cert-manager installed before the Helm chart? See https://github.com/cockroachdb/helm-charts#installation-of-helm-chart-with-cert-manager

himanshu-cockroach commented 9 months ago

@harshn08 Yes, all prerequisites were installed.

harshn08 commented 9 months ago

@himanshu-cockroach In which OpenShift version did you first observe this issue? Has it been observed on any OpenShift version other than the one you tested?

himanshu-cockroach commented 9 months ago

@harshn08 I last tested this on 4.13. However, I don't think it has much to do with the OpenShift cluster version. One thing I'm fairly certain about is that this issue was most likely introduced after this PR was merged.