NetApp / trident

Storage orchestrator for containers
Apache License 2.0

v23.01.1: Error: container has runAsNonRoot and image will run as root #817

Closed temirg closed 1 year ago

temirg commented 1 year ago

Describe the bug Upgrading Trident in the Rancher UI via the Helm chart from 22.10.0 to 23.01.1 failed.

Environment Rancher 2.7.1, RKE2 v1.24.9+rke2r2

To Reproduce Steps to reproduce the behavior:

Expected behavior All trident pods run without errors

Additional context => Error:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  37s                default-scheduler  Successfully assigned trident/trident-controller-89d5d9c5f-pn7tr to worker-xxx
  Normal   Pulling    37s                kubelet            Pulling image "docker.io/netapp/trident:23.01.1"
  Normal   Pulled     35s                kubelet            Successfully pulled image "docker.io/netapp/trident:23.01.1" in 1.634133777s
  Normal   Pulling    35s                kubelet            Pulling image "docker.io/netapp/trident-autosupport:23.01"
  Normal   Pulling    34s                kubelet            Pulling image "registry.k8s.io/sig-storage/csi-provisioner:v3.4.0"
  Normal   Pulled     34s                kubelet            Successfully pulled image "docker.io/netapp/trident-autosupport:23.01" in 1.048826948s
  Normal   Pulling    33s                kubelet            Pulling image "registry.k8s.io/sig-storage/csi-attacher:v4.1.0"
  Normal   Pulled     33s                kubelet            Successfully pulled image "registry.k8s.io/sig-storage/csi-provisioner:v3.4.0" in 759.625294ms
  Normal   Pulled     32s                kubelet            Successfully pulled image "registry.k8s.io/sig-storage/csi-attacher:v4.1.0" in 881.271892ms
  Normal   Pulling    32s                kubelet            Pulling image "registry.k8s.io/sig-storage/csi-resizer:v1.7.0"
  Warning  Failed     32s                kubelet            Error: container has runAsNonRoot and image will run as root (pod: "trident-controller-89d5d9c5f-pn7tr_trident(6e9ac407-9040-4af6-ae1e-eea99b92d08e)", container: csi-attacher)
  Warning  Failed     31s                kubelet            Error: container has runAsNonRoot and image will run as root (pod: "trident-controller-89d5d9c5f-pn7tr_trident(6e9ac407-9040-4af6-ae1e-eea99b92d08e)", container: csi-resizer)
  Normal   Pulled     31s                kubelet            Successfully pulled image "registry.k8s.io/sig-storage/csi-resizer:v1.7.0" in 819.988585ms
  Normal   Pulling    31s                kubelet            Pulling image "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1"
  Normal   Pulled     31s                kubelet            Successfully pulled image "registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1" in 812.559916ms
  Warning  Failed     31s                kubelet            Error: container has runAsNonRoot and image will run as root (pod: "trident-controller-89d5d9c5f-pn7tr_trident(6e9ac407-9040-4af6-ae1e-eea99b92d08e)", container: csi-snapshotter)
  Warning  Failed     30s (x2 over 35s)  kubelet            Error: container has runAsNonRoot and image will run as root (pod: "trident-controller-89d5d9c5f-pn7tr_trident(6e9ac407-9040-4af6-ae1e-eea99b92d08e)", container: trident-main)
  Warning  Failed     30s (x2 over 33s)  kubelet            Error: container has runAsNonRoot and image will run as root (pod: "trident-controller-89d5d9c5f-pn7tr_trident(6e9ac407-9040-4af6-ae1e-eea99b92d08e)", container: csi-provisioner)
  Warning  Failed     30s (x2 over 34s)  kubelet            Error: container has runAsNonRoot and image will run as root (pod: "trident-controller-89d5d9c5f-pn7tr_trident(6e9ac407-9040-4af6-ae1e-eea99b92d08e)", container: trident-autosupport)
  Normal   Pulled     30s                kubelet            Container image "docker.io/netapp/trident:23.01.1" already present on machine
  Normal   Pulled     30s                kubelet            Container image "docker.io/netapp/trident-autosupport:23.01" already present on machine
  Normal   Pulled     30s                kubelet            Container image "registry.k8s.io/sig-storage/csi-provisioner:v3.4.0" already present on machine
  Normal   Pulled     30s                kubelet            (combined from similar events): Container image "registry.k8s.io/sig-storage/csi-attacher:v4.1.0" already present on machine

temirg commented 1 year ago

Hello,

Attached is a diff between the v22.10.0 and v23.01.1 Helm charts: charts-diff_v22.10.0_vs_v23.01.1.txt

In v23.01.1, roles/rolebindings are added before clusterroles/clusterrolebindings.

Regards, temir.

temirg commented 1 year ago

Possibly very important info: all RKE2 clusters running RKE2 (k8s) < 1.25.x have "cis-profile: 1.6" configured.

In the future, when RKE2 is updated to version 1.25.x or newer, "cis-profile: 1.23" will be used.

Regards, temir.

gnarl commented 1 year ago

Hi @temirg,

We haven't tested Trident against RKE 2 and it seems that you are the first customer reporting an issue while using RKE 2. Please open a NetApp Support case for this issue so that additional information can be collected. This will help us to address this issue faster than if we try to just collect the information via GitHub Issues.

temirg commented 1 year ago

Hello @gnarl,

The NetApp case with priority 2 was closed today because the bugfix will take a while. Should the issue be closed too?

gnarl commented 1 year ago

Hi @temirg,

I found out about the NetApp support case a bit ago. We are working on reopening the support case.

antwynne commented 1 year ago

The case has now been re-opened

antwynne commented 1 year ago

Hi @gnarl

Do you need any more information about this issue? Please let me know if you do.

mdekoster commented 1 year ago

Hi,

We have the same issue with the Trident Operator (on k8s version 1.24). The Operator creates the deployment for the trident-controller, but it does not specify the securityContext at pod level or container level in the deployment. If another PSP is specified in the cluster and the name of its ClusterRoleBinding sorts alphabetically before the least-permission ClusterRoleBinding fleet-controller, the mutating PSP admission controller will add securityContexts that do not conform to the permissions the pod/container(s) need.

antwynne commented 1 year ago

@mdekoster we have the output of kubectl describe ns trident:

@test:~> kubectl describe ns trident
Name:         trident
Labels:       kubernetes.io/metadata.name=trident
              pod-security.kubernetes.io/enforce=privileged
Annotations:  cattle.io/status:
                {"Conditions":[{"Type":"ResourceQuotaInit","Status":"True","Message":"","LastUpdateTime":"2022-01-28T11:03:35Z"},{"Type":"InitialRolesPopu...
              lifecycle.cattle.io/create.namespace-auth: true
Status:       Active

Resource Quotas
  Name:     trident-csi
  Resource  Used  Hard

No LimitRange resource.

mdekoster commented 1 year ago

@antwynne that's for the Pod Security Admission (PSA) in k8s 1.25+. We are still on 1.24 and use the (deprecated) Pod Security Policy feature.

But you still have to specify the needed securityContext in the pod specs of the deployment, daemonset or statefulset. The trident-operator does not specify the securityContext in the trident-controller deployment.
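For readers unfamiliar with the two levels mentioned here, the following is a minimal sketch of where a securityContext can sit in a Deployment. The names (example-app, example.com/app:1.0) and the values shown are hypothetical placeholders, not the settings Trident actually requires. Where a field exists at both levels, the container-level value takes precedence for that container.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # hypothetical name, not a Trident resource
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      securityContext:           # pod level: applies to all containers in the pod
        runAsNonRoot: false
      containers:
        - name: main
          image: example.com/app:1.0   # hypothetical image
          securityContext:       # container level: overrides the pod-level value
            runAsNonRoot: false
```

If neither level is set explicitly, an admission controller that mutates pods can fill in defaults, which matches the behaviour described above.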

temirg commented 1 year ago

@antwynne / @gnarl there is a new trident release available: v23.04: https://github.com/NetApp/trident/releases/tag/v23.04.0 Does the new release solve the problem of version 23.01 with RKE2?

Regards, temirg.

zlmitchell commented 1 year ago

I just attempted the same with 23.04 and still have the same issue. RKE2 1.24.12+rke2r1, cis 1.6

Error: container has runAsNonRoot and image will run as root

zlmitchell commented 1 year ago

It appears that the trident-controller does not have a securityContext set, and this is not configurable via the Helm chart. Updating the trident-controller deployment and adding the following got it running. But as the chart/operator has no way to configure this, we should probably get this bug report submitted.

securityContext:
  runAsNonRoot: false
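
The thread does not say whether this was added at pod level or container level. As one possible reading, assuming pod level, the same change could be written as a strategic merge patch for the trident-controller deployment in the trident namespace (both names taken from the events in the original report). The file name is hypothetical.

```yaml
# securitycontext-patch.yaml (hypothetical file name)
# Assumption: the setting is applied at pod level, so it covers every
# container in the trident-controller pod.
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: false
```

A patch file like this could be applied with kubectl patch deployment trident-controller -n trident --patch-file securitycontext-patch.yaml.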

antwynne commented 1 year ago

@zlmitchell The Helm chart values contain a line with "deploymentAnnotations: {}".

There are two deployments in total: trident-csi and trident-operator. What is the correct syntax to pass the right securityContext to trident-csi?

== As a sample ==

deploymentAnnotations:
  metadata:
    name: trident-csi
  spec:
    spec:
      securityContext:
        runAsNonRoot: false

temirg commented 1 year ago

@zlmitchell The workaround works, but is it safe for production? Or will the orchestrator eventually notice that the deployment has been changed and reset it?

zlmitchell commented 1 year ago

If you are running PSA in your RKE2 cluster, you will also need to add the following rule to the trident-operator ClusterRole:

  - apiGroups:
      - management.cattle.io
    resources:
      - projects
    verbs:
      - updatepsa

temirg commented 1 year ago

Hello all,

The workaround of editing the deployment works fine. The issue will be closed as solved by the workaround.

Many thanks to all involved!

Best Regards, temirg.

gnarl commented 1 year ago

This issue is also addressed with commit 6d30a16 and will be included in the Trident 23.07 release.