aquasecurity / trivy

Find vulnerabilities, misconfigurations, secrets, SBOM in containers, Kubernetes, code repositories, clouds and more
https://aquasecurity.github.io/trivy
Apache License 2.0

Lots of issues with Trivy helm chart misconfiguration scanning #5679

Open chen-keinan opened 10 months ago

chen-keinan commented 10 months ago

Discussed in https://github.com/aquasecurity/trivy/discussions/5174

Originally posted by **Mo0rBy** September 13, 2023

### Description

I have found that there are a lot of false positives related to `securityContext` definitions in vanilla Kubernetes objects as well as custom objects (all Istio objects in my case). Most are just completely incorrect, but the alerts for the deployment manifests are quite interesting.

Vanilla Kubernetes Objects
---

(In my case, deployments, services and service-accounts)

**service.yaml**

`LOW: service myService in default namespace should set spec.securityContext.runAsGroup, spec.securityContext.supplementalGroups[*] and spec.securityContext.fsGroup to integer greater than 0 > See https://avd.aquasec.com/misconfig/ksv116`

A service object cannot have a `securityContext` definition, so this should not be an alert.

**service-account.yaml**

`LOW: serviceaccount myService in default namespace should set spec.securityContext.runAsGroup, spec.securityContext.supplementalGroups[*] and spec.securityContext.fsGroup to integer greater than 0 > See https://avd.aquasec.com/misconfig/ksv116`

A service-account object cannot have a `securityContext` definition, so this should not be an alert.

**deployment.yaml**

`LOW: deployment myService in default namespace should set spec.securityContext.runAsGroup, spec.securityContext.supplementalGroups[*] and spec.securityContext.fsGroup to integer greater than 0 > See https://avd.aquasec.com/misconfig/ksv116`

Within my `spec.securityContext` definition, I have `runAsUser`, `runAsGroup` and `fsGroup` all defined. In the examples given on "[set the security standard for a pod](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod)" docs, the `supplementalGroups` field is not defined. I did some digging and found that this field would probably only be needed in specific use cases (see the [kubernetes reference](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#podsecuritycontext-v1-core) docs).
I'm not entirely sure if this is correct and I may be misunderstanding what this field is used for, but if my theory is correct, then this alert should not be present, as I am setting the other fields, which should be enough.

`MEDIUM: Container 'myService' of Deployment 'myService' should not set 'securityContext.capabilities.add' > See https://avd.aquasec.com/misconfig/ksv022`

This alert is the most interesting and nuanced. Reading the [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/), there are 3 categories, `Privileged`, `Baseline` and `Restricted`. In my case, I am following the `Restricted` standards, but this alert seems to be coming from the `Baseline` standards. Here are the relevant sections of the Pod Security Standards documentation:

Here are the `Baseline` standards for `securityContext.capabilities`:

---

Adding additional capabilities beyond those listed below must be disallowed.

**Restricted Fields**

- `spec.containers[*].securityContext.capabilities.add`
- `spec.initContainers[*].securityContext.capabilities.add`
- `spec.ephemeralContainers[*].securityContext.capabilities.add`

**Allowed Values**

- Undefined/nil
- `AUDIT_WRITE`
- `CHOWN`
- `DAC_OVERRIDE`
- `FOWNER`
- `FSETID`
- `KILL`
- `MKNOD`
- `NET_BIND_SERVICE`
- `SETFCAP`
- `SETGID`
- `SETPCAP`
- `SETUID`
- `SYS_CHROOT`

---

Here are the `Restricted` standards for `securityContext.capabilities`:

---

Containers must drop `ALL` capabilities, and are only permitted to add back the `NET_BIND_SERVICE` capability.
_[This is Linux only policy](https://kubernetes.io/docs/concepts/security/pod-security-standards/#os-specific-policy-controls) in v1.25+ `(.spec.os.name != "windows")`_

**Restricted Fields**

- `spec.containers[*].securityContext.capabilities.drop`
- `spec.initContainers[*].securityContext.capabilities.drop`
- `spec.ephemeralContainers[*].securityContext.capabilities.drop`

**Allowed Values**

- Any list of capabilities that includes `ALL`

**Restricted Fields**

- `spec.containers[*].securityContext.capabilities.add`
- `spec.initContainers[*].securityContext.capabilities.add`
- `spec.ephemeralContainers[*].securityContext.capabilities.add`

**Allowed Values**

- Undefined/nil
- `NET_BIND_SERVICE`

---

Finally, here is my `securityContext.capabilities` definition in my yaml:

```yaml
capabilities:
  drop: ["ALL"]
  add:
    - NET_BIND_SERVICE
```

This clearly follows the `Restricted` Pod Security Standards, but the `ksv022` alert is coming from the `Baseline` standards. This is clearly just a difference in the standards that I am following and the standards that are being used to scan the yaml manifest. Therefore, I would suggest a new feature where you can select which Pod Security Standards to scan against.

Custom objects
---

(In my case, all the custom objects triggering alerts are Istio objects)

**authorization-policy.yaml**

`LOW: authorizationpolicy myService in default namespace should set spec.securityContext.runAsGroup, spec.securityContext.supplementalGroups[*] and spec.securityContext.fsGroup to integer greater than 0 > See https://avd.aquasec.com/misconfig/ksv116`

An authorization-policy object cannot have a `securityContext` definition, so this should not be an alert.

**request-authentication.yaml**

`LOW: requestauthentication myService in default namespace should set spec.securityContext.runAsGroup, spec.securityContext.supplementalGroups[*] and spec.securityContext.fsGroup to integer greater than 0 > See https://avd.aquasec.com/misconfig/ksv116`

A request-authentication object cannot have a `securityContext` definition, so this should not be an alert.

This sums up all the false positives that I am seeing.

### Desired Behavior

The false positive alerts discussed above should not be seen.

### Actual Behavior

There are false positives for vanilla Kubernetes objects and custom Kubernetes objects for `securityContext` definitions. The most interesting one, which isn't exactly a false positive, is `ksv022` for the deployment manifest, where the `Baseline` Pod Security Standard is being used rather than the `Restricted` one.

### Reproduction Steps

```bash
Execute a `trivy config ` on a helm chart with the objects discussed above.
```

### Target

Kubernetes

### Scanner

Misconfiguration

### Output Format

Table

### Mode

Standalone

### Debug Output

```bash
This is not necessary, debug output shows the exact same false positives.
```

### Operating System

macOS Ventura 13.5.1

### Version

```bash
Version: 0.44.1
Vulnerability DB:
  Version: 2
  UpdatedAt: 2023-09-11 06:16:57.742189926 +0000 UTC
  NextUpdate: 2023-09-11 12:16:57.742189326 +0000 UTC
  DownloadedAt: 2023-09-11 09:44:46.200029 +0000 UTC
Java DB:
  Version: 1
  UpdatedAt: 2023-09-11 00:53:00.064262708 +0000 UTC
  NextUpdate: 2023-09-14 00:53:00.064262008 +0000 UTC
  DownloadedAt: 2023-09-11 09:45:53.490486 +0000 UTC
Policy Bundle:
  Digest: sha256:fd5f1ce3d8efb1fe158cb41f9adb9d7c7cc5c4c863b261053c962e6d950350b3
  DownloadedAt: 2023-09-12 12:45:02.274108 +0000 UTC
```

### Checklist

- [ ] Run `trivy image --reset`
- [ ] Read [the troubleshooting](https://aquasecurity.github.io/trivy/latest/docs/references/troubleshooting/)

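
The `ksv116` false positives reported above come down to a group-ID check being applied to kinds that have no pod spec. As a rough illustration of the desired behavior, here is a minimal Python sketch that only evaluates workload kinds and treats `supplementalGroups` as optional, in line with the theory in the description. It is not Trivy's implementation: `POD_SPEC_PATHS`, `pod_spec_of` and `group_id_findings` are hypothetical names, and the list of kinds is an assumption that may be incomplete.

```python
# Hypothetical illustration only; this is not how Trivy implements ksv116.

# Assumed mapping of workload kinds to the location of their pod spec.
POD_SPEC_PATHS = {
    "Pod": ("spec",),
    "Deployment": ("spec", "template", "spec"),
    "StatefulSet": ("spec", "template", "spec"),
    "DaemonSet": ("spec", "template", "spec"),
    "ReplicaSet": ("spec", "template", "spec"),
    "Job": ("spec", "template", "spec"),
    "CronJob": ("spec", "jobTemplate", "spec", "template", "spec"),
}


def pod_spec_of(manifest):
    """Return the pod spec dict for workload kinds, or None for any other kind."""
    path = POD_SPEC_PATHS.get(manifest.get("kind"))
    if path is None:
        return None  # e.g. Service, ServiceAccount, Istio custom resources
    node = manifest
    for key in path:
        node = node.get(key) or {}
    return node


def group_id_findings(manifest):
    """Report missing or non-positive runAsGroup / fsGroup, on workloads only."""
    pod_spec = pod_spec_of(manifest)
    if pod_spec is None:
        return []  # the check does not apply to this kind at all
    ctx = pod_spec.get("securityContext") or {}
    findings = []
    for field in ("runAsGroup", "fsGroup"):
        value = ctx.get(field)
        if not isinstance(value, int) or value <= 0:
            findings.append(
                f"spec.securityContext.{field} should be an integer greater than 0"
            )
    # supplementalGroups is treated as optional: only validate entries that exist.
    for gid in ctx.get("supplementalGroups") or []:
        if not isinstance(gid, int) or gid <= 0:
            findings.append(
                "spec.securityContext.supplementalGroups[*] should be integers greater than 0"
            )
            break
    return findings
```

With this shape, a `Service`, a `ServiceAccount` or an Istio `AuthorizationPolicy` yields no findings, while a `Deployment` is checked against its pod template.
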
chen-keinan commented 10 months ago

@Mo0rBy it's true that the supplementalGroups setting is not mandatory. However, this setting is useful in scenarios where you want to grant a container additional permissions beyond those associated with the primary group.

For example, if you have a Pod that needs access to certain resources or files that are accessible only by a specific group, you can use supplementalGroups to add that group to the container. This way, the container gains the necessary permissions associated with that supplemental group.

If your application has no such requirement for additional groups, you may not need to set the supplementalGroups field. It depends on the security and permission requirements of your application and the environment in which it runs.
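
To make the group-based access idea concrete, here is a deliberately simplified Python sketch of the group part of a Unix permission check. It is only a model: it ignores the owner and "other" permission bits, ACLs and root, and `group_grants_read` is a hypothetical helper name, not part of Kubernetes or Trivy.

```python
import os
import stat


def group_grants_read(path, process_gids):
    """Simplified model of group-based read access.

    Returns True when the file's group is one of the process's group IDs
    and the group-read bit is set. Owner/other bits and ACLs are ignored.
    """
    info = os.stat(path)
    return info.st_gid in process_gids and bool(info.st_mode & stat.S_IRGRP)
```

Inside a container, the set passed as `process_gids` could be built from `os.getgroups()`, which would be expected to include any GIDs listed under supplementalGroups. A file that is group-readable only by one of those GIDs then becomes readable without changing the primary group.
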

PSS baseline checks are executed by default; you can ignore specific checks by using exceptions.

Mo0rBy commented 10 months ago

Ah ok, thank you for the explanation on supplementalGroups, that makes perfect sense.

Yes, I figured that the pss baseline standards were being followed by default, but I think it would be nice to be able to select which pss standard to use. In the original discussion (which this issue was created from) I used rego policy files to create exceptions, because my team and I are following the pss restricted standards.

I think this is my main issue. By default, the pss baseline standards are used and there is no easy way to select a different pss standard without creating the rego policy exceptions. It just seems to me as if you need to really get deep into how Trivy works in order to make full use of this feature. This isn't necessarily a bad thing, it's actually been really good for me to understand that Trivy uses this repo to get the rules used during the misconfiguration scans, but I do think it should be easier for Trivy users to select a pss standard to use, rather than having to create custom exceptions.
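
As a rough picture of what a selectable standard could mean for the capabilities check, here is a minimal Python sketch that evaluates a container `securityContext` against either level. The allowed values are copied from the Pod Security Standards excerpts quoted in the issue description; the function name `capability_findings` and its `level` parameter are hypothetical and do not correspond to any existing Trivy option.

```python
# Hypothetical sketch; not a Trivy feature or API.

# Allowed values for capabilities.add, as quoted in the issue description.
BASELINE_ADD = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
    "MKNOD", "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID",
    "SYS_CHROOT",
})
RESTRICTED_ADD = frozenset({"NET_BIND_SERVICE"})


def capability_findings(security_context, level="baseline"):
    """Check a container securityContext dict against the chosen PSS level."""
    if level == "baseline":
        allowed = BASELINE_ADD
    elif level == "restricted":
        allowed = RESTRICTED_ADD
    else:
        raise ValueError(f"unknown level: {level!r}")

    caps = (security_context or {}).get("capabilities") or {}
    added = set(caps.get("add") or [])
    dropped = set(caps.get("drop") or [])

    findings = []
    extra = sorted(added - allowed)
    if extra:
        findings.append(
            f"capabilities.add has values not allowed by {level}: {', '.join(extra)}"
        )
    # Only the restricted level requires dropping ALL capabilities.
    if level == "restricted" and "ALL" not in dropped:
        findings.append("capabilities.drop must include ALL for restricted")
    return findings
```

Under the quoted lists, a context that drops `ALL` and adds back only `NET_BIND_SERVICE` produces no findings at either level, which is the behavior I would expect from a scan that lets me pick the standard.
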

itaysk commented 10 months ago

@Mo0rBy have you seen Trivy's compliance feature? https://aquasecurity.github.io/trivy/v0.47/docs/compliance/compliance/ Specifically, Kubernetes compliance (including PSS): https://aquasecurity.github.io/trivy/v0.47/docs/target/kubernetes/#compliance

Mo0rBy commented 10 months ago

> @Mo0rBy have you seen Trivy's compliance feature? https://aquasecurity.github.io/trivy/v0.47/docs/compliance/compliance/ Specifically, Kubernetes compliance (including PSS): https://aquasecurity.github.io/trivy/v0.47/docs/target/kubernetes/#compliance

Yes, I have seen this feature, but this runs against a live Kubernetes cluster, not against Helm charts. We will be using this feature in the future, but we would like to get our scan results from the Helm charts we deploy BEFORE we deploy them. The reports are given to another team to ensure we are meeting best practices etc.

We will probably use the compliance feature in our pre-prod cluster to ensure that our deployments meet the standards, but again, we would like to scan our Helm charts for any misconfigurations before they are deployed.