Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/

AKS not acting on Azure Built-In Policy #2763

Closed johndohoneyjr closed 2 years ago

johndohoneyjr commented 2 years ago

What happened:

I tried this with both "Standard" and "Hardened" SKUs. I activated the Policy agent and verified the Azure Policy Agent and Gatekeeper were active in my cluster

    john@DESKTOP-5CAD0JG:/mnt/c/Users/johndohoney$ k get po -n kube-system
    NAME                                    READY   STATUS    RESTARTS   AGE
    azure-ip-masq-agent-fhmd2               1/1     Running   0          66m
    azure-ip-masq-agent-pg8kq               1/1     Running   0          66m
    azure-policy-59bbf5454f-rgjb8           1/1     Running   2          67m
    azure-policy-webhook-84884d989b-jntsv   1/1     Running   0          50m
    coredns-845757d86-5fscs                 1/1     Running   0          50m
    coredns-845757d86-l4756                 1/1     Running   0          66m
    coredns-autoscaler-5f85dc856b-whpj8     1/1     Running   0          50m
    csi-azuredisk-node-542pn                3/3     Running   0          66m
    csi-azuredisk-node-wwrwb                3/3     Running   0          66m
    csi-azurefile-node-678tf                3/3     Running   0          66m
    csi-azurefile-node-xdtn7                3/3     Running   0          66m
    konnectivity-agent-664d9bff8b-2jz7k     1/1     Running   0          35m
    konnectivity-agent-664d9bff8b-7jm89     1/1     Running   0          35m
    kube-proxy-br5rg                        1/1     Running   0          66m
    kube-proxy-kgvbx                        1/1     Running   0          66m
    metrics-server-6bc97b47f7-zpxpz         1/1     Running   1          67m
    omsagent-4gm9r                          2/2     Running   0          66m
    omsagent-cbztw                          2/2     Running   0          66m
    omsagent-rs-7df76848-cggsk              1/1     Running   0          50m

    john@DESKTOP-5CAD0JG:/mnt/c/Users/johndohoney$ k get pods -n gatekeeper-system
    NAME                                     READY   STATUS    RESTARTS   AGE
    gatekeeper-audit-7b87566755-m55kw        1/1     Running   0          52m
    gatekeeper-controller-6844c5c896-2kkpj   1/1     Running   0          52m
    gatekeeper-controller-6844c5c896-dkzq9   1/1     Running   2          68m

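I then verified the "Disallowed Capabilities": k get k8sazuredisallowedcapabilities -o yaml

    apiVersion: v1
    items:
    - apiVersion: constraints.gatekeeper.sh/v1beta1
      kind: K8sAzureDisallowedCapabilities
      metadata:
        annotations:
          azure-policy-assignment-id: /subscriptions/33732876-7635-4beb-9654-e3c3c37b7ecb/providers/Microsoft.Authorization/policyAssignments/8e4e81f041694b4ba62ce4ce
          azure-policy-definition-id: /providers/Microsoft.Authorization/policyDefinitions/d2e7ea85-6b44-4317-a0be-1b951587f626
          azure-policy-definition-reference-id: ""
          azure-policy-setdefinition-id: ""
          constraint-installed-by: azure-policy-addon
          constraint-url: https://store.policy.core.windows.net/kubernetes/container-disallowed-capabilities/v2/constraint.yaml
        creationTimestamp: "2022-01-28T17:44:24Z"
        generation: 1
        labels:
          managed-by: azure-policy-addon
        name: azurepolicy-container-disallowed-capabilit-974fdccb386c73cd0fd6
        resourceVersion: "49521"
        uid: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
      spec:
        enforcementAction: deny
        match:
          excludedNamespaces:
          - kube-system
          - gatekeeper-system
          - azure-arc
          kinds:
          - apiGroups:
            - ""
            kinds:
            - Pod
        parameters:
          disallowedCapabilities:
          - CAP_SYS_ADMIN
          excludedContainers: []
      status:
        auditTimestamp: "2022-01-28T21:26:27Z"
        byPod:
        - constraintUID: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
          enforced: true
          id: gatekeeper-audit-7b87566755-m55kw
          observedGeneration: 1
          operations:
          - audit
          - status
        - constraintUID: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
          enforced: true
          id: gatekeeper-controller-6844c5c896-2kkpj
          observedGeneration: 1
          operations:
          - webhook
        - constraintUID: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
          enforced: true
          id: gatekeeper-controller-6844c5c896-dkzq9
          observedGeneration: 1
          operations:
          - webhook
        totalViolations: 0
    - apiVersion: constraints.gatekeeper.sh/v1beta1
      kind: K8sAzureDisallowedCapabilities
      metadata:
        annotations:
          azure-policy-assignment-id: /subscriptions/33732876-7635-4beb-9654-e3c3c37b7ecb/providers/Microsoft.Authorization/policyAssignments/SecurityCenterBuiltIn
          azure-policy-definition-id: /providers/Microsoft.Authorization/policyDefinitions/d2e7ea85-6b44-4317-a0be-1b951587f626
          azure-policy-definition-reference-id: KubernetesClustersShouldNotGrantCAPSYSADMINSecurityCapabilitiesMonitoringEffect
          azure-policy-setdefinition-id: /providers/Microsoft.Authorization/policySetDefinitions/1f3afdf9-d0c9-4c3d-847f-89da613e70a8
          constraint-installed-by: azure-policy-addon
          constraint-url: https://store.policy.core.windows.net/kubernetes/container-disallowed-capabilities/v2/constraint.yaml
        creationTimestamp: "2022-01-28T17:44:24Z"
        generation: 1
        labels:
          managed-by: azure-policy-addon
        name: azurepolicy-container-disallowed-capabilit-9b888b2fa665de7d4ef6
        resourceVersion: "49518"
        uid: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
      spec:
        enforcementAction: dryrun
        match:
          excludedNamespaces:
          - kube-system
          - gatekeeper-system
          - azure-arc
          kinds:
          - apiGroups:
            - ""
            kinds:
            - Pod
        parameters:
          disallowedCapabilities:
          - CAP_SYS_ADMIN
          excludedContainers: []
      status:
        auditTimestamp: "2022-01-28T21:26:27Z"
        byPod:
        - constraintUID: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
          enforced: true
          id: gatekeeper-audit-7b87566755-m55kw
          observedGeneration: 1
          operations:
          - audit
          - status
        - constraintUID: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
          enforced: true
          id: gatekeeper-controller-6844c5c896-2kkpj
          observedGeneration: 1
          operations:
          - webhook
        - constraintUID: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
          enforced: true
          id: gatekeeper-controller-6844c5c896-dkzq9
          observedGeneration: 1
          operations:
          - webhook
        totalViolations: 0
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

***** Notice the following excerpts from above *****

    disallowedCapabilities:
    - CAP_SYS_ADMIN

***** THE BUG *****

    spec:
      enforcementAction: dryrun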
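The constraint listing in this issue contains two constraints for the same policy definition, one with `enforcementAction: deny` and one with `enforcementAction: dryrun`. In Gatekeeper, only `deny` rejects a request at admission, while `dryrun` violations are only reported by audit. The following is a minimal Python sketch for summarizing this per constraint. The function name `summarize_enforcement` and the trimmed `listing` dictionary are illustrative assumptions; in practice the input could come from parsing `k get k8sazuredisallowedcapabilities -o json`.

```python
# Sketch: summarize enforcementAction per Gatekeeper constraint.
# The dictionaries below copy only two fields from the listing in this issue;
# everything else about the input shape is an assumption for illustration.

def summarize_enforcement(constraint_list):
    """Return (name, enforcementAction, blocks_at_admission) per constraint."""
    rows = []
    for item in constraint_list.get("items", []):
        name = item["metadata"]["name"]
        # Gatekeeper treats a missing enforcementAction as "deny".
        action = item.get("spec", {}).get("enforcementAction", "deny")
        rows.append((name, action, action == "deny"))
    return rows


listing = {
    "items": [
        {
            "metadata": {"name": "azurepolicy-container-disallowed-capabilit-974fdccb386c73cd0fd6"},
            "spec": {"enforcementAction": "deny"},
        },
        {
            "metadata": {"name": "azurepolicy-container-disallowed-capabilit-9b888b2fa665de7d4ef6"},
            "spec": {"enforcementAction": "dryrun"},
        },
    ]
}

for name, action, blocks in summarize_enforcement(listing):
    print(f"{name}: {action} (blocks at admission: {blocks})")
```

A summary like this makes it easier to see which assignment is expected to block a pod and which one only audits it.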

What you expected to happen: The following manifest should be blocked from running:

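    apiVersion: v1
    kind: Pod
    metadata:
      name: security-context-demo-4
    spec:
      containers:
      - name: sec-ctx-4
        image: gcr.io/google-samples/node-hello:1.0
        securityContext:
          capabilities:
            add: ["SYS_ADMIN"]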

I tried a variation to make sure I got it right -- this too should have errored on the kubectl create

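    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-privileged
    spec:
      containers:
      - name: nginx-privileged
        image: mcr.microsoft.com/oss/nginx/nginx:1.15.5-alpine
        securityContext:
          privileged: true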

How to reproduce it (as minimally and precisely as possible):

  1. Start an AKS cluster (I did use the hardened SKU -- but had similar results on Standard SKU)
  2. az aks get-credentials --resource-group $RESOURCE_GROUP --name myAKSCluster3
  3. Start the Policy agent in the Portal or via cli then verify
  4. k get pods -n gatekeeper-system
  5. k get pods -n kube-system (there were a few restarts)
  6. ASSIGN the 2 Azure BuiltIn Policies
  7. Azure Policy Add-on for Kubernetes service (AKS) should be installed and enabled on your clusters
  8. Kubernetes clusters should not grant CAP_SYS_ADMIN security capabilities
  9. Restart the AKS Cluster
  10. k create myroottest.yaml

      apiVersion: v1
      kind: Pod
      metadata:
        name: security-context-demo-4
      spec:
        containers:
        - name: sec-ctx-4
          image: gcr.io/google-samples/node-hello:1.0
          securityContext:
            capabilities:
              add: ["SYS_ADMIN"]

  11. Manually kick off a policy scan -- az policy state trigger-scan --resource-group "dohoney-aksdemo-rg"
  12. k exec -ti security-context-demo-4-1 -- /bin/bash
  13. capsh --print | grep cap_sys_admin **** This should NOT HAPPEN

Anything else we need to know?: I checked "Compliance" in the Portal Policy blade, and my cluster was reported as "Compliant" even though it was running a non-compliant Pod.

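Environment:

  - Kubernetes version (use `kubectl version`): 1.21.7
  - Size of cluster (how many worker nodes are in the cluster?)

        john@DESKTOP-5CAD0JG:/mnt/c/Users/johndohoney$ k get nodes
        NAME                                STATUS   ROLES   AGE    VERSION
        aks-agentpool-34591016-vmss000001   Ready    agent   3d1h   v1.21.7
        aks-userpool-34591016-vmss000001    Ready    agent   3d1h   v1.21.7

  - General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.): Simple test application using busy box
  - Others: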
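The reproduction steps confirm the capability inside the pod with `capsh --print | grep cap_sys_admin`. Where `capsh` is not available in an image, a similar check can be sketched by decoding the `CapEff` bitmask that Linux exposes in `/proc/<pid>/status`. The helper names and the two sample masks below are assumptions for illustration; the only fixed fact used is that `CAP_SYS_ADMIN` is capability number 21 in the Linux headers.

```python
# Sketch: check whether CAP_SYS_ADMIN is in a process's effective capability set
# by decoding the CapEff hex mask from /proc/<pid>/status (Linux only).

CAP_SYS_ADMIN_BIT = 21  # capability number of CAP_SYS_ADMIN in linux/capability.h


def has_capability(cap_mask_hex, bit):
    """Return True if the given bit is set in a hex capability mask."""
    return bool(int(cap_mask_hex, 16) & (1 << bit))


def effective_caps_from_status(status_text):
    """Extract the CapEff hex mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no CapEff line found")


# Illustrative masks (not captured from the cluster in this issue):
# the second one has bit 21 set, the first one does not.
without_sys_admin = "00000000a80425fb"
with_sys_admin = "00000000a82425fb"

print(has_capability(without_sys_admin, CAP_SYS_ADMIN_BIT))  # False
print(has_capability(with_sys_admin, CAP_SYS_ADMIN_BIT))     # True
```

Inside a running container, the input could be read with `open("/proc/self/status").read()` and passed to `effective_caps_from_status` before calling `has_capability`.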

ghost commented 2 years ago

Hi johndohoneyjr, AKS bot here :wave: Thank you for posting on the AKS Repo, I'll do my best to get a kind human from the AKS team to assist you.

I might be just a bot, but I'm told my suggestions are normally quite good, as such:

  1. If this case is urgent, please open a Support Request so that our 24/7 support team may help you faster.
  2. Please abide by the AKS repo Guidelines and Code of Conduct.
  3. If you're having an issue, could it be described on the AKS Troubleshooting guides or AKS Diagnostics?
  4. Make sure you're subscribed to the AKS Release Notes to keep up to date with all that's new on AKS.
  5. Make sure there isn't a duplicate of this issue already reported. If there is, feel free to close this one and '+1' the existing issue.
  6. If you have a question, do take a look at our AKS FAQ. We place the most common ones there!

ghost commented 2 years ago

@az-policy-kube would you be able to assist?

Issue Details
**What happened**:

I tried this with both "Standard" and "Hardened" SKUs. I activated the Policy agent and verified the Azure Policy Agent and Gatekeeper were active in my cluster

    john@DESKTOP-5CAD0JG:/mnt/c/Users/johndohoney$ k get po -n kube-system
    NAME                                    READY   STATUS    RESTARTS   AGE
    azure-ip-masq-agent-fhmd2               1/1     Running   0          66m
    azure-ip-masq-agent-pg8kq               1/1     Running   0          66m
    azure-policy-59bbf5454f-rgjb8           1/1     Running   2          67m
    azure-policy-webhook-84884d989b-jntsv   1/1     Running   0          50m
    coredns-845757d86-5fscs                 1/1     Running   0          50m
    coredns-845757d86-l4756                 1/1     Running   0          66m
    coredns-autoscaler-5f85dc856b-whpj8     1/1     Running   0          50m
    csi-azuredisk-node-542pn                3/3     Running   0          66m
    csi-azuredisk-node-wwrwb                3/3     Running   0          66m
    csi-azurefile-node-678tf                3/3     Running   0          66m
    csi-azurefile-node-xdtn7                3/3     Running   0          66m
    konnectivity-agent-664d9bff8b-2jz7k     1/1     Running   0          35m
    konnectivity-agent-664d9bff8b-7jm89     1/1     Running   0          35m
    kube-proxy-br5rg                        1/1     Running   0          66m
    kube-proxy-kgvbx                        1/1     Running   0          66m
    metrics-server-6bc97b47f7-zpxpz         1/1     Running   1          67m
    omsagent-4gm9r                          2/2     Running   0          66m
    omsagent-cbztw                          2/2     Running   0          66m
    omsagent-rs-7df76848-cggsk              1/1     Running   0          50m

    john@DESKTOP-5CAD0JG:/mnt/c/Users/johndohoney$ k get pods -n gatekeeper-system
    NAME                                     READY   STATUS    RESTARTS   AGE
    gatekeeper-audit-7b87566755-m55kw        1/1     Running   0          52m
    gatekeeper-controller-6844c5c896-2kkpj   1/1     Running   0          52m
    gatekeeper-controller-6844c5c896-dkzq9   1/1     Running   2          68m

I then verified the "Disallowed Capabilities": k get k8sazuredisallowedcapabilities -o yaml

    apiVersion: v1
    items:
    - apiVersion: constraints.gatekeeper.sh/v1beta1
      kind: K8sAzureDisallowedCapabilities
      metadata:
        annotations:
          azure-policy-assignment-id: /subscriptions/33732876-7635-4beb-9654-e3c3c37b7ecb/providers/Microsoft.Authorization/policyAssignments/8e4e81f041694b4ba62ce4ce
          azure-policy-definition-id: /providers/Microsoft.Authorization/policyDefinitions/d2e7ea85-6b44-4317-a0be-1b951587f626
          azure-policy-definition-reference-id: ""
          azure-policy-setdefinition-id: ""
          constraint-installed-by: azure-policy-addon
          constraint-url: https://store.policy.core.windows.net/kubernetes/container-disallowed-capabilities/v2/constraint.yaml
        creationTimestamp: "2022-01-28T17:44:24Z"
        generation: 1
        labels:
          managed-by: azure-policy-addon
        name: azurepolicy-container-disallowed-capabilit-974fdccb386c73cd0fd6
        resourceVersion: "49521"
        uid: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
      spec:
        enforcementAction: deny
        match:
          excludedNamespaces:
          - kube-system
          - gatekeeper-system
          - azure-arc
          kinds:
          - apiGroups:
            - ""
            kinds:
            - Pod
        parameters:
          disallowedCapabilities:
          - CAP_SYS_ADMIN
          excludedContainers: []
      status:
        auditTimestamp: "2022-01-28T21:26:27Z"
        byPod:
        - constraintUID: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
          enforced: true
          id: gatekeeper-audit-7b87566755-m55kw
          observedGeneration: 1
          operations:
          - audit
          - status
        - constraintUID: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
          enforced: true
          id: gatekeeper-controller-6844c5c896-2kkpj
          observedGeneration: 1
          operations:
          - webhook
        - constraintUID: 66a9cc5d-6de6-4f24-9b05-14e8779d5252
          enforced: true
          id: gatekeeper-controller-6844c5c896-dkzq9
          observedGeneration: 1
          operations:
          - webhook
        totalViolations: 0
    - apiVersion: constraints.gatekeeper.sh/v1beta1
      kind: K8sAzureDisallowedCapabilities
      metadata:
        annotations:
          azure-policy-assignment-id: /subscriptions/33732876-7635-4beb-9654-e3c3c37b7ecb/providers/Microsoft.Authorization/policyAssignments/SecurityCenterBuiltIn
          azure-policy-definition-id: /providers/Microsoft.Authorization/policyDefinitions/d2e7ea85-6b44-4317-a0be-1b951587f626
          azure-policy-definition-reference-id: KubernetesClustersShouldNotGrantCAPSYSADMINSecurityCapabilitiesMonitoringEffect
          azure-policy-setdefinition-id: /providers/Microsoft.Authorization/policySetDefinitions/1f3afdf9-d0c9-4c3d-847f-89da613e70a8
          constraint-installed-by: azure-policy-addon
          constraint-url: https://store.policy.core.windows.net/kubernetes/container-disallowed-capabilities/v2/constraint.yaml
        creationTimestamp: "2022-01-28T17:44:24Z"
        generation: 1
        labels:
          managed-by: azure-policy-addon
        name: azurepolicy-container-disallowed-capabilit-9b888b2fa665de7d4ef6
        resourceVersion: "49518"
        uid: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
      spec:
        enforcementAction: dryrun
        match:
          excludedNamespaces:
          - kube-system
          - gatekeeper-system
          - azure-arc
          kinds:
          - apiGroups:
            - ""
            kinds:
            - Pod
        parameters:
          disallowedCapabilities:
          - CAP_SYS_ADMIN
          excludedContainers: []
      status:
        auditTimestamp: "2022-01-28T21:26:27Z"
        byPod:
        - constraintUID: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
          enforced: true
          id: gatekeeper-audit-7b87566755-m55kw
          observedGeneration: 1
          operations:
          - audit
          - status
        - constraintUID: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
          enforced: true
          id: gatekeeper-controller-6844c5c896-2kkpj
          observedGeneration: 1
          operations:
          - webhook
        - constraintUID: 3b01e93a-ea5f-44a4-88ff-fd4f5f0ccc0c
          enforced: true
          id: gatekeeper-controller-6844c5c896-dkzq9
          observedGeneration: 1
          operations:
          - webhook
        totalViolations: 0
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

***** Notice the following excerpts from above *****

    disallowedCapabilities:
    - CAP_SYS_ADMIN

***** THE BUG *****

    spec:
      enforcementAction: dryrun

**What you expected to happen**:

The following manifest should be blocked from running:

    apiVersion: v1
    kind: Pod
    metadata:
      name: security-context-demo-4
    spec:
      containers:
      - name: sec-ctx-4
        image: gcr.io/google-samples/node-hello:1.0
        securityContext:
          capabilities:
            add: ["SYS_ADMIN"]

I tried a variation to make sure I got it right -- this too should have errored on the kubectl create

    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx-privileged
    spec:
      containers:
      - name: nginx-privileged
        image: mcr.microsoft.com/oss/nginx/nginx:1.15.5-alpine
        securityContext:
          privileged: true

**How to reproduce it (as minimally and precisely as possible)**:

  1. Start an AKS cluster (I did use the hardened SKU -- but had similar results on Standard SKU)
  2. az aks get-credentials --resource-group $RESOURCE_GROUP --name myAKSCluster3
  3. Start the Policy agent in the Portal or via cli then verify
  4. k get pods -n gatekeeper-system
  5. k get pods -n kube-system (there were a few restarts)
  6. ASSIGN the 2 Azure BuiltIn Policies
  7. Azure Policy Add-on for Kubernetes service (AKS) should be installed and enabled on your clusters
  8. Kubernetes clusters should not grant CAP_SYS_ADMIN security capabilities
  9. Restart the AKS Cluster
  10. k create myroottest.yaml

      apiVersion: v1
      kind: Pod
      metadata:
        name: security-context-demo-4
      spec:
        containers:
        - name: sec-ctx-4
          image: gcr.io/google-samples/node-hello:1.0
          securityContext:
            capabilities:
              add: ["SYS_ADMIN"]

  11. Manually kick off a policy scan -- az policy state trigger-scan --resource-group "dohoney-aksdemo-rg"
  12. k exec -ti security-context-demo-4-1 -- /bin/bash
  13. capsh --print | grep cap_sys_admin **** This should NOT HAPPEN

**Anything else we need to know?**:

I checked "Compliance" in the Portal Policy blade, and my cluster was reported as "Compliant" even though it was running a non-compliant Pod.

**Environment**:

  - Kubernetes version (use `kubectl version`): 1.21.7
  - Size of cluster (how many worker nodes are in the cluster?)

        john@DESKTOP-5CAD0JG:/mnt/c/Users/johndohoney$ k get nodes
        NAME                                STATUS   ROLES   AGE    VERSION
        aks-agentpool-34591016-vmss000001   Ready    agent   3d1h   v1.21.7
        aks-userpool-34591016-vmss000001    Ready    agent   3d1h   v1.21.7

  - General description of workloads in the cluster (e.g. HTTP microservices, Java app, Ruby on Rails, machine learning, etc.): Simple test application using busy box
  - Others:

Author: johndohoneyjr
Assignees: -
Labels: `triage`, `azure/policy`
Milestone: -
miwithro commented 2 years ago

@johndohoneyjr this is a flaw in the logic of the policy. If you specify "CAP_SYS_ADMIN", the policy will fail. We will clean up the logic to catch "SYS_ADMIN".
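One way to read the comment above: the constraint parameter lists `CAP_SYS_ADMIN`, while the pod manifest adds `SYS_ADMIN` (Kubernetes manifests name capabilities without the `CAP_` prefix), so an exact string comparison never matches. The Python sketch below illustrates that mismatch and one possible normalization. It is not the Rego logic of the actual constraint template, and the function names are hypothetical.

```python
# Sketch: why an exact string match misses "SYS_ADMIN" when the disallowed
# list contains "CAP_SYS_ADMIN", and how normalizing both sides catches it.
# This is an illustration only, not the policy's real implementation.

def normalize_capability(name):
    """Upper-case a capability name and strip an optional CAP_ prefix."""
    name = name.strip().upper()
    return name[4:] if name.startswith("CAP_") else name


def naive_violations(added, disallowed):
    """Exact string comparison between added and disallowed capabilities."""
    return [cap for cap in added if cap in disallowed]


def normalized_violations(added, disallowed):
    """Compare capabilities after removing the CAP_ prefix on both sides."""
    blocked = {normalize_capability(cap) for cap in disallowed}
    return [cap for cap in added if normalize_capability(cap) in blocked]


added = ["SYS_ADMIN"]           # from the pod's securityContext.capabilities.add
disallowed = ["CAP_SYS_ADMIN"]  # from the constraint's disallowedCapabilities

print(naive_violations(added, disallowed))       # []
print(normalized_violations(added, disallowed))  # ['SYS_ADMIN']
```

Normalizing both sides means either spelling in the policy parameter or in the manifest is treated as the same capability.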

ghost commented 2 years ago

Action required from @Azure/aks-pm

ghost commented 2 years ago

Issue needing attention of @Azure/aks-leads

ghost commented 2 years ago

This issue will now be closed because it hasn't had any activity for 7 days after being marked stale. johndohoneyjr, feel free to comment again in the next 7 days to reopen it, or open a new issue after that time if you still have a question, issue, or suggestion.

fseldow commented 2 years ago

Thanks for raising the issue and for the detailed investigation. The issue should have been fixed in February. Sorry for the very late response.