Azure / AKS-Landing-Zone-Accelerator

Official repository for the AKS Landing Zone Accelerator program
MIT License

[BUG] Deployment fails when reaching [stage 06] - ControlPlaneAddOnsNotReady #10

Closed. rohancragg closed this issue 2 years ago.

rohancragg commented 2 years ago

Describe the bug

Deployment fails with the following error when it reaches stage 06:

code: ControlPlaneAddOnsNotReady

message: Pods not in Running status: coredns-547dd8b568-lrswn,coredns-autoscaler-6fb889cdfc-twmtf,metrics-server-7d59848cc6-jkd59,metrics-server-7d59848cc6-sl2tq,tunnelfront-69c958dc9-gndmc,tunnelfront-7f8f87f77c-wjqrn
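To see which add-ons are affected, the pod names in the message above can be grouped by their owning deployment. This Python sketch assumes the standard Deployment naming scheme <deployment>-<replicaset-hash>-<pod-suffix>. The grouping is a name-based heuristic, not something AKS itself reports.

```python
import re

# The message text copied from the ControlPlaneAddOnsNotReady error above.
MESSAGE = (
    "Pods not in Running status: coredns-547dd8b568-lrswn,"
    "coredns-autoscaler-6fb889cdfc-twmtf,metrics-server-7d59848cc6-jkd59,"
    "metrics-server-7d59848cc6-sl2tq,tunnelfront-69c958dc9-gndmc,"
    "tunnelfront-7f8f87f77c-wjqrn"
)

# Deployment-managed pods are named <deployment>-<replicaset-hash>-<pod-suffix>;
# the greedy prefix backtracks so only the last two segments are stripped.
POD_NAME = re.compile(r"^(?P<deployment>.+)-[a-z0-9]+-[a-z0-9]+$")


def affected_deployments(message: str) -> dict[str, int]:
    """Count not-Running pods per owning deployment (name-based heuristic)."""
    _, _, pod_list = message.partition(":")
    counts: dict[str, int] = {}
    for pod in (p.strip() for p in pod_list.split(",") if p.strip()):
        match = POD_NAME.match(pod)
        name = match.group("deployment") if match else pod
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    for deployment, count in sorted(affected_deployments(MESSAGE).items()):
        print(f"{deployment}: {count} pod(s) not running")
```

For this message it reports one pod each for coredns and coredns-autoscaler and two each for metrics-server and tunnelfront. When every core add-on is stuck at once, the cause is more likely the nodes than any single add-on.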

Also, running kubectl get nodes from the jumpbox returns 'No resources found'.

To Reproduce

Follow the steps in AKS-Secure-Baseline-PrivateCluster/Bicep.

Expected behavior

The AKS cluster should deploy correctly, and kubectl get nodes should return 3 nodes.
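The expected state can be checked from the jumpbox by parsing the output of kubectl get nodes -o json and counting nodes whose Ready condition is "True". This Python sketch uses a hypothetical empty node list as its sample, which is how the JSON output looks when there are no nodes (the 'No resources found' case above). The threshold of 3 comes from the expected behavior.

```python
import json

# Hypothetical sample of `kubectl get nodes -o json` output with no nodes,
# i.e. the situation kubectl reports as "No resources found".
SAMPLE = json.dumps({"apiVersion": "v1", "kind": "List", "items": []})


def ready_node_count(nodes_json: str) -> int:
    """Count nodes whose Ready condition has status "True"."""
    node_list = json.loads(nodes_json)
    ready = 0
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            ready += 1
    return ready


def check_expected_nodes(nodes_json: str, expected: int = 3) -> bool:
    """True if at least `expected` nodes are Ready (autoscaling may add more)."""
    return ready_node_count(nodes_json) >= expected


if __name__ == "__main__":
    print(f"{ready_node_count(SAMPLE)} Ready node(s); expected at least 3")
```

To use it, capture the real command output as a string and pass it to check_expected_nodes.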

rohancragg commented 2 years ago

An example event taken from the kubectl cluster-info dump output, which reports "no nodes available to schedule pods":

{
    "metadata": {
        "name": "azure-policy-7b67f765f-kshzq.16ea12e114a5bda8",
        "namespace": "kube-system",
        "uid": "f1fbb7b9-03c8-4833-aab0-1cef01cd6758",
        "resourceVersion": "8471",
        "creationTimestamp": "2022-04-28T13:36:11Z"
    },
    "involvedObject": {
        "kind": "Pod",
        "namespace": "kube-system",
        "name": "azure-policy-7b67f765f-kshzq",
        "uid": "42cd254f-24f3-4088-9fb1-19f386dd1af9",
        "apiVersion": "v1",
        "resourceVersion": "648"
    },
    "reason": "FailedScheduling",
    "message": "no nodes available to schedule pods",
    "source": {
        "component": "default-scheduler"
    },
    "firstTimestamp": "2022-04-28T13:36:11Z",
    "lastTimestamp": "2022-04-28T14:11:34Z",
    "count": 35,
    "type": "Warning",
    "eventTime": null,
    "reportingComponent": "",
    "reportingInstance": ""
}

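Rather than reading the whole dump, the scheduling failures can be extracted programmatically. This Python sketch assumes the events are available as an EventList, for example from kubectl get events -n kube-system -o json. The sample is a trimmed copy of the event above. Messages are tallied by each event's count field, so a repeating error like this one (35 occurrences) stands out.

```python
import json
from collections import Counter

# Trimmed copy of the event above, wrapped in an EventList like the one
# `kubectl get events -n kube-system -o json` returns.
EVENTS_JSON = json.dumps({
    "kind": "EventList",
    "apiVersion": "v1",
    "items": [
        {
            "metadata": {"name": "azure-policy-7b67f765f-kshzq.16ea12e114a5bda8",
                         "namespace": "kube-system"},
            "involvedObject": {"kind": "Pod", "namespace": "kube-system",
                               "name": "azure-policy-7b67f765f-kshzq"},
            "reason": "FailedScheduling",
            "message": "no nodes available to schedule pods",
            "count": 35,
            "type": "Warning",
        }
    ],
})


def failed_scheduling(events_json: str) -> list[tuple[str, str, int]]:
    """Return (pod, message, count) for every FailedScheduling warning."""
    events = json.loads(events_json).get("items", [])
    return [
        (e["involvedObject"]["name"], e.get("message", ""), e.get("count") or 1)
        for e in events
        if e.get("type") == "Warning" and e.get("reason") == "FailedScheduling"
    ]


def messages_by_frequency(events_json: str) -> Counter:
    """Tally scheduler messages, weighted by each event's repeat count."""
    tally: Counter = Counter()
    for _, message, count in failed_scheduling(events_json):
        tally[message] += count
    return tally


if __name__ == "__main__":
    for message, total in messages_by_frequency(EVENTS_JSON).most_common():
        print(f"{total:>5}  {message}")
```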
rohancragg commented 2 years ago

I think I know what the problem was. I suspect I edited the parameters incorrectly and set the wrong IP addresses for the firewall and DNS on the spoke VNet, so I'm trying to redeploy now. That would explain why the node VMs and the VMSS have no outbound connectivity.
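A mistake like this can be caught before redeploying by checking each hand-edited address against the prefix it is supposed to belong to. This Python sketch is illustrative only: the labels, addresses, and prefixes are hypothetical placeholders, not the accelerator's actual parameter names or defaults. It flags addresses outside their expected prefix, as well as addresses Azure reserves in every subnet (the first four and the last).

```python
import ipaddress
from typing import NamedTuple


class Check(NamedTuple):
    label: str     # which setting the address came from (hypothetical label)
    address: str   # the value entered in the parameter file
    expected: str  # CIDR prefix the address is supposed to fall inside


# Hypothetical values for illustration; replace them with the prefixes from
# your hub deployment and the addresses from your spoke parameter file.
CHECKS = [
    Check("firewall next hop (route table)", "10.0.1.4", "10.0.1.0/26"),
    Check("custom DNS server (spoke VNet)", "10.0.1.4", "10.0.1.0/26"),
]


def problems(checks: list[Check]) -> list[str]:
    """Describe every address that cannot be valid for its expected subnet."""
    found = []
    for check in checks:
        ip = ipaddress.ip_address(check.address)
        net = ipaddress.ip_network(check.expected)
        if ip not in net:
            found.append(f"{check.label}: {ip} is outside {net}")
            continue
        # Azure reserves the first four addresses and the last address of every subnet.
        offset = int(ip) - int(net.network_address)
        if offset < 4 or ip == net.broadcast_address:
            found.append(f"{check.label}: {ip} is an Azure-reserved address in {net}")
    return found


if __name__ == "__main__":
    issues = problems(CHECKS)
    print("\n".join(issues) if issues else "all addresses look plausible")
```

Passing this check only shows that an address is plausible. It still has to match the firewall's actual private IP and the DNS servers the spoke is meant to use.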

rohancragg commented 2 years ago

The error was mine, made while editing the templates; there is no issue in the templates themselves.