The problem was that when installing an ingress controller, a repeating error was coming up trying to access an AKS internal service (metrics-server). There are conflicting bug reports on github about this, both starting, being resolved, and about a 1000 different workarounds, but most of them assume you control it - in AKS the service is started automatically as part of the standard deployment.
In the end... I found out that when Azure reports the cluster is ready, that no longer means their deployments are complete. I build an extra step that checks the status of deployments to the kube-system namespace and waits for them to all be ready (which is consistently about 2 minutes after AKS says it's ready). With that step in place, helm deployments are happening properly, without error, and I'm getting success deploying Rancher again.
The problem was that when installing an ingress controller, a repeating error was coming up trying to access an AKS internal service (metrics-server). There are conflicting bug reports on github about this, both starting, being resolved, and about a 1000 different workarounds, but most of them assume you control it - in AKS the service is started automatically as part of the standard deployment.
In the end... I found out that when Azure reports the cluster is ready, that no longer means their deployments are complete. I build an extra step that checks the status of deployments to the kube-system namespace and waits for them to all be ready (which is consistently about 2 minutes after AKS says it's ready). With that step in place, helm deployments are happening properly, without error, and I'm getting success deploying Rancher again.