eponerine closed this issue 9 months ago
This defect is related to the version of Kubernetes the cluster is running on.
I tested the deployment on 1.23 and saw no scheduling issues. Looking at the nodes, we can see that they carry both the control-plane and master roles!
Someone has to update the azure-arc-onboarding components (namely, the guard deployment).
PS C:\Windows\system32> kubectl get nodes -o wide
NAME              STATUS     ROLES                  AGE    VERSION    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
moc-l595w5d9s4u   NotReady   <none>                 1s     v1.23.12   172.24.144.68   <none>        CBL-Mariner/Linux   5.15.102.1-3.cm2   containerd://1.6.18
moc-ljwkuwsej4z   Ready      control-plane,master   5m8s   v1.23.12   172.24.144.67   <none>        CBL-Mariner/Linux   5.15.102.1-3.cm2   containerd://1.6.18
moc-lqdv2ou1uqw   Ready      control-plane,master   10m    v1.23.12   172.24.144.66   <none>        CBL-Mariner/Linux   5.15.102.1-3.cm2   containerd://1.6.18
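For context, this dual-role situation is expected on 1.23: Kubernetes deprecated the master role label in 1.20 and kubeadm stopped applying it in 1.24, so 1.20–1.23 control-plane nodes carry both labels. An illustrative fragment of the node metadata on a cluster like this (values reconstructed, not exported from these nodes):

```yaml
# Illustrative labels on a v1.23 control-plane node.
# node-role.kubernetes.io/master no longer exists on v1.24+ nodes.
metadata:
  labels:
    node-role.kubernetes.io/control-plane: ""
    node-role.kubernetes.io/master: ""
```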
Hi @eponerine
I'm using Azure AD RBAC as well, same deployment option. The clusters are at Kubernetes 1.24.11, control planes are labeled 'control-plane,master', and everything works fine. It starts to fail with Kubernetes 1.25 because Arc onboarding will not finish due to the old / now-missing master label, correct?
Yup, bingo.
The PG (product group) is aware of this; at least I spammed them via email a few weeks ago.
My suggestion is to open a ticket through the Azure Portal about this issue so it gets engineering eyes on it.
Ok, thanks. I'll stay at Kubernetes 1.24.* until this is fixed.
I tested it again with the latest AKS-hybrid-2307 release. Not solved yet.
I'm using -kubernetesVersion "v1.24.11"
on all clusters with AzureRBAC enabled.
@Elektronenvolt - can you open a ticket through the Azure Portal describing this issue? And once done, pass the ticket number along to me?
I never got around to opening one, and the PG said they need one to move forward. My environment is currently at capacity, so I can't test this myself.
@eponerine - created a support case and added you to a conversation with the PG.
@Elektronenvolt - thanks. I pinged the same folks (plus more) on my other email thread referencing this ticket. Hopefully there is movement.
I got a workaround from support and tested it now. The following works with Kubernetes 1.25.*:
1. Create a new cluster with -EnableAzureRBAC, wait for the error message, and connect by certificate.
2. Get the guard deployment: kubectl get deployment guard -n azure-arc -o yaml > guard.yaml
3. Edit the guard deployment: remove the nodeSelector node-role.kubernetes.io/master: "" and add a toleration for key: node-role.kubernetes.io/control-plane
4. Delete the guard deployment: kubectl delete deployment guard -n azure-arc
5. Deploy it from the modified yaml: kubectl create -f .\guard.yaml
6. Generate an RBAC config and connect: Get-AksHciCredential -name <cluster> -aadauth
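For reference, a minimal sketch of what the scheduling-related part of the exported guard.yaml might look like after the edit described above. Everything except the nodeSelector and tolerations comes from your own exported deployment; the operator/effect values on the toleration are assumptions, not confirmed by support:

```yaml
# Hypothetical excerpt of the edited guard.yaml (container spec and
# other exported fields omitted).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guard
  namespace: azure-arc
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""   # master selector line deleted
      tolerations:
        - key: node-role.kubernetes.io/control-plane  # added per the workaround
          operator: Exists                            # assumed values
          effect: NoSchedule
```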
@eponerine seems to be fixed in Azure Arc. I've created a cluster with Kubernetes 1.26.3 an hour ago and the Azure AD RBAC onboarding worked fine. I've exported the Guard deployment and compared it to a modified deployment:
Left side: from the new cluster. Right side: the applied workaround from an older cluster.
Interesting. I'll probably be deploying a new cluster soon; I'll test it with the latest k8s version.
It looks like it's fixed. The guard component has a higher version and includes the changes I received from support as a workaround.
Marking as closed.
@Elektronenvolt - what program do you use for diff'ing text files? I love the comparison "flow" UI there. I've been using BeyondCompare for years, but willing to try something new :D
Meld: https://meldmerge.org/
Describe the bug
While trying to deploy AKS Hybrid with the -enableAzureRBAC flag and prereqs, the last step of New-AksHciCluster hangs for 1800 seconds. If you get the logs of that pod (obfuscating some GUIDs), you'll see it's hung up on:
Error from server (NotFound): namespaces "azure-arc" not found
That namespace exists, so I'm unsure what the error is going on about:
But the pod guard-755cc8dd58-vbpjd sticks out, as it is in 0/2 status, so let's describe it. Well, that is a problem! Kubernetes changed from master to control-plane a while ago. Node-selector labels are "inclusive", meaning all labels must match (a logical AND, if you will): https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
The way this deployment is configured will never work, as you won't have control-plane and master both at the same time (unless jumping between major version differences of k8s???). I cannot personally confirm if this will progress now, because I think the job times out before I can manually edit things to "work", but this should be easy to reproduce.
To Reproduce
Expected behavior
Azure Arc registration completes with the -enableAzureRBAC flag and config.
Environment (please complete the following information):
Collect log files