Closed Bryce-Soghigian closed 4 months ago
Copying feedback on the PR with messy git history tallaxes left a comment • This particular fix looks good to me, with some suggestions (left comments), and the following caveats:
There are likely more changes related to the switch to out-of-tree providers (such as setting DisableKubeletCloudCredentialProviders feature gate depending on Kubernetes version), would make sense to package and test this all together Need to actually test it on 1.30 - including ACR image pull. (Ideally this would be covered by E2E) Need evidence of E2E tests passing
There are likely more changes related to the switch to out-of-tree providers (such as setting DisableKubeletCloudCredentialProviders feature gate depending on Kubernetes version), would make sense to package and test this all together
We discussed offline, seems the feature gate is removed and defaulted correctly in the kubelet binaries so no need for change here.
Need to actually test it on 1.30 - including ACR image pull. (Ideally this would be covered by E2E) I ran the e2e test paired with setting the k8s version in the k8s version func in image family. IT fails with the following errors
Warning FailedCreatePodSandBox 2m42s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "<sandbox_id>": plugin type="cilium-cni" failed (add): failed to invoke delegated plugin ADD for IPAM: http request failed: Post "http://<ipam_service_address>": dial tcp <local_ip>:: connect: connection refused; failed to request IP address from CNS Normal SandboxChanged 98s (x6 over 2m41s) kubelet Pod sandbox changed, it will be killed and re-created. Normal Pulling 44s (x3 over 98s) kubelet Pulling image "<image_registry>/pause:3.6" Warning Failed 44s (x3 over 88s) kubelet Failed to pull image "<image_registry>/pause:3.6": failed to pull and unpack image "<image_registry>/pause:3.6": failed to resolve reference "<image_registry>/pause:3.6": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://<image_registry>/oauth2/token?scope=repository%3Apause%3Apull&service=<image_registry>: 401 Unauthorized Warning Failed 44s (x3 over 88s) kubelet Error: ErrImagePull Normal BackOff 20s (x4 over 87s) kubelet Back-off pulling image "<image_registry>/pause:3.6" Warning Failed 20s (x4 over 87s) kubelet Error: ImagePullBackOff
I need to investigate why and if there is more needed to supporting the out of tree providers work.
Need evidence of E2E tests passing
I ran the e2e tests here https://github.com/Azure/karpenter-provider-azure/actions/runs/9228266167. Ideally we could run them against 1.30 but as its not out yet no real way to test on 1.30 for apiserver from our e2e tests. Best we can do is set the kubelet version to 1.30.
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % | ||
---|---|---|---|---|---|
pkg/providers/imagefamily/bootstrap/aksbootstrap.go | 7 | 10 | 70.0% | ||
<!-- | Total: | 10 | 13 | 76.92% | --> |
Files with Coverage Reduction | New Missed Lines | % | ||
---|---|---|---|---|
pkg/cache/unavailableofferings.go | 2 | 95.45% | ||
<!-- | Total: | 2 | --> |
Totals | |
---|---|
Change from base Build 9074172591: | -0.01% |
Covered Lines: | 36284 |
Relevant Lines: | 37115 |
Pasting this here for reference: https://github.com/Azure/AgentBaker/pull/4258/files @rakechill for Sov cloud support, we will need to pass in AKS_CUSTOM_CLOUD_CONTAINER_REGISTRY_DNS_SUFFIX="{{- if IsAKSCustomCloud}}{{AKSCustomCloudContainerRegistryDNSSuffix}}{{end}}"
added in this pr. If we don't migrate to the contract in time for GA that is.
I ran the end to end suite for ACR pull locally.
Ran 1 of 1 Specs in 369.598 seconds SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 0 Skipped --- PASS: TestAcr (369.60s) PASS ok github.com/Azure/karpenter-provider-azure/test/suites/acr 370.152s
We pass with nodes that are set to 1.30 for their k8s version. We should be good to get this one in if CI passes here.
Fixes # NA
Description Bootstrap for 1.30 will fail if we have the --azure-container-registry-config flag specified.
By removing this flag we get nodes for k8s version 1.30 to bootstrap successfully.
How was this change tested?
Does this change impact docs?
Release Note