Azure / karpenter-provider-azure

AKS Karpenter Provider
Apache License 2.0

chore: bumping the grid driver affecting linux kernels 5.15.1063+ versions #381

Closed Bryce-Soghigian closed 1 month ago

Bryce-Soghigian commented 1 month ago

Description

See: https://github.com/Azure/AgentBaker/pull/4429 for additional context as to why we are bumping.

How was this change tested?

Does this change impact docs?

Release Note

coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 9356764125

Totals Coverage Status
Change from base Build 9074172591: 0.0%
Covered Lines: 36279
Relevant Lines: 37105

Bryce-Soghigian commented 1 month ago

See: https://forums.developer.nvidia.com/t/linux-6-7-3-545-29-06-550-40-07-error-modpost-gpl-incompatible-module-nvidia-ko-uses-gpl-only-symbol-rcu-read-lock/280908/45 for more context

Bryce-Soghigian commented 1 month ago

Looks like it's fixed by the version bump going in. I ran the e2es locally and via the pipeline, and both pass.

~/dev/focus/karpenter-provider-azure (bsoghigian/gpu-driver-bump*) » k get pods -A                    sillygoose@Bryces-MacBook-Pro
NAMESPACE           NAME                                                   READY   STATUS    RESTARTS   AGE
default             devourerwhite-5-vfh02dctly-7968d96bb-mdjdv             1/1     Running   0          3m5s
gatekeeper-system   gatekeeper-audit-57fc5568f8-5b987                      1/1     Running   0          2m26s
gatekeeper-system   gatekeeper-controller-6494586d5d-7k4db                 1/1     Running   0          2m26s
gatekeeper-system   gatekeeper-controller-6494586d5d-dqnfd                 1/1     Running   0          2m26s
karpenter           karpenter-66d689464f-jljmz                             1/1     Running   0          6m23s
kube-system         azure-cns-5s6sv                                        1/1     Running   0          43s
kube-system         azure-cns-7g95z                                        1/1     Running   0          11m
kube-system         azure-cns-p6xn8                                        1/1     Running   0          11m
kube-system         azure-cns-z7c45                                        1/1     Running   0          11m
kube-system         azure-ip-masq-agent-8njp2                              1/1     Running   0          11m
kube-system         azure-ip-masq-agent-dxlfm                              1/1     Running   0          11m
kube-system         azure-ip-masq-agent-fp8j2                              1/1     Running   0          11m
kube-system         azure-ip-masq-agent-k89d4                              1/1     Running   0          43s
kube-system         azure-policy-d76896767-r8rp9                           1/1     Running   0          2m26s
kube-system         azure-policy-webhook-564c9d7c7b-v65wr                  1/1     Running   0          2m26s
kube-system         azure-wi-webhook-controller-manager-7585698f56-4jnwz   1/1     Running   0          10m
kube-system         azure-wi-webhook-controller-manager-7585698f56-5s9g9   1/1     Running   0          10m
kube-system         cilium-fc5b5                                           1/1     Running   0          42s
kube-system         cilium-mp6sj                                           1/1     Running   0          11m
kube-system         cilium-operator-559887cf4-7w6rt                        1/1     Running   0          11m
kube-system         cilium-operator-559887cf4-gdw9m                        1/1     Running   0          11m
kube-system         cilium-sml28                                           1/1     Running   0          11m
kube-system         cilium-wvrgh                                           1/1     Running   0          11m
kube-system         cloud-node-manager-lzv8v                               1/1     Running   0          11m
kube-system         cloud-node-manager-q6vx2                               1/1     Running   0          11m
kube-system         cloud-node-manager-xwkws                               1/1     Running   0          42s
kube-system         cloud-node-manager-z7sfg                               1/1     Running   0          11m
kube-system         coredns-767bfbd4fb-n876d                               1/1     Running   0          11m
kube-system         coredns-767bfbd4fb-zrjfk                               1/1     Running   0          10m
kube-system         coredns-autoscaler-c6649b67c-b5x85                     1/1     Running   0          11m
kube-system         csi-azuredisk-node-779m5                               3/3     Running   0          11m
kube-system         csi-azuredisk-node-fv7ll                               3/3     Running   0          11m
kube-system         csi-azuredisk-node-mzv6c                               3/3     Running   0          11m
kube-system         csi-azuredisk-node-x6cnl                               3/3     Running   0          43s
kube-system         csi-azurefile-node-4pwqk                               3/3     Running   0          11m
kube-system         csi-azurefile-node-kldhr                               3/3     Running   0          11m
kube-system         csi-azurefile-node-mkkd7                               3/3     Running   0          42s
kube-system         csi-azurefile-node-rgw62                               3/3     Running   0          11m
kube-system         konnectivity-agent-c98b47dbd-5vqm9                     1/1     Running   0          11m
kube-system         konnectivity-agent-c98b47dbd-dpkts                     1/1     Running   0          11m
kube-system         metrics-server-76d77694d4-9pv6p                        2/2     Running   0          10m
kube-system         metrics-server-76d77694d4-r8r6d                        2/2     Running   0          10m
kube-system         nvidia-device-plugin-daemonset-vv8t7                   1/1     Running   0          21s

Bryce-Soghigian commented 1 month ago

Going to merge, as the failing e2es are unrelated to my change and the GPU ones are passing. The E2Es are failing on the cluster create steps, not in the Karpenter logic itself.