Azure / karpenter-provider-azure

AKS Karpenter Provider
Apache License 2.0

Need to use Karpenter to add node group with fips --fips-enabled #415

Open chkp-ilyaro opened 1 week ago

chkp-ilyaro commented 1 week ago

Version

Karpenter Version: v0.0.0

Kubernetes Version: v1.0.0

Expected Behavior

I deployed nodepool.yaml from the example as is, with only one change: we need a node group with FIPS enabled, so I set imageVersion to AKSUbuntu-2004fipscontainerd-202405.27.0. This is the image version that was used when I deployed AKS with FIPS from Terraform and az. I expected Karpenter to provision nodes with that image.


apiVersion: karpenter.azure.com/v1alpha2
kind: AKSNodeClass
metadata:
  name: default
  annotations:
    kubernetes.io/description: "General purpose AKSNodeClass for running Ubuntu2204 nodes"
spec:
  imageVersion: AKSUbuntu-2004fipscontainerd-202405.27.0

Actual Behavior

RESPONSE 404: 404 Not Found\nERROR CODE: GalleryImageNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"GalleryImageNotFound\",\n \"message\": \"\\"The gallery image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/AKSUbuntu-2004fipscontainerd-202405.27.0 is not available in eastus region. Please contact image owner to replicate to this region, or change your requested region.

The image is not found in any region; I tested all of them. IMHO the path is wrong:
/CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/AKSUbuntu-2004fipscontainerd-202405.27.0

The image families are hardcoded: https://github.com/Azure/karpenter-provider-azure/blob/5ee206b44978eeb537a1d08f347fda6742b5181f/pkg/providers/imagefamily/types.go#L26
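To illustrate why the path looks wrong, here is a minimal sketch of how such an image ID could end up being assembled. This is not the provider's actual code: the function name `community_gallery_image_id` and the idea that the image definition is fixed by the image family while `imageVersion` is appended unchanged are assumptions inferred from the error message above.

```python
# Hypothetical sketch, NOT the provider's implementation.
# Assumption: the image definition comes from the selected image family,
# and the user-supplied imageVersion is appended as-is.

GALLERY = "AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2"  # taken from the error message
IMAGE_DEFINITION = "2204gen2containerd"  # Ubuntu 22.04 definition seen in the error message


def community_gallery_image_id(gallery: str, definition: str, version: str) -> str:
    """Join the three parts into a community gallery image path."""
    return f"/CommunityGalleries/{gallery}/images/{definition}/versions/{version}"


# The imageVersion from the AKSNodeClass in this issue.
requested_version = "AKSUbuntu-2004fipscontainerd-202405.27.0"

image_id = community_gallery_image_id(GALLERY, IMAGE_DEFINITION, requested_version)
print(image_id)
```

Under these assumptions the result is exactly the path from the 404 response: a 20.04 FIPS version string placed under the 22.04 non-FIPS image definition, which would explain why no region has it.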

Steps to Reproduce the Problem

Follow the same steps as in the README and add imageVersion to the AKSNodeClass spec:

spec:
  imageVersion: AKSUbuntu-2004fipscontainerd-202405.27.0

Resource Specs and Logs

I deployed the inflate deployment:

k get po
NAME                       READY   STATUS    RESTARTS   AGE
inflate-5f57665f58-hcddq   1/1     Running   0          4d1h
inflate-5f57665f58-j59tc   0/1     Pending   0          4d1h
inflate-5f57665f58-nrkxg   0/1     Pending   0          4d1h
inflate-5f57665f58-p7rv5   0/1     Pending   0          4d1h
inflate-5f57665f58-r4lfq   0/1     Pending   0          3d21h

This is the log we get

$ kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
{"level":"DEBUG","time":"2024-06-23T13:31:56.914Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:18.928Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"INFO","time":"2024-06-23T13:32:19.321Z","logger":"controller.nodeclaim.lifecycle","message":"Selected instance type Standard_D8ls_v5","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"INFO","time":"2024-06-23T13:32:19.322Z","logger":"controller.nodeclaim.lifecycle","message":"Resolved image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/microsoft-aks:aks-aez:aks-ubuntu-containerd-2204-gen2-2023-q2:2023.04.10 for instance type Standard_D8ls_v5","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.323Z","logger":"controller.nodeclaim.lifecycle","message":"Returning 2 IPv4 backend pools: [/subscriptions/4b-XXXXX-XXXXXXXXX-XXXXXXXXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/aksOutboundBackendPool /subscriptions/4bXXXXXXXXXXXXXXXX-XXXXXXXX-XXXXXXXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/kubernetes]","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.323Z","logger":"controller.nodeclaim.lifecycle","message":"Creating network interface aks-general-purpose-87qcx","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.929Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.956Z","logger":"controller.nodeclaim.lifecycle","message":"Successfully created network interface: /subscriptions/4bXXXXXX-XXXXXXXX-XXXXXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/networkInterfaces/aks-general-purpose-87qcx","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.956Z","logger":"controller.nodeclaim.lifecycle","message":"Creating virtual machine aks-general-purpose-87qcx (Standard_D8ls_v5)","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:20.930Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:21.931Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:22.219Z","logger":"controller.provisioner","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"ERROR","time":"2024-06-23T13:32:22.635Z","logger":"controller.nodeclaim.lifecycle","message":"Creating virtual machine \"aks-general-purpose-87qcx\" failed: PUT https://management.azure.com/subscriptions/4bXXXXXXX_XXXXXXXX_XXXXX/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Compute/virtualMachines/aks-general-purpose-87qcx\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: GalleryImageNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"GalleryImageNotFound\",\n \"message\": \"\\"The gallery image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/microsoft-aks:aks-aez:aks-ubuntu-containerd-2204-gen2-2023-q2:2023.04.10 is not available in eastus region. Please contact image owner to replicate to this region, or change your requested region.\\"\",\n \"target\": \"imageReference\"\n }\n}\n--------------------------------------------------------------------------------\n","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:22.931Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"ERROR","time":"2024-06-23T13:32:23.161Z","logger":"controller.nodeclaim.lifecycle","message":"networkInterface.Delete for aks-general-purpose-87qcx failed: DELETE https://management.azure.com/subscriptions/4b-XXXXXXXXXX-XXXXXXXX/ResourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/networkInterfaces/aks-general-purpose-87qcx\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: NicReservedForAnotherVm\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"NicReservedForAnotherVm\",\n \"message\": \"Nic(s) in request is reserved for another Virtual Machine for 180 seconds. Please provide another nic(s) or retry after 180 seconds. Reserved VM: /subscriptions/4b-XXXXXXX_XXXXXXX_XXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Compute/virtualMachines/aks-general-purpose-87qcx\",\n \"details\": []\n }\n}\n--------------------------------------------------------------------------------\n","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"ERROR","time":"2024-06-23T13:32:23.161Z","logger":"controller.nodeclaim.lifecycle","message":"failed to cleanup resources for node claim general-purpose-87qcx, %!w(*errors.joinError=&{[0xc0019aae20]})","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"ERROR","time":"2024-06-23T13:32:23.162Z","logger":"controller","message":"Reconciler error","commit":"bbaa9b7","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"general-purpose-87qcx"},"namespace":"","name":"general-purpose-87qcx","reconcileID":"0e1c39df-c71a-436a-aa24-dc6395651ae7","error":"launching nodeclaim, creating instance, virtualMachine.BeginCreateOrUpdate for VM \"aks-general-purpose-87qcx\" failed: PUT https://management.azure.com/subscriptions/4b-xxxxxxxx-xxxx/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Compute/virtualMachines/aks-general-purpose-87qcx\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: GalleryImageNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"GalleryImageNotFound\",\n \"message\": \"\\"The gallery image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/microsoft-aks:aks-aez:aks-ubuntu-containerd-2204-gen2-2023-q2:2023.04.10 is not available in eastus region. Please contact image owner to replicate to this region, or change your requested region.\\"\",\n \"target\": \"imageReference\"\n }\n}\n--------------------------------------------------------------------------------\n"}
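For anyone trying to pull the relevant failures out of a controller log like this one, here is a small sketch that walks through back-to-back JSON log objects and collects the Azure error codes from ERROR entries. The helper names `iter_json_objects` and `error_codes` are made up for illustration, and the `sample` string is a shortened stand-in, not the real log; a log pasted into an issue may not be strictly valid JSON any more, so treat this as a starting point only.

```python
import json
import re

ERROR_CODE = re.compile(r"ERROR CODE: (\w+)")


def iter_json_objects(text):
    """Yield JSON objects that are concatenated with or without whitespace between them."""
    decoder = json.JSONDecoder(strict=False)  # tolerate raw control characters in strings
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def error_codes(text):
    """Collect Azure error codes mentioned in ERROR-level entries."""
    codes = []
    for entry in iter_json_objects(text):
        if entry.get("level") != "ERROR":
            continue
        # The Azure response may sit in "message" or in "error", depending on the logger.
        blob = entry.get("message", "") + "\n" + entry.get("error", "")
        codes.extend(ERROR_CODE.findall(blob))
    return codes


# Shortened, hypothetical stand-in for the controller output.
sample = (
    '{"level":"DEBUG","logger":"controller.disruption","message":"waiting on cluster sync"}'
    '{"level":"INFO","logger":"controller.nodeclaim.lifecycle","message":"Selected instance type Standard_D8ls_v5"} '
    '{"level":"ERROR","logger":"controller.nodeclaim.lifecycle",'
    '"message":"RESPONSE 404: 404 Not Found\\nERROR CODE: GalleryImageNotFound\\n"}'
)

print(error_codes(sample))
```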

Community Note

chkp-ilyaro commented 1 week ago

Hi,

I found the limitation that causes the problem:

https://github.com/Azure/karpenter-provider-azure/blob/5ee206b44978eeb537a1d08f347fda6742b5181f/pkg/apis/crds/karpenter.azure.com_aksnodeclasses.yaml#L50

Unsupported value: "Ubuntu2004": supported values: "Ubuntu2204", "AzureLinux"

fips_enabled nodes are only available on Ubuntu2004, but Karpenter for Azure doesn't support that image family. So Karpenter and fips_enabled nodes currently can't work together.

The fips_enabled image version path is: /subscriptions/109a5e88-712a-48ae-9078-9ca8b3c81345/resourceGroups/AKS-Ubuntu/providers/Microsoft.Compute/galleries/AKSUbuntu/images/2004gen2fipscontainerd/versions/202405.27.0

This info can be taken from the VMSS in Azure.
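To make the difference between the two paths easier to see, here is a small sketch that splits an image ID into its key/value segments and compares the image definition and version. The helper `parse_image_id` is hypothetical and relies only on the alternating segment layout visible in the two paths quoted in this issue.

```python
def parse_image_id(resource_id: str) -> dict:
    """Split '/key/value/key/value/...' into a dict with lower-cased keys."""
    parts = resource_id.strip("/").split("/")
    return dict(zip((key.lower() for key in parts[0::2]), parts[1::2]))


# Path taken from the VMSS of a FIPS-enabled AKS node pool (quoted above).
fips_id = (
    "/subscriptions/109a5e88-712a-48ae-9078-9ca8b3c81345/resourceGroups/AKS-Ubuntu"
    "/providers/Microsoft.Compute/galleries/AKSUbuntu"
    "/images/2004gen2fipscontainerd/versions/202405.27.0"
)

# Path from the GalleryImageNotFound error reported in this issue.
karpenter_id = (
    "/CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2"
    "/images/2204gen2containerd/versions/AKSUbuntu-2004fipscontainerd-202405.27.0"
)

fips = parse_image_id(fips_id)
karpenter = parse_image_id(karpenter_id)

for key in ("images", "versions"):
    print(f"{key}: VMSS={fips[key]!r} Karpenter={karpenter[key]!r}")
```

The image definition differs (2004gen2fipscontainerd vs 2204gen2containerd), and the version in the VMSS path is just 202405.27.0, without the AKSUbuntu-2004fipscontainerd- prefix.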