Azure / karpenter-provider-azure

AKS Karpenter Provider
Apache License 2.0

Need to use Karpenter to add a node group with FIPS (--fips-enabled) #415

Open chkp-ilyaro opened 5 months ago

chkp-ilyaro commented 5 months ago

Version

Karpenter Version: v0.0.0

Kubernetes Version: v1.0.0

Expected Behavior

We need Karpenter to deploy a FIPS-enabled node group when using nodepool.yaml from the examples as-is, with only one change: setting imageVersion to AKSUbuntu-2004fipscontainerd-202405.27.0. This is the image version that was used when I deployed AKS with Terraform (TF) and the az CLI.


apiVersion: karpenter.azure.com/v1alpha2
kind: AKSNodeClass
metadata:
  name: default
  annotations:
    kubernetes.io/description: "General purpose AKSNodeClass for running Ubuntu2204 nodes"
spec:
  imageVersion: AKSUbuntu-2004fipscontainerd-202405.27.0

Actual Behavior

RESPONSE 404: 404 Not Found\nERROR CODE: GalleryImageNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"GalleryImageNotFound\",\n \"message\": \"\\"The gallery image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/AKSUbuntu-2004fipscontainerd-202405.27.0 is not available in eastus region. Please contact image owner to replicate to this region, or change your requested region.

The image is not found in any region (I tested all of them). IMHO the path is wrong:
/CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/AKSUbuntu-2004fipscontainerd-202405.27.0

The image families are hardcoded: https://github.com/Azure/karpenter-provider-azure/blob/5ee206b44978eeb537a1d08f347fda6742b5181f/pkg/providers/imagefamily/types.go#L26
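To make the mismatch concrete, here is a minimal Python sketch that splits a Community Gallery image version ID into its parts. The helper `parse_community_image_id` is hypothetical (it is not code from karpenter-provider-azure) and assumes only the `/CommunityGalleries/<gallery>/images/<definition>/versions/<version>` layout visible in the error above. Applied to the failing path, it shows the requested Ubuntu 20.04 FIPS version string sitting under the Ubuntu 22.04 `2204gen2containerd` image definition, which is consistent with the GalleryImageNotFound error.

```python
import re
from typing import NamedTuple

# Assumed layout, taken from the error message above:
# /CommunityGalleries/<gallery>/images/<definition>/versions/<version>
COMMUNITY_IMAGE_RE = re.compile(
    r"^/CommunityGalleries/(?P<gallery>[^/]+)"
    r"/images/(?P<definition>[^/]+)"
    r"/versions/(?P<version>[^/]+)$"
)


class CommunityImageRef(NamedTuple):
    gallery: str
    definition: str
    version: str


def parse_community_image_id(image_id: str) -> CommunityImageRef:
    """Split a Community Gallery image version ID into its parts (hypothetical helper)."""
    match = COMMUNITY_IMAGE_RE.match(image_id)
    if match is None:
        raise ValueError(f"not a Community Gallery image version ID: {image_id!r}")
    return CommunityImageRef(**match.groupdict())


if __name__ == "__main__":
    failing = (
        "/CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2"
        "/images/2204gen2containerd"
        "/versions/AKSUbuntu-2004fipscontainerd-202405.27.0"
    )
    ref = parse_community_image_id(failing)
    # The definition is the hardcoded 22.04 one; the version names a 20.04 FIPS image.
    print(ref.definition)  # 2204gen2containerd
    print(ref.version)     # AKSUbuntu-2004fipscontainerd-202405.27.0
```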

Steps to Reproduce the Problem

Follow the same steps as in the README, and add imageVersion to the AKSNodeClass spec:

spec:
  imageVersion: AKSUbuntu-2004fipscontainerd-202405.27.0

Resource Specs and Logs

I deployed the inflate deployment. Only one pod is running; the other four stay Pending:

$ k get po
NAME                       READY   STATUS    RESTARTS   AGE
inflate-5f57665f58-hcddq   1/1     Running   0          4d1h
inflate-5f57665f58-j59tc   0/1     Pending   0          4d1h
inflate-5f57665f58-nrkxg   0/1     Pending   0          4d1h
inflate-5f57665f58-p7rv5   0/1     Pending   0          4d1h
inflate-5f57665f58-r4lfq   0/1     Pending   0          3d21h

This is the log we get:

$ kubectl logs -f -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter -c controller
{"level":"DEBUG","time":"2024-06-23T13:31:56.914Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:18.928Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"INFO","time":"2024-06-23T13:32:19.321Z","logger":"controller.nodeclaim.lifecycle","message":"Selected instance type Standard_D8ls_v5","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"INFO","time":"2024-06-23T13:32:19.322Z","logger":"controller.nodeclaim.lifecycle","message":"Resolved image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/microsoft-aks:aks-aez:aks-ubuntu-containerd-2204-gen2-2023-q2:2023.04.10 for instance type Standard_D8ls_v5","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.323Z","logger":"controller.nodeclaim.lifecycle","message":"Returning 2 IPv4 backend pools: [/subscriptions/4b-XXXXX-XXXXXXXXX-XXXXXXXXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/aksOutboundBackendPool /subscriptions/4bXXXXXXXXXXXXXXXX-XXXXXXXX-XXXXXXXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/loadBalancers/kubernetes/backendAddressPools/kubernetes]","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.323Z","logger":"controller.nodeclaim.lifecycle","message":"Creating network interface aks-general-purpose-87qcx","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.929Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.956Z","logger":"controller.nodeclaim.lifecycle","message":"Successfully created network interface: /subscriptions/4bXXXXXX-XXXXXXXX-XXXXXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/networkInterfaces/aks-general-purpose-87qcx","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:19.956Z","logger":"controller.nodeclaim.lifecycle","message":"Creating virtual machine aks-general-purpose-87qcx (Standard_D8ls_v5)","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:20.930Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:21.931Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"DEBUG","time":"2024-06-23T13:32:22.219Z","logger":"controller.provisioner","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"ERROR","time":"2024-06-23T13:32:22.635Z","logger":"controller.nodeclaim.lifecycle","message":"Creating virtual machine \"aks-general-purpose-87qcx\" failed: PUT https://management.azure.com/subscriptions/4bXXXXXXX_XXXXXXXX_XXXXX/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Compute/virtualMachines/aks-general-purpose-87qcx\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: GalleryImageNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"GalleryImageNotFound\",\n \"message\": \"\\"The gallery image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/microsoft-aks:aks-aez:aks-ubuntu-containerd-2204-gen2-2023-q2:2023.04.10 is not available in eastus region. Please contact image owner to replicate to this region, or change your requested region.\\"\",\n \"target\": \"imageReference\"\n }\n}\n--------------------------------------------------------------------------------\n","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"DEBUG","time":"2024-06-23T13:32:22.931Z","logger":"controller.disruption","message":"waiting on cluster sync","commit":"bbaa9b7"}
{"level":"ERROR","time":"2024-06-23T13:32:23.161Z","logger":"controller.nodeclaim.lifecycle","message":"networkInterface.Delete for aks-general-purpose-87qcx failed: DELETE https://management.azure.com/subscriptions/4b-XXXXXXXXXX-XXXXXXXX/ResourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Network/networkInterfaces/aks-general-purpose-87qcx\n--------------------------------------------------------------------------------\nRESPONSE 400: 400 Bad Request\nERROR CODE: NicReservedForAnotherVm\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"NicReservedForAnotherVm\",\n \"message\": \"Nic(s) in request is reserved for another Virtual Machine for 180 seconds. Please provide another nic(s) or retry after 180 seconds. Reserved VM: /subscriptions/4b-XXXXXXX_XXXXXXX_XXXX/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Compute/virtualMachines/aks-general-purpose-87qcx\",\n \"details\": []\n }\n}\n--------------------------------------------------------------------------------\n","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"ERROR","time":"2024-06-23T13:32:23.161Z","logger":"controller.nodeclaim.lifecycle","message":"failed to cleanup resources for node claim general-purpose-87qcx, %!w(*errors.joinError=&{[0xc0019aae20]})","commit":"bbaa9b7","nodeclaim":"general-purpose-87qcx"}
{"level":"ERROR","time":"2024-06-23T13:32:23.162Z","logger":"controller","message":"Reconciler error","commit":"bbaa9b7","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"general-purpose-87qcx"},"namespace":"","name":"general-purpose-87qcx","reconcileID":"0e1c39df-c71a-436a-aa24-dc6395651ae7","error":"launching nodeclaim, creating instance, virtualMachine.BeginCreateOrUpdate for VM \"aks-general-purpose-87qcx\" failed: PUT https://management.azure.com/subscriptions/4b-xxxxxxxx-xxxx/resourceGroups/MC_use-dt-stg-rg_testAKSfipsKarp_eastus/providers/Microsoft.Compute/virtualMachines/aks-general-purpose-87qcx\n--------------------------------------------------------------------------------\nRESPONSE 404: 404 Not Found\nERROR CODE: GalleryImageNotFound\n--------------------------------------------------------------------------------\n{\n \"error\": {\n \"code\": \"GalleryImageNotFound\",\n \"message\": \"\\"The gallery image /CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2/images/2204gen2containerd/versions/microsoft-aks:aks-aez:aks-ubuntu-containerd-2204-gen2-2023-q2:2023.04.10 is not available in eastus region. Please contact image owner to replicate to this region, or change your requested region.\\"\",\n \"target\": \"imageReference\"\n }\n}\n--------------------------------------------------------------------------------\n"}
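In this log, VM creation fails with GalleryImageNotFound, the NIC cleanup then fails with NicReservedForAnotherVm (Azure holds the NIC for 180 seconds), and the reconciler reports the original image error. A minimal Python sketch for pulling just those Azure error codes out of the controller log, assuming one JSON object per line and that the embedded Azure response text contains `ERROR CODE: <code>` as above. The helper name `azure_error_codes` is hypothetical.

```python
import json
import re
from typing import Iterable, List, Tuple

# Azure error responses embedded in the log messages contain "ERROR CODE: <Code>".
ERROR_CODE_RE = re.compile(r"ERROR CODE: (\w+)")


def azure_error_codes(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (time, logger, error code) for each ERROR entry carrying an Azure error code.

    Hypothetical helper; assumes one JSON log object per line.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and non-JSON output such as the shell command
        entry = json.loads(line)
        if entry.get("level") != "ERROR":
            continue
        # "Reconciler error" entries keep the Azure text in "error"; others use "message".
        text = entry.get("error") or entry.get("message", "")
        match = ERROR_CODE_RE.search(text)
        if match:
            results.append((entry["time"], entry.get("logger", ""), match.group(1)))
    return results


if __name__ == "__main__":
    # Simplified sample entries (not copied verbatim from the log above).
    sample = [
        json.dumps({"level": "DEBUG", "time": "t0", "logger": "controller.disruption",
                    "message": "waiting on cluster sync"}),
        json.dumps({"level": "ERROR", "time": "t1", "logger": "controller.nodeclaim.lifecycle",
                    "message": "Creating virtual machine failed\nERROR CODE: GalleryImageNotFound\n"}),
        json.dumps({"level": "ERROR", "time": "t2", "logger": "controller.nodeclaim.lifecycle",
                    "message": "networkInterface.Delete failed\nERROR CODE: NicReservedForAnotherVm\n"}),
    ]
    for ts, logger, code in azure_error_codes(sample):
        print(ts, logger, code)
```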


chkp-ilyaro commented 5 months ago

Hi,

I found the limitation that causes the problem:

https://github.com/Azure/karpenter-provider-azure/blob/5ee206b44978eeb537a1d08f347fda6742b5181f/pkg/apis/crds/karpenter.azure.com_aksnodeclasses.yaml#L50

Trying Ubuntu2004 fails with:

Unsupported value: "Ubuntu2004": supported values: "Ubuntu2204", "AzureLinux"

FIPS-enabled (fips_enabled) nodes are only available with Ubuntu2004, but Karpenter for Azure doesn't support it, so Karpenter and fips_enabled nodes can't currently work together.
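The rejection comes from an enum in the AKSNodeClass CRD schema linked above (presumably on the image family field), which at that commit lists only `Ubuntu2204` and `AzureLinux`. The Python sketch below only mimics that kind of enum check to show why `Ubuntu2004` is rejected before any node is provisioned; it is not the Kubernetes API server's validation code, and `validate_image_family` is a hypothetical name.

```python
from typing import Optional, Sequence

# Allowed values as reported in the error above (CRD at commit 5ee206b).
SUPPORTED_IMAGE_FAMILIES = ("Ubuntu2204", "AzureLinux")


def validate_image_family(value: str,
                          supported: Sequence[str] = SUPPORTED_IMAGE_FAMILIES) -> Optional[str]:
    """Return an error string in the style of the message above, or None if valid.

    Illustrative only: mimics an OpenAPI enum check, not the real validation code.
    """
    if value in supported:
        return None
    allowed = ", ".join(f'"{v}"' for v in supported)
    return f'Unsupported value: "{value}": supported values: {allowed}'


if __name__ == "__main__":
    print(validate_image_family("Ubuntu2204"))  # None
    print(validate_image_family("Ubuntu2004"))
    # Unsupported value: "Ubuntu2004": supported values: "Ubuntu2204", "AzureLinux"
```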

The FIPS-enabled image version path is: /subscriptions/109a5e88-712a-48ae-9078-9ca8b3c81345/resourceGroups/AKS-Ubuntu/providers/Microsoft.Compute/galleries/AKSUbuntu/images/2004gen2fipscontainerd/versions/202405.27.0. This information can be taken from the VMSS in Azure.
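Unlike the Community Gallery path, this is a full Azure resource ID for a gallery image version. A minimal Python sketch that splits it into named parts, assuming the `/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/galleries/<gallery>/images/<definition>/versions/<version>` layout shown above. The helper `parse_sig_image_version_id` is hypothetical; it makes the differences visible: a separate image definition (`2004gen2fipscontainerd`) and a bare version (`202405.27.0`).

```python
import re
from typing import Dict

# Assumed Azure resource ID layout for a gallery image version (as in the comment above).
SIG_IMAGE_RE = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Compute/galleries/(?P<gallery>[^/]+)"
    r"/images/(?P<definition>[^/]+)"
    r"/versions/(?P<version>[^/]+)$",
    re.IGNORECASE,  # ARM resource IDs are case-insensitive (resourceGroups vs ResourceGroups)
)


def parse_sig_image_version_id(resource_id: str) -> Dict[str, str]:
    """Split a gallery image version resource ID into named parts (hypothetical helper)."""
    match = SIG_IMAGE_RE.match(resource_id)
    if match is None:
        raise ValueError(f"not a gallery image version resource ID: {resource_id!r}")
    return match.groupdict()


if __name__ == "__main__":
    fips_id = (
        "/subscriptions/109a5e88-712a-48ae-9078-9ca8b3c81345/resourceGroups/AKS-Ubuntu"
        "/providers/Microsoft.Compute/galleries/AKSUbuntu"
        "/images/2004gen2fipscontainerd/versions/202405.27.0"
    )
    parts = parse_sig_image_version_id(fips_id)
    print(parts["gallery"], parts["definition"], parts["version"])
    # AKSUbuntu 2004gen2fipscontainerd 202405.27.0
```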

Bryce-Soghigian commented 4 months ago

> /subscriptions/109a5e88-712a-48ae-9078-9ca8b3c81345/resourceGroups/AKS-Ubuntu/providers/Microsoft.Compute/galleries/AKSUbuntu/images/2004gen2fipscontainerd/versions/202405.27.0 The info can be taken from VMSS in Azure

Also note that the gallery you are referencing for the image version is a SIG (Shared Image Gallery), not a Community Image Gallery, which is what Karpenter uses today. We do not publish the FIPS images to Community Image Galleries.
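The two reference styles discussed here can be told apart by their prefix. A minimal Python sketch, based only on the two ID shapes that appear in this thread; `gallery_kind` is a hypothetical helper, not part of Karpenter or any Azure SDK.

```python
def gallery_kind(image_id: str) -> str:
    """Classify an image reference by its prefix (hypothetical helper).

    Based only on the two ID shapes seen in this thread:
      /CommunityGalleries/<gallery>/images/<definition>/versions/<version>
      /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/galleries/...
    """
    lowered = image_id.lower()  # compare case-insensitively, like ARM IDs
    if lowered.startswith("/communitygalleries/"):
        return "community"
    if lowered.startswith("/subscriptions/") and "/providers/microsoft.compute/galleries/" in lowered:
        return "sig"
    return "unknown"


if __name__ == "__main__":
    print(gallery_kind(
        "/CommunityGalleries/AKSUbuntu-38d80f77-467a-481f-a8d4-09b6d4220bd2"
        "/images/2204gen2containerd/versions/AKSUbuntu-2004fipscontainerd-202405.27.0"
    ))  # community
    print(gallery_kind(
        "/subscriptions/109a5e88-712a-48ae-9078-9ca8b3c81345/resourceGroups/AKS-Ubuntu"
        "/providers/Microsoft.Compute/galleries/AKSUbuntu"
        "/images/2004gen2fipscontainerd/versions/202405.27.0"
    ))  # sig
```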

Bryce-Soghigian commented 3 months ago

cc: @rakechill, maybe this is something SIG support could enable?