Azure / AKS-Construction

Accelerate your onboarding to AKS with; Helper Web App, bicep templating and CI/CD samples. Flexible & secure AKS baseline implementations in a Microsoft + community maintained reference implementation.
https://azure.github.io/AKS-Construction/
MIT License

Flux installation fails when Baseline azure policy is enabled #499

Closed: pratiksharma-dev closed this issue 1 year ago

pratiksharma-dev commented 1 year ago
Gordonby commented 1 year ago

Can you advise on the sequence of creation? Are you literally running the deployment commands on an empty resource group?

Azure Policy usually takes about 15 minutes to become active, so it sounds strange that the Flux install fails because of Azure Policy.
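One rough way to see whether policy has actually reached the cluster yet is to count the Gatekeeper ConstraintTemplate objects, since the Azure Policy add-on for AKS enforces policies through Gatekeeper. This is only a sketch: the helper name `constraint_template_count` is made up for illustration, and it assumes `kubectl` is already pointed at the cluster.

```shell
# Hypothetical helper: count Gatekeeper ConstraintTemplates on the cluster.
# A count of 0 suggests the policy definitions have not synced yet.
# Assumes kubectl is configured for the target AKS cluster.
constraint_template_count() {
  kubectl get constrainttemplates --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

Running `constraint_template_count` right after the deployment and again a few minutes later would show whether the policies arrive before or after the Flux install starts.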

eorengoey commented 1 year ago

Hi @Gordonby

We tried the following commands, but the Flux installation timed out after 2 hours:

# Create Resource Group
az group create -l EastUS  -n [RGName]

# Deploy template with in-line parameters
az deployment group create -g [RGName] --template-uri https://github.com/Azure/AKS-Construction/releases/download/0.9.7/main.json --parameters \
    resourceName=[AKSname] \
    AksPaidSkuForSLA=true \
    agentVMSize=Standard_D4ds_v4 \
    agentCountMax=5 \
    osDiskType=Managed \
    byoAKSSubnetId=[mySubnetID] \
    enable_aad=true \
    AksDisableLocalAccounts=true \
    enableAzureRBAC=true \
    adminPrincipalId=$(az ad signed-in-user show --query id --out tsv) \
    omsagent=true \
    retentionInDays=30 \
    openServiceMeshAddon=true \
    azurepolicy=deny \
    keyVaultAksCSI=true \
    keyVaultCreate=true \
    keyVaultOfficerRolePrincipalId=$(az ad signed-in-user show --query id --out tsv) \
    fluxGitOpsAddon=true \
    networkPluginMode=Overlay \
    ebpfDataplane=cilium

Flux resource (Microsoft.KubernetesConfiguration/extensions) timeout status message:

{
    "status": "Failed",
    "error": {
        "code": "ResourceDeploymentFailure",
        "message": "The resource provision operation did not complete within the allowed timeout period."
    }
}
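The state of the extension resource can also be queried directly instead of waiting for the deployment to time out. A minimal sketch, assuming the `k8s-extension` Azure CLI extension is installed and that the Flux extension is named `flux` (the name is an assumption here); the function name `flux_extension_state` is made up for illustration.

```shell
# Hypothetical helper: print the provisioning state of the Flux extension.
# $1 = resource group, $2 = AKS cluster name.
# Assumes the extension is named "flux"; adjust --name if yours differs.
flux_extension_state() {
  az k8s-extension show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --cluster-type managedClusters \
    --name flux \
    --query provisioningState \
    --output tsv
}
```

For example, `flux_extension_state [RGName] [AKSname]` could be polled while the deployment is running.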

Then, after a couple of attempts, we decided to remove Flux from the script (removing fluxGitOpsAddon=true) and deploy it once the cluster was up and running. That would let me monitor the installation from inside the cluster. To do this, I used the following command:

az k8s-extension create --resource-group [RGName] --cluster-name [myAKSname] --cluster-type managedClusters --name flux --extension-type microsoft.flux --config image-automation-controller.enabled=true image-reflector-controller.enabled=true

During the execution I noticed that the extension triggered some events. The fluxconfig-agent and fluxconfig-controller deployments failed due to one of the Gatekeeper constraints (see @pratiksharma-dev's messages).
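For anyone trying to reproduce this, the events and constraints can be inspected roughly as follows. This is a sketch under assumptions: it assumes the Flux extension installs into the `flux-system` namespace and that Gatekeeper denials mention the `validation.gatekeeper.sh` admission webhook in the event message. The function names are made up for illustration.

```shell
# Hypothetical helpers for inspecting why the Flux deployments were rejected.
# Assumes kubectl is configured for the target AKS cluster.

# List warning events in the namespace assumed to hold the Flux components.
flux_warning_events() {
  kubectl get events --namespace flux-system \
    --field-selector type=Warning --sort-by=.lastTimestamp
}

# Keep only the lines on stdin that mention the Gatekeeper admission webhook.
gatekeeper_denials() {
  grep -F 'validation.gatekeeper.sh'
}

# List the Gatekeeper constraints currently installed on the cluster.
list_gatekeeper_constraints() {
  kubectl get constraints
}
```

Piping one into the other, `flux_warning_events | gatekeeper_denials`, should narrow the output to the events that name the constraint that blocked the deployment.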

Hope this helps.

pratiksharma-dev commented 1 year ago

Hi @eorengoey,

I deployed with the baseline policy enabled and the GitOps addon, and the cluster deployed successfully. Here is my az CLI command:

az deployment group create -g xxxxx --template-uri https://github.com/Azure/AKS-Construction/releases/download/0.9.7/main.json --parameters \
    resourceName=xxxxxxx \
    upgradeChannel=stable \
    agentCountMax=20 \
    custom_vnet=true \
    vnetAksSubnetAddressPrefix=xxxxxxx \
    enable_aad=true \
    AksDisableLocalAccounts=true \
    enableAzureRBAC=true \
    adminPrincipalId=$(az ad signed-in-user show --query id --out tsv) \
    registries_sku=Premium \
    acrPushRolePrincipalId=$(az ad signed-in-user show --query id --out tsv) \
    imageNames="[\"k8s.gcr.io/external-dns/external-dns:v0.11.0\"]" \
    privateLinks=true \
    keyVaultIPAllowlist="[\"x.x.x.x\"]" \
    azurepolicy=deny \
    enablePrivateCluster=true \
    fileCSIDriver=false \
    diskCSIDriver=false \
    dnsZoneId=/xxxxxxxx/xxxxxxx \
    keyVaultAksCSI=true \
    keyVaultCreate=true \
    keyVaultOfficerRolePrincipalId=$(az ad signed-in-user show --query id --out tsv) \
    fluxGitOpsAddon=true \
    networkPluginMode=Overlay \
    ebpfDataplane=cilium

az k8s-configuration flux create -g xxxxxx \
    -c xxxxx \
    -n cluster-config \
    --namespace cluster-config \
    --scope cluster \
    -t managedClusters \
    -u git@github.com:xxxxxx/gitops-flux2-kustomize-helm-mt.git \
    --ssh-private-key-file /mnt/c/Users/prashar/.ssh/id_rsa \
    --branch main \
    --kustomization name=infra path=./infrastructure prune=true \
    --kustomization name=apps path=./apps/staging prune=true dependsOn=["infra"]
