Azure / application-gateway-kubernetes-ingress

This is an ingress controller that can be run on Azure Kubernetes Service (AKS) to allow an Azure Application Gateway to act as the ingress for an AKS cluster.
https://azure.github.io/application-gateway-kubernetes-ingress
MIT License

App Gateway sslcert not deleted during terraform destroy (also max 100 sslcerts) #988

Open jimusbobus opened 4 years ago

jimusbobus commented 4 years ago

Describe the bug
We use Helm and Terraform to deploy services into AKS. When we run 'terraform destroy' on a deployed service, the sslcert that was added to the App Gateway for that service is not deleted.

To Reproduce
Deploy a service with an ingress controller. Here is the ingress.yaml that is used as part of the deployment:

kubectl get ingress -n auto-terratestjdr1 -o yaml

apiVersion: v1
items:

Verify that the cert has been installed

az network application-gateway ssl-cert list -g bnngm-dev-westeurope --gateway-name bnngm-dev-westeurope-aks-appgw --subscription blah-blah | jq

{
  "data": null,
  "etag": "W/\"cf3dcf6a-0fe4-4038-a59a-428923b3362e\"",
  "id": "/subscriptions/4b7cd783-c55a-4319-a0d7-a3a68ef112b1/resourceGroups/bnngm-dev-westeurope/providers/Microsoft.Network/applicationGateways/bnngm-dev-westeurope-aks-appgw/sslCertificates/auto-terratestjdr1-star-dev-bnngm-azure-cudaops-com",
  "keyVaultSecretId": null,
  "name": "auto-terratestjdr1-star-dev-bnngm-azure-cudaops-com",
  "password": null,
  "provisioningState": "Succeeded",
  "publicCertData": "FAKEDATA",
  "resourceGroup": "bnngm-dev-westeurope",
  "type": "Microsoft.Network/applicationGateways/sslCertificates"
}

Destroy the deployment and check for the cert again:

az network application-gateway ssl-cert list -g bnngm-dev-westeurope --gateway-name bnngm-dev-westeurope-aks-appgw --subscription blah-blah | jq

[
  {
    "data": null,
    "etag": "W/\"738c1667-1f66-4560-ba3e-b28ae25b4460\"",
    "id": "/subscriptions/4b7cd783-c55a-4319-a0d7-a3a68ef112b1/resourceGroups/bnngm-dev-westeurope/providers/Microsoft.Network/applicationGateways/bnngm-dev-westeurope-aks-appgw/sslCertificates/auto-terratestjdr1-star-dev-bnngm-azure-cudaops-com",
    "keyVaultSecretId": null,
    "name": "auto-terratestjdr1-star-dev-bnngm-azure-cudaops-com",
    "password": null,
    "provisioningState": "Succeeded",
    "publicCertData": "FAKEIT",
    "resourceGroup": "bnngm-dev-westeurope",
    "type": "Microsoft.Network/applicationGateways/sslCertificates"
  }
]

The sslcert is not removed with the rest of the deployment.
Because the App Gateway holds at most 100 sslcerts, once we have made 100 deployments we cannot deploy any more.

Ingress Controller details

kubectl describe pod -n ingress-azure ingress-azure-69c7c9d66d-sjgkl

Name:           ingress-azure-69c7c9d66d-sjgkl
Namespace:      ingress-azure
Priority:       0
Node:           aks-agentpool-18118092-vmss000000/10.153.40.4
Start Time:     Mon, 27 Jul 2020 21:21:10 +0100
Labels:         aadpodidbinding=ingress-azure
                app=ingress-azure
                pod-template-hash=69c7c9d66d
                release=ingress-azure
Annotations:    checksum/config: 92368a0b96949a0f3e010d229a45caa26fea001a744095ee3f780595cc508397
                kubectl.kubernetes.io/restartedAt: 2020-07-15T11:10:45+01:00
                prometheus.io/port: 8123
                prometheus.io/scrape: true
Status:         Running
IP:             10.153.40.9
IPs:
  IP:           10.153.40.9
Controlled By:  ReplicaSet/ingress-azure-69c7c9d66d
Containers:
  ingress-azure:
    Container ID:   docker://cffae75aa13e593b9d1422ac3c1aa8ea991768bd607af7b8e29fdf8467178e91
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0
    Image ID:       docker-pullable://mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:de458f962eab0cd2de19d23dfeb9a0e4bc2565a38f8c45cc98a74f3cda8b940c
    Port:
    Host Port:
    State:          Running
      Started:      Mon, 27 Jul 2020 21:21:23 +0100
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      ingress-azure  ConfigMap  Optional: false
    Environment:
      AZURE_CLOUD_PROVIDER_LOCATION:  /etc/appgw/azure.json
      AGIC_POD_NAME:                  ingress-azure-69c7c9d66d-sjgkl (v1:metadata.name)
      AGIC_POD_NAMESPACE:             ingress-azure (v1:metadata.namespace)
    Mounts:
      /etc/appgw/azure.json from azure (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ingress-azure-token-dhdlh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  azure:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  ingress-azure-token-dhdlh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-azure-token-dhdlh
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:

terraform -version
Terraform v0.12.29

nboeckmann commented 4 years ago

Hi,

I'm experiencing the same issue. I don't think this should be classified as an improvement; it must be treated as a bug, because the ingress controller simply stops working in my environment once I have added and deleted 100 ingress definitions. I think it's important that the ingress controller cleans up after itself: when it deletes a listener, it also has to delete the associated SSL certificate.

Nils

chreichert commented 3 years ago

Any news on this issue? I also agree that this should be classified as a bug. Are there any plans for when it will be fixed?

gregory-j-baker commented 1 year ago

@akshaysngupta -- are you able to comment on this?