Azure / application-gateway-kubernetes-ingress

This is an ingress controller that can be run on Azure Kubernetes Service (AKS) to allow an Azure Application Gateway to act as the ingress for an AKS cluster.
https://azure.github.io/application-gateway-kubernetes-ingress
MIT License

AGIC with Let's Encrypt sporadically serves an old SSL certificate #1039

Status: Closed (joelharkes closed this issue 5 months ago)

joelharkes commented 4 years ago

Please don't spend much time debugging this, but I'd like to know: is this a known issue?

Describe the bug
Sometimes an end user is served an old SSL certificate (much older than the current one).

To Reproduce
Steps to reproduce the behavior:

cert-manager v0.15.1

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-app-gateway
  namespace: prod
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/ssl-redirect: "true"
    appgw.ingress.kubernetes.io/request-timeout: "300"
    appgw.ingress.kubernetes.io/connection-draining: "true"
    appgw.ingress.kubernetes.io/connection-draining-timeout: "30"
    cert-manager.io/cluster-issuer: issuer-letsencrypt-prod
    cert-manager.io/acme-challenge-type: http01
spec:
  tls:
    - hosts:
        - domain.me
      secretName: secret-ssl-domain-me
    - hosts:
        - customer1.domain.me
      secretName: secret-ssl-customer1-domain-me
    - hosts:
        - customer2.domain.me
      secretName: secret-ssl-customer2-domain-me
    - hosts:
        - customer3.domain.me
      secretName: secret-ssl-customer3-domain-me
    - hosts:
        - customer4.domain.me
      secretName: secret-ssl-customer4-domain-me
  rules:
    - host: domain.me
      http:
        paths:
          - backend:
              serviceName: service-php-prod
              servicePort: 80
    - host: customer1.domain.me
      http:
        paths:
          - backend:
              serviceName: service-php-prod
              servicePort: 80
    - host: customer2.domain.me
      http:
        paths:
          - backend:
              serviceName: service-php-prod
              servicePort: 80
    - host: customer3.domain.me
      http:
        paths:
          - backend:
              serviceName: service-php-prod
              servicePort: 80
    - host: customer4.domain.me
      http:
        paths:
          - backend:
              serviceName: service-php-prod
              servicePort: 80
```

Ingress Controller details


```text
Name:         ingress-azure-64b95964d8-ndnvf
Namespace:    default
Priority:     0
Node:         aks-nodepool1-30476719-vmss000001/192.168.1.5
Start Time:   Wed, 23 Sep 2020 10:29:54 +0200
Labels:       aadpodidbinding=ingress-azure
              app=ingress-azure
              pod-template-hash=64b95964d8
              release=ingress-azure
Annotations:  checksum/config: d13d8bd8adaf32da021553a8cb42d3f750cd00fba8c1eb09012aed162268257d
              prometheus.io/port: 8123
              prometheus.io/scrape: true
Status:       Running
IP:           10.244.1.5
IPs:
  IP:           10.244.1.5
Controlled By:  ReplicaSet/ingress-azure-64b95964d8
Containers:
  ingress-azure:
    Container ID:   docker://116e54b531921f64384a0380541934b23b3b4ac75c1cb3e69b50cdc5b62ff7cb
    Image:          mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.2.0
    Image ID:       docker-pullable://mcr.microsoft.com/azure-application-gateway/kubernetes-ingress@sha256:de458f962eab0cd2de19d23dfeb9a0e4bc2565a38f8c45cc98a74f3cda8b940c
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 23 Sep 2020 10:30:35 +0200
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8123/health/alive delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8123/health/ready delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment Variables from:
      ingress-azure  ConfigMap  Optional: false
    Environment:
      AZURE_CLOUD_PROVIDER_LOCATION:  /etc/appgw/azure.json
      AGIC_POD_NAME:                  ingress-azure-64b95964d8-ndnvf (v1:metadata.name)
      AGIC_POD_NAMESPACE:             default (v1:metadata.namespace)
    Mounts:
      /etc/appgw/azure.json from azure (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from ingress-azure-token-xfkxt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  azure:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/azure.json
    HostPathType:  File
  ingress-azure-token-xfkxt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  ingress-azure-token-xfkxt
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
```
akshaysngupta commented 3 years ago

@joelharkes Are you still seeing this problem? Do you see AGIC updating the Application Gateway with the old certificate when the old certificate is served?

joelharkes commented 3 years ago

Thanks for the update.

I haven't checked the Application Gateway yet; I will next time.

I think it happens around a certificate update (maybe it reverts to the previous certificate for a few seconds, or something like that?).

Normally it lasts only a few seconds: when our users refresh, it's gone.

We haven't had a new report in the last 2 weeks, but before that we heard about it quite a few times and experienced it ourselves. (Our app currently has only 10,000 infrequent users.)

I think it could also happen after I update the ingress file to add a new customer.

joelharkes commented 3 years ago

@akshaysngupta We had this issue again today, multiple times.

It seems to happen when we update the ingress YAML files. For context, we have 3 different ingress YAML files in 3 different namespaces, identical to the one above except for the subdomains. (Yes, each domain is unique; I double-checked this.)

How can I check the certificate in the Application Gateway? I can see it is set up, but I only get a name, e.g. test-secret-sss-customer-domain-me, nothing more.

akshaysngupta commented 3 years ago

Use the following command to view the certificate in text using openssl.

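```shell
# Fill in the resource group, gateway name and SSL certificate name.
resourceGroup=""
gatewayName=""
sslCertName=""
publiccert=$(az network application-gateway ssl-cert show -g "$resourceGroup" --gateway-name "$gatewayName" --name "$sslCertName" --query publicCertData -o tsv)
# publicCertData is a base64 PKCS#7 blob; -text prints the decoded certificate.
echo -e "-----BEGIN CERTIFICATE-----\n$publiccert\n-----END CERTIFICATE-----" | openssl pkcs7 -print_certs | openssl x509 -text -noout
```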
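Since the portal only shows certificate names, it can help to dump every certificate on the gateway at once. The following is a sketch, assuming bash, a logged-in Azure CLI and OpenSSL; the script name and its arguments are illustrative. It also assumes that `ssl-cert list` returns the same `publicCertData` field that `ssl-cert show` returns in the command above, which is worth checking on your CLI version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (list-appgw-certs.sh): list every SSL certificate on an
# Application Gateway with the subject and expiry of each certificate in its
# chain. Usage: list-appgw-certs.sh RESOURCE_GROUP GATEWAY
# Assumes bash, a logged-in Azure CLI and OpenSSL.
set -euo pipefail

# Print the "Subject:" and "Not After" lines of a base64 PKCS#7 blob ($1).
describe_blob() {
  printf -- '-----BEGIN CERTIFICATE-----\n%s\n-----END CERTIFICATE-----\n' "$1" \
    | openssl pkcs7 -print_certs -text -noout \
    | grep -E 'Subject:|Not After'
}

# One "== name" header per gateway certificate, followed by its details.
# Assumption: `ssl-cert list` includes publicCertData for each entry.
list_appgw_certs() {
  local name data
  az network application-gateway ssl-cert list -g "$1" --gateway-name "$2" \
      --query "[].[name, publicCertData]" -o tsv \
    | while IFS=$'\t' read -r name data; do
        echo "== $name"
        describe_blob "$data" || echo "   (could not decode publicCertData)"
      done
}

# Only run when called with arguments, so the functions can be reused.
if [[ $# -eq 2 ]]; then
  list_appgw_certs "$1" "$2"
fi
```

For example, `./list-appgw-certs.sh my-rg my-gateway` (hypothetical names); an entry whose Not After date is long past would point at a stale certificate still installed on the gateway.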

Can you also check the k8s secret when this happens?
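One way to do that check is to compare fingerprints: the leaf certificate in the `tls.crt` key of the `kubernetes.io/tls` secret that cert-manager writes, versus the certificates in the gateway's `publicCertData`. The sketch below assumes bash, kubectl, a logged-in Azure CLI and OpenSSL. The script name and argument order are hypothetical, and the gateway certificate name has to be supplied by hand, because AGIC's naming of gateway certificates is not the same as the secret name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (compare-certs.sh): does the certificate cert-manager
# stored in a Kubernetes TLS secret match the one on the Application Gateway?
# Usage: compare-certs.sh NAMESPACE SECRET RESOURCE_GROUP GATEWAY SSL_CERT_NAME
# Assumes bash, kubectl, a logged-in Azure CLI and OpenSSL.
set -euo pipefail

# "SHA256 Fingerprint=ab:cd:..." -> "ABCD..." (OpenSSL 1.1 and 3.x formats).
normalize_fp() {
  sed 's/^.*=//' | tr -d ':' | tr '[:lower:]' '[:upper:]'
}

# Wrap a base64 blob in PEM armour so openssl can parse it.
wrap_pem() {
  printf -- '-----BEGIN CERTIFICATE-----\n%s\n-----END CERTIFICATE-----\n' "$1"
}

# Fingerprint of the first (leaf) certificate in secret $2 of namespace $1.
k8s_fp() {
  kubectl get secret "$2" -n "$1" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -fingerprint -sha256 \
    | normalize_fp
}

# Fingerprints of every certificate in the gateway's PKCS#7 blob, one per
# line; the blob can hold the whole chain, so its order is not relied on.
appgw_fps() {
  local data line cert=""
  data=$(az network application-gateway ssl-cert show -g "$1" \
    --gateway-name "$2" --name "$3" --query publicCertData -o tsv)
  while IFS= read -r line; do
    if [[ $line == "-----BEGIN CERTIFICATE-----" ]]; then cert=""; fi
    cert+="$line"$'\n'
    if [[ $line == "-----END CERTIFICATE-----" ]]; then
      printf '%s' "$cert" | openssl x509 -noout -fingerprint -sha256 | normalize_fp
    fi
  done < <(wrap_pem "$data" | openssl pkcs7 -print_certs)
}

# Only run when called with arguments, so the functions can be reused.
if [[ $# -eq 5 ]]; then
  want=$(k8s_fp "$1" "$2")
  have=$(appgw_fps "$3" "$4" "$5")
  echo "secret $1/$2 leaf: $want"
  echo "gateway cert $5 contains:"
  echo "$have"
  if grep -qxF "$want" <<<"$have"; then
    echo "MATCH"
  else
    echo "MISMATCH: the gateway holds a different certificate" >&2
    exit 1
  fi
fi
```

Running it while the old certificate is being served would separate the two cases asked about above: a mismatch means AGIC pushed (or kept) an outdated certificate on the gateway, while a match means the old certificate is coming from somewhere else.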

joelharkes commented 3 years ago

Crazy enough, it's a very old certificate (I think it's the first certificate ever requested). It keeps coming back, either when we change the configuration or when a renewal is due.

ramazankilimci commented 1 year ago

I'm facing the same issue. Is there any workaround for this?

joelharkes commented 1 year ago

We don't seem to have this problem anymore; somehow it was fixed.

We did have some wrong IPv6 DNS records, but I'm no longer sure whether they also played a part here.
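The IPv6 remark is worth checking. Suppose a stale AAAA record pointed at an old endpoint that still presented the first certificate. Then only clients connecting over IPv6 would see it, which would fit the sporadic pattern described in this thread. The thread does not confirm this. The sketch below fetches the served certificate over IPv4 and IPv6 separately. It assumes bash, `dig` and OpenSSL 1.1.0 or later (for the `-4`/`-6` flags of `s_client`); the script name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (check-served-cert.sh): show the A/AAAA records of
# each host and the certificate actually served over IPv4 and over IPv6.
# Usage: check-served-cert.sh HOST [HOST...]
# Assumes bash, dig and OpenSSL 1.1.0+ (s_client -4 / -6).
set -euo pipefail

# Subject, expiry and SHA-256 fingerprint of the certificate served at
# HOST:443 when forced onto address family $2 (4 or 6).
served_cert() {
  local host=$1 family=$2
  openssl s_client -connect "$host:443" -servername "$host" "-$family" \
      </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -enddate -fingerprint -sha256 2>/dev/null
}

# Print DNS records and the served certificate for both address families.
check_host() {
  local host=$1 family rtype
  for family in 4 6; do
    if [[ $family == 4 ]]; then rtype=A; else rtype=AAAA; fi
    echo "== $host $rtype records: $(dig +short "$host" "$rtype" | tr '\n' ' ')"
    served_cert "$host" "$family" || echo "   (no TLS answer over IPv$family)"
  done
}

# Only run when called with arguments, so the functions can be reused.
if [[ $# -gt 0 ]]; then
  for host in "$@"; do check_host "$host"; done
fi
```

For example, `./check-served-cert.sh domain.me customer1.domain.me`. If the IPv4 and IPv6 paths show different fingerprints or expiry dates, the DNS records, not AGIC, would be the first place to look.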