HTTPS listener created as HTTP

smarsh-tim commented 2 years ago

Describe the bug An HTTPS listener is being created as HTTP. This happens at some point after the initial deploy; when first deployed, the listener is correctly on HTTPS.

2022-03-22 15:05:56 diagd 2.0.4 [P22TAEW] INFO: EnvoyConfig: Generating V3
2022-03-22 15:05:56 diagd 2.0.4 [P22TAEW] INFO: V3Listener: ==== GENERATED <V3Listener HTTP emissary-ingress-http-listener on 0.0.0.0:8080 [XFP]>
2022-03-22 15:05:56 diagd 2.0.4 [P22TAEW] INFO: V3Listener: ==== GENERATED <V3Listener HTTP emissary-ingress-https-listener on 0.0.0.0:8443 [XFP]>

This ultimately leads to this error when attempting to connect to the HTTPS endpoint:

% curl https://sample-site.internal.com/ambassador/v0/check_ready
curl: (35) error:1400410B:SSL routines:CONNECT_CR_SRVR_HELLO:wrong version number

But then this works fine:

% curl http://sample-site.internal.com:443/ambassador/v0/check_ready
Ambassador is ready and waiting
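
The two curl results above can be turned into a repeatable check. The following is a minimal sketch, not part of Emissary: `diagnose_port` and `probe_port` are hypothetical helper names, and the only facts relied on are that curl exits with 35 on a failed TLS handshake (as in the error above) and with 0 on success.

```shell
# Hypothetical helper: interpret two curl exit codes for the same host:port.
#   $1 = exit code of the https:// request
#   $2 = exit code of the http:// request
diagnose_port() {
  https_rc=$1
  http_rc=$2
  if [ "$https_rc" -eq 0 ]; then
    # TLS handshake and request both succeeded.
    echo "tls"
  elif [ "$https_rc" -eq 35 ] && [ "$http_rc" -eq 0 ]; then
    # Handshake failed (curl exit 35) but plaintext HTTP works on the same port.
    echo "plaintext-on-tls-port"
  else
    echo "inconclusive"
  fi
}

# Probe one host:port with both schemes (requires curl and network access).
# The default path is the readiness endpoint used in the report above.
probe_port() {
  host=$1
  port=$2
  path=${3:-/ambassador/v0/check_ready}
  https_rc=0
  curl -sS -o /dev/null --max-time 5 "https://${host}:${port}${path}" 2>/dev/null || https_rc=$?
  http_rc=0
  curl -sS -o /dev/null --max-time 5 "http://${host}:${port}${path}" 2>/dev/null || http_rc=$?
  diagnose_port "$https_rc" "$http_rc"
}
```

In the situation described here, `probe_port sample-site.internal.com 443` should print `plaintext-on-tls-port`.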

I am unable to find any related error messages in the emissary pod logs, in the kubectl describe output for the resources, or in the k8s events.

These are the listener resources:

apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: emissary-ingress-http-listener
  namespace: emissary
spec:
  hostBinding:
    namespace:
      from: ALL
  port: 8080
  protocol: HTTP
  securityModel: XFP
  ambassador_id: [ "internal" ]

---
apiVersion: getambassador.io/v3alpha1
kind: Listener
metadata:
  name: emissary-ingress-https-listener
  namespace: emissary
spec:
  hostBinding:
    namespace:
      from: ALL
  port: 8443
  protocol: HTTPS
  securityModel: XFP
  ambassador_id: [ "internal" ]

But the protocol for emissary-ingress-https-listener isn't being respected: the listener is generated as HTTP instead of HTTPS. The emissary pod logs report no TLS errors of the kind described here: https://www.getambassador.io/docs/emissary/latest/topics/running/tls/#certificates-and-secrets
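
To compare what diagd generated against the Listener specs, the GENERATED log lines quoted earlier can be reduced to one line per listener. This is a rough sketch that assumes the log format stays exactly as shown in this report; `list_generated_listeners` is a hypothetical helper name, not an Emissary tool.

```shell
# Hypothetical helper: read Emissary pod log text on stdin and print
# "<listener-name> <protocol> <address>" for every GENERATED V3Listener line.
# Assumes the line format shown in this report:
#   ... V3Listener: ==== GENERATED <V3Listener HTTP name on 0.0.0.0:8443 [XFP]>
list_generated_listeners() {
  sed -n 's/.*V3Listener: ==== GENERATED <V3Listener \([A-Za-z]*\) \([^ ]*\) on \([^ ]*\).*/\2 \1 \3/p'
}

# Usage (pod name is a placeholder, not run here):
#   kubectl logs -n emissary <emissary-pod> | list_generated_listeners
```

For the log excerpt above this yields `emissary-ingress-https-listener HTTP 0.0.0.0:8443`, which can then be checked by eye against `protocol: HTTPS` in the Listener manifest.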

I am able to get it back to listening on HTTPS, but only by deleting the Listeners, HelmRelease, Mappings, and TLSContext for emissary, then re-applying the resource manifests.

It's not the TLS certificate stored as a secret, since that doesn't change and it starts working again after the re-apply. And it's not RBAC permissions either, since those don't change during any of this. With that, I'm confident I can rule out cert-manager or the acme provider.

To Reproduce Steps to reproduce the behavior:

Deploy emissary 2.0.4 using helm
Observe HTTPS listener created successfully
At some point - it may be after the initial TLS certificate rotation occurs - the HTTPS listener switches to HTTP

Expected behavior A clear and concise description of what you expected to happen.

Versions (please complete the following information):

Emissary: 2.0.4
Kubernetes environment: 1.21

Additional context Here are the other components for TLS:

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: sample-site.internal.com
  namespace: cert-manager
spec:
  dnsNames:
    - sample-site.internal.com
  duration: 168h # 7d
  secretName: ambassador-cert-internal
  issuerRef:
    name: acme
    kind: ClusterIssuer
  renewBefore: 72h # 3d

---
apiVersion: getambassador.io/v3alpha1
kind: Host
metadata:
  name: sample-site.internal.com
  namespace: cert-manager
spec:
  hostname: sample-site.internal.com
  acmeProvider:
    authority: none
  tlsSecret:
    name: ambassador-cert-internal
  tlsContext:
    name: tls-context-internal
  ambassador_id: [ "internal" ]

% kubectl get Certificate -n cert-manager
NAME                       READY   SECRET                     AGE
sample-site.internal.com   True    ambassador-cert-internal   4d20h

smarsh-tim commented 2 years ago

I tracked down the issue. I would like to see some additional logging in emissary pod logs when it switches back to HTTP even though it's defined as HTTPS.

Ultimately this issue seems to have been a cascaded problem from cert-manager - where an upgrade neglected to update the RBAC permissions: https://github.com/aws/eks-anywhere/issues/1572

Somehow, the certificate was still reporting as Ready. I'm not sure exactly how the cert-manager problem cascaded into emissary being unable to use a Certificate that still reported itself as Ready. But once I resolved cert-manager's RBAC problems, this issue went away and emissary is now listening on HTTPS again.

If we could convert this into a feature request for additional logging - it would save a lot of time for future debugging.

alexgervais commented 2 years ago

Thanks a lot for the thorough investigation @smarsh-tim! To your suggestion, I'll label this issue as a feature request for added logs.

smarsh-tim commented 2 years ago

I did some further testing today, and the logs for my endpoint still show two HTTP listeners being created. However, HTTPS requests are resolving inconsistently.

Sometimes the request goes through successfully; other times I still get this:

curl: (35) error:1400410B:SSL routines:CONNECT_CR_SRVR_HELLO:wrong version number

No changes have occurred to the TLS certificate. I'm wondering if it's somehow conflicting with the other ambassador_id load balancers I have in the same Kubernetes cluster. Currently I'm running 4 distinct emissary load balancers, 3 using labelled ambassador_id values and one without as the default. All associated resources for the labelled load balancers have their own ambassador_id values set too.

Very curious transient issue. I will keep digging.

smarsh-tim commented 2 years ago

I think I found a root cause. I have multiple emissary-ingress instances, and two of them had incidentally grabbed the same External IP with kube-vip. It took a while to find since I wasn't looking for that in particular.

kubectl get service -n emissary

Then check whether any of the EXTERNAL-IP values are the same. They shouldn't be.
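
With several emissary-ingress instances, the comparison can be automated rather than done by eye. The sketch below assumes input lines of the form `<namespace>/<service> <external-ip>`; `find_shared_ips` is a hypothetical helper name, and the kubectl JSONPath expression in the usage comment is one possible way to produce that input, not a verified recipe for every cluster.

```shell
# Hypothetical helper: read "<namespace>/<service> <external-ip>" lines on
# stdin and print every IP claimed by more than one Service, followed by
# the Services that claim it. Lines without an IP are ignored.
find_shared_ips() {
  awk 'NF >= 2 { owners[$2] = owners[$2] " " $1; count[$2]++ }
       END { for (ip in count) if (count[ip] > 1) print ip ":" owners[ip] }' | sort
}

# Usage (assumed JSONPath, not run here); -A covers instances that live in
# different namespaces:
#   kubectl get service -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name} {.status.loadBalancer.ingress[*].ip}{"\n"}{end}' | find_shared_ips
```

No output means no external IP is shared between Services.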

dmaclaury commented 5 months ago

In 3.9.1 I continue to see this behavior where a listener defined with protocol: HTTPS is not listening for HTTPS, but rather HTTP.

emissary-ingress / emissary

HTTPS listener created as HTTP #4171