elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

Document edge cases around Fleet/Agent setups with Ingress #6853

Open pebrc opened 1 year ago

pebrc commented 1 year ago

When using Elastic Agent with Fleet in a mixed setup behind an Ingress with a public CA, there are a few non-obvious gotchas that should be documented.

Mixed setup: some agents run inside the cluster, while others run outside it and reach Fleet Server through the Ingress.

The agents inside the cluster are unable to connect to Fleet Server because ECK configures them with the self-signed certificate it generates. The FLEET_URL value set by ECK is used during enrollment, but after enrollment the agents' Fleet URL is changed to the URL of the Ingress, so the connection goes through the Ingress. Because the self-signed certificate from ECK replaces OS-level trust in the public CA, this connection cannot be established.

A workaround is to set FLEET_CA to an empty string so that the public CA is trusted. This might, however, cause problems when enrolling.
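A minimal sketch of that workaround, assuming the in-cluster agents are deployed through an ECK `Agent` resource in Fleet mode. The resource names (`elastic-agent`, `kibana`, `fleet-server`), the version, and the policy ID are placeholders and not taken from this issue. Whether a value set in the pod template takes precedence over the one ECK injects should be verified against the ECK version in use:

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent          # hypothetical name
spec:
  version: 8.12.0              # placeholder version
  mode: fleet
  kibanaRef:
    name: kibana               # hypothetical Kibana resource
  fleetServerRef:
    name: fleet-server         # hypothetical Fleet Server resource
  policyID: eck-agent          # placeholder policy ID
  daemonSet:
    podTemplate:
      spec:
        containers:
        - name: agent
          env:
          # Assumption: an empty FLEET_CA stops the agent from pinning
          # the ECK self-signed CA, so the OS trust store (and with it
          # the public CA) is used for the Ingress URL.
          - name: FLEET_CA
            value: ""
```

As noted above, enrollment still uses the in-cluster URL served with the ECK-generated certificate, so this setting may break enrollment itself.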

lduvnjak commented 9 months ago

We're having issues trying to expose our fleet server over ingress-nginx. When testing with an http rule at path / both inside and outside agents connect without issues.

When trying to expose it on a custom URL like /fleet-server we're getting a 404 after enrollment.

Here's the Ingress configuration:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fleet-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    #nginx.ingress.kubernetes.io/ssl-redirect: "true"
    #nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: fleet-server-agent-http
            port:
              number: 8220

I've also updated the Fleet server in the Kibana settings (screenshot not available).

When testing enrollment with an outside agent, it connects and then gets stuck on updating:

sudo ./elastic-agent install --url=https://k8s-lb....:443/fleet-server --enrollment-token=R0JVMm9vMEJqYlVoSlNTX0VQRHU6RjkyNDlLTnVReXFKZlBzYmhyRG04QQ== --insecure

Everything points to the fact that agents in general do not support having the URL at a custom path other than /. Have you managed to get this working or do we need to load balance based on the requested domain?

Here's the elastic-agent error and status output:

[elastic_agent][error] ack retrier: commit failed with error: acknowledge 1 actions '[action_id: policy:d7bde667-4962-46a5-af0b-57b81bf48ae3:3:1, type: POLICY_CHANGE]' for elastic-agent '56c65b93-daf0-4d12-b651-6e7b05b05e7c' failed: fail to decode ack response: invalid character '<' looking for beginning of value

# elastic-agent status
┌─ fleet
│  └─ status: (FAILED) could not decode the response, raw response: <html>
│     <head><title>404 Not Found</title></head>
│     <body>
│     <center><h1>404 Not Found</h1></center>
│     <hr><center>nginx</center>
│     </body>
│     </html>
│
└─ elastic-agent
   └─ status: (HEALTHY) Running

Here's the ingress we're trying to make work, alongside a test that shows it should rewrite correctly:

  rules:
  - http:
      paths:
      - backend:
          service:
            name: fleet-server-agent-http
            port:
              number: 8220
        path: /fleet-server(/|$)(.*)
        pathType: ImplementationSpecific

# curl -k https://k8s-lb..../fleet-server/api/status
{"name":"fleet-server","status":"HEALTHY"}

lduvnjak commented 9 months ago

After testing with a wildcard A record pointing to a load balancer and a host-based Ingress like the example below, it works without issues:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fleet-ingress
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "360"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "360"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "360"
spec:
  ingressClassName: nginx
  rules:
  - host: fleet.lb.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: fleet-server-agent-http
            port:
              number: 8220

Does that mean that Fleet is not supported behind a custom path, as we suspected? It would be nice to have this in the docs as well.
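For the public-CA case described at the top of this issue, the host-based Ingress could also terminate TLS itself. A sketch, assuming a hypothetical Secret named `fleet-public-tls` that holds a certificate issued by a public CA for `fleet.lb.example.com`; the Secret name is illustrative only and the timeout annotations are left out for brevity:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fleet-ingress
  annotations:
    # Fleet Server itself still serves HTTPS with the ECK certificate.
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - fleet.lb.example.com
    secretName: fleet-public-tls   # hypothetical Secret with the public-CA cert
  rules:
  - host: fleet.lb.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: fleet-server-agent-http
            port:
              number: 8220
```

With a certificate like this in place, outside agents would presumably no longer need the --insecure flag when enrolling through the Ingress host.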