elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

Allow setting custom TLS certificates for fleet-server via values.yaml #6559

Open johnkm516 opened 1 year ago

johnkm516 commented 1 year ago

Proposal

I have been searching for days for a solution to my issue and am convinced that the current ECK Helm charts either do not support my use case or, at the very least, do not document it.

I am trying to deploy a self-managed ECK stack with fleet server and agents on my k8s cluster via the ECK Helm charts. The only documentation I can find on setting up custom TLS certs for fleet-server and agents is here, and it assumes the user is installing / deploying the fleet server manually through the command line, not via the operator. This is my current kibana config:

spec:
    config:
      xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch.example.com"]
      xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server.example.com"]
      xpack.fleet.outputs:
      - id: fleet-default-output
        name: default
        type: elasticsearch
        hosts: [ "https://elasticsearch.example.com" ]
        # openssl x509 -fingerprint -sha256 -noout -in tls/kibana/elasticsearch-ca.pem (colons removed)
        ca_trusted_fingerprint: <my ca fingerprint>
        is_default: true
        is_default_monitoring: true
      xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
      - name: kubernetes
        version: latest
      - name: apm
        version: latest
      xpack.fleet.agentPolicies:
      - name: Fleet Server on ECK policy
        id: fleet-server
        namespace: default
        monitoring_enabled:
        - logs
        - metrics
        is_default_fleet_server: true
        package_policies:
        - name: fleet_server-1
          id: fleet_server-1
          package:
            name: fleet_server
      - name: Elastic Agent on ECK policy
        id: eck-agent
        namespace: default
        monitoring_enabled:
        - logs
        - metrics
        unenroll_timeout: 900
        is_default: true
        package_policies:
        - package:
            name: system
          name: system-1
        - package:
            name: kubernetes
          name: kubernetes-1
        - package:
            name: apm
          name: apm-1
          inputs:
            - type: apm
              enabled: true
              vars:
                - name: host
                  value: 0.0.0.0:8200

I added a template to create a Traefik IngressRouteTCP that does TLS passthrough to the fleet-server-agent-http service:

---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: fleet-server-ingress
  namespace: {{ .Release.Namespace }}
spec:
  entryPoints:
    - websecure
  tls:
    passthrough: true                 # pass TLS through to the backend untouched
  routes:
    - match: HostSNI(`fleet-server.example.com`)
      services:
      - name: fleet-server-agent-http # ECK-managed Fleet Server service
        port: https

The IngressRoute works; the problem is the certs. Fleet-server generates its own self-signed certs when I deploy the Helm chart (and the values.yaml of fleet-server in the current repository shows no way of customizing TLS certs), so the generated certs do not contain the fleet-server.example.com domain and TLS negotiation fails. I have tried to update the certs manually by editing the secret fleet-server-agent-http-certs-internal, which is mounted as a volume at /usr/share/fleet-server/config/http-certs, but the secret seems to be managed by the operator and is instantly regenerated with the self-signed certs whenever I try to delete or update it.

The use case for making the fleet server endpoint publicly accessible is obvious: I need to be able to enroll agents on machines outside the cluster.

If it's currently not possible to set custom certs for fleet-server via values.yaml, this issue is a feature request for that. If it's already possible, this issue is a request to update the current values.yaml to show how it can be done, and to update the documentation as well.

pebrc commented 1 year ago

The eck-fleet-server Helm chart is just a thin wrapper around the Elastic Agent custom resource with some presets that are necessary to run Elastic Agent in Fleet mode. Because of that, any customisation that is possible with the raw Elastic Agent resource is also possible through the Helm chart.
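
To illustrate the correspondence: everything under `spec` in the chart's values is passed through to the Agent resource. A minimal sketch of the raw resource equivalent (assuming ECK's agent.k8s.elastic.co/v1alpha1 CRD; names are illustrative):

# agent.yaml -- the raw custom resource the chart renders from its spec values
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
spec:
  version: 8.6.2
  mode: fleet                # enroll via Fleet instead of running standalone
  fleetServerEnabled: true   # this Agent acts as the Fleet Server
  kibanaRef:
    name: my-kibana
  elasticsearchRefs:
  - name: my-elasticsearch
  deployment:
    replicas: 1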

For example, customising the self-signed certificates as documented here:

# values.yml
version: 8.6.2 

spec:
  kibanaRef:
    name: my-kibana 
  elasticsearchRefs:
  - name: my-elasticsearch 
  http:
    tls:
      selfSignedCertificate:
        subjectAltNames:
        - dns: fleet-server.example.com

Similarly, it is possible to move away from the self-signed certs entirely and use your own certificates, which you have to provide in the form of a secret in the same namespace, as documented here:

# values.yml
version: 8.6.2 

spec:
  kibanaRef:
    name: my-kibana 
  elasticsearchRefs:
  - name: my-elasticsearch 
  http:
    tls:
      certificate:
        secretName: my-custom-cert
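
The referenced secret has to contain the certificate and key under the file names ECK expects (tls.crt, tls.key, and optionally ca.crt). A minimal sketch, assuming PEM-encoded material and the elastic-stack namespace used elsewhere in this thread:

apiVersion: v1
kind: Secret
metadata:
  name: my-custom-cert
  namespace: elastic-stack  # must be the namespace of the Fleet Server resource
stringData:
  ca.crt: |                 # optional: CA chain for the certificate below
    -----BEGIN CERTIFICATE-----
    ...
  tls.crt: |                # the certificate itself
    -----BEGIN CERTIFICATE-----
    ...
  tls.key: |                # the private key for the certificate
    -----BEGIN PRIVATE KEY-----
    ...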

Finally, you have the option to terminate TLS with Traefik and accept the self-signed certificates behind the proxy. We have an example of that here: https://github.com/elastic/cloud-on-k8s/tree/main/config/recipes/traefik, which is also linked in our documentation: https://www.elastic.co/guide/en/cloud-on-k8s/2.6/k8s-recipes.html

I am leaving this issue open for us to expand the Helm chart documentation to better explain how you can derive a values file for the Helm charts from the existing documentation, and how the CRD spec and Helm values correspond to each other.

johnkm516 commented 1 year ago

@pebrc

Thank you for the response. What I chose to do is terminate TLS with my own domain certificate at the Ingress and then use an insecure-skip-verify server transport so that it doesn't matter whether the ECK cert is trusted. However, I still had an issue with the fleet server. I found a solution, but I do not know whether or not this is a bug.

For my deployment, I don't disable TLS or touch any of the TLS options in the Helm chart. I tried setting my own certificates using the TLS config as you showed many times, but I ended up with more issues, all of them related to either an untrusted certificate authority or an unsupported domain (e.g. my cert covers *.example.com, so I get TLS errors saying the cert does not cover in-cluster endpoints like https://elasticsearch-es-http.elastic-stack.svc:9200). So with my setup, ECK generates its own self-signed certs, I add TLS to my Ingress endpoints, and I use insecure verify to automatically trust ECK's generated CA.
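
For reference, the insecure-verify part looks roughly like this with Traefik v2 CRDs (a sketch; resource and secret names are illustrative):

apiVersion: traefik.containo.us/v1alpha1
kind: ServersTransport
metadata:
  name: eck-skip-verify
  namespace: elastic-stack
spec:
  insecureSkipVerify: true          # accept ECK's self-signed CA behind the proxy
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: elasticsearch-ingress
  namespace: elastic-stack
spec:
  entryPoints:
    - websecure
  tls:
    secretName: my-domain-cert      # my own *.example.com certificate
  routes:
    - match: Host(`elasticsearch.example.com`)
      kind: Rule
      services:
        - name: elasticsearch-es-http
          port: 9200
          scheme: https             # the ECK service itself serves HTTPS
          serversTransport: eck-skip-verify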

With this setup, despite setting xpack.fleet.outputs.ca_trusted_fingerprint and setting the default output to https://elasticsearch.example.com, the fleet server enrolls and is "healthy" but fails to send any data to Elasticsearch. However, if I add both the Kubernetes service hostname and the public URL hostname, the fleet server sends data successfully. What's weird is that, without me changing any of the output settings, the logs show the agents connecting to the public Elasticsearch endpoint successfully:

2023-03-28T08:34:54.693289364+09:00 stderr F {"log.level":"info","@timestamp":"2023-03-27T23:34:54.692Z","message":"Connection to backoff(elasticsearch(https://elasticsearch.example.com:443)) established","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":147,"file.name":"pipeline/client_worker.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}

proving that my output settings and fingerprint are indeed correct. Yet the moment I remove https://elasticsearch-es-http.elastic-stack.svc:9200 from xpack.fleet.agents.elasticsearch.hosts, the agents show "healthy" but don't output any data to Elasticsearch, and show no logs.

eck-kibana:
  enabled: true
  annotations:
    eck.k8s.elastic.co/license: basic
  metadata:
    annotations:
      eck.k8s.elastic.co/license: basic
  # Name of the Kibana instance.
  #
  fullnameOverride: kibana

  spec:
    # Reference to ECK-managed Elasticsearch instance, ideally from {{ "elasticsearch.fullname" }}
    #
    elasticsearchRef:
      name: elasticsearch
    enterpriseSearchRef:
      name: enterprise-search
    http:
      service:
        spec:
          # Type of service to deploy for Kibana.
          # This deploys a load balancer in a cloud service provider, where supported.
          # 
          type: LoadBalancer

    config:
      # Note that these are specific to the namespace into which this example is installed, and are
      # using `elastic-stack` as configured here and detailed in the README when installing:
      #
      # `helm install es-kb-quickstart elastic/eck-stack -n elastic-stack`
      #
      # If installed outside of the `elastic-stack` namespace, the following 2 lines need modification.
      xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch-es-http.elastic-stack.svc:9200", "https://elasticsearch.example.com"] 
      xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server.example.com"]
      xpack.fleet.outputs:
      - id: fleet-default-output
        name: default
        type: elasticsearch
        hosts: [ "https://elasticsearch.example.com" ]
        # openssl x509 -fingerprint -sha256 -noout -in tls/kibana/elasticsearch-ca.pem (colons removed)
        is_default: true
        is_default_monitoring: true
        ca_trusted_fingerprint: "<my fingerprint>"
      xpack.fleet.packages:
      - name: system
        version: latest
      - name: elastic_agent
        version: latest
      - name: fleet_server
        version: latest
      - name: kubernetes
        version: latest
      - name: apm
        version: latest
      xpack.fleet.agentPolicies:
      - name: Fleet Server on ECK policy
        id: fleet-server
        namespace: default
        monitoring_enabled:
        - logs
        - metrics
        is_default_fleet_server: true
        package_policies:
        - name: fleet_server-1
          id: fleet_server-1
          package:
            name: fleet_server
      - name: Elastic Agent on ECK policy
        id: eck-agent
        namespace: default
        monitoring_enabled:
        - logs
        - metrics
        unenroll_timeout: 900
        is_default: true
        package_policies:
        - package:
            name: system
          name: system-1
        - package:
            name: kubernetes
          name: kubernetes-1
        - package:
            name: apm
          name: apm-1
          inputs:
            - type: apm
              enabled: true
              vars:
                - name: host
                  value: 0.0.0.0:8200

I created a new policy for machines outside the K8s cluster and installed an agent on my Windows PC; it also successfully sends data to Elasticsearch. I think this is a bug? I'm not sure what is causing this behavior.

johnkm516 commented 1 year ago

@pebrc

Never mind... When I used

          ssl:
            certificate_authorities:
            - |
              -----BEGIN CERTIFICATE-----

in xpack.fleet.outputs instead of xpack.fleet.outputs.ca_trusted_fingerprint, it worked. I don't know why my fingerprint didn't work, though. I deleted all the colons and used the same certificate as in the snippet above as the target for openssl x509 -fingerprint -sha256 -noout -in.
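
For anyone hitting the same thing, the full output block that ended up working looks roughly like this (certificate contents elided):

xpack.fleet.outputs:
- id: fleet-default-output
  name: default
  type: elasticsearch
  hosts: [ "https://elasticsearch.example.com" ]
  is_default: true
  is_default_monitoring: true
  ssl:
    certificate_authorities:
    - |
      -----BEGIN CERTIFICATE-----
      <contents of elasticsearch-ca.pem>
      -----END CERTIFICATE-----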