johnkm516 opened this issue 1 year ago
The eck-fleet-server Helm chart is just a thin wrapper around the Elastic Agent custom resource with some presets that are necessary to run Elastic Agent in Fleet mode. Because of that, any customisation that is possible with the raw Elastic Agent resources is also possible through the Helm chart.
For example, customising the self-signed certificates as documented here:
```yaml
# values.yml
version: 8.6.2
spec:
  kibanaRef:
    name: my-kibana
  elasticsearchRefs:
    - name: my-elasticsearch
  http:
    tls:
      selfSignedCertificate:
        subjectAltNames:
          - dns: fleet-server.example.com
```
Similarly, it is possible to move away from the self-signed certs entirely and use your own certificates, which you have to provide in the form of a secret in the same namespace, as documented here:
```yaml
# values.yml
version: 8.6.2
spec:
  kibanaRef:
    name: my-kibana
  elasticsearchRefs:
    - name: my-elasticsearch
  http:
    tls:
      certificate:
        secretName: my-custom-cert
```
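The referenced secret can be created from your own certificate files; a minimal sketch of such a manifest, where the certificate contents are elided and the namespace is an assumption (ca.crt is optional):

```yaml
# Hypothetical Secret providing the custom certificate;
# the name must match spec.http.tls.certificate.secretName above.
apiVersion: v1
kind: Secret
metadata:
  name: my-custom-cert
  namespace: elastic-stack  # assumed: same namespace as the Fleet Server
data:
  ca.crt: <base64-encoded CA certificate, optional>
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded private key>
```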
Finally, you have the option to terminate TLS with Traefik and accept the self-signed certificates behind the proxy. We have an example here: https://github.com/elastic/cloud-on-k8s/tree/main/config/recipes/traefik, which is also linked in our documentation: https://www.elastic.co/guide/en/cloud-on-k8s/2.6/k8s-recipes.html
I am leaving this issue open for us to expand the documentation of the Helm chart to better explain how you can derive a values file for the Helm charts from the existing documentation and how CRD spec and Helm values correspond to each other.
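To illustrate the correspondence: when the chart is used through the eck-stack umbrella chart (as in the values file further down in this thread), the same spec nests under the eck-fleet-server key. A sketch, assuming the umbrella chart layout:

```yaml
# eck-stack values.yml -- assumed umbrella-chart layout, mirroring the standalone example above
eck-fleet-server:
  enabled: true
  spec:
    kibanaRef:
      name: my-kibana
    elasticsearchRefs:
      - name: my-elasticsearch
    http:
      tls:
        certificate:
          secretName: my-custom-cert
```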
@pebrc
Thank you for the response. What I chose to do is terminate with my own domain certificate at the Ingress and then use an insecure-verify server transport to ignore whether the ECK cert is trusted or not. However, the issue that I still had is with the fleet server. I found a solution, but I do not know whether or not this is a bug.
For my deployment, I don't disable TLS or touch any of the TLS options in the helm chart. I tried setting my own certificates using the TLS config as you showed many times, but I ended up with more issues, all of them related either to an untrusted certificate authority or to the domain not being covered (e.g. my cert supports *.example.com, so I get TLS errors saying the cert does not support k8s intranet endpoints like https://elasticsearch-es-http.elastic-stack.svc:9200). So with my setup, ECK generates its own self-signed certs, I add TLS to my Ingress endpoints, and then use insecure verify to automatically trust ECK's generated CA.
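For reference, the "insecure verify" part can be expressed as a Traefik ServersTransport; a minimal sketch, with the resource name and namespace as assumptions:

```yaml
# Hypothetical ServersTransport that accepts ECK's self-signed backend certs.
apiVersion: traefik.containo.us/v1alpha1
kind: ServersTransport
metadata:
  name: eck-insecure-transport
  namespace: elastic-stack  # assumed namespace
spec:
  insecureSkipVerify: true
# An IngressRoute backend would then reference it:
#   services:
#     - name: elasticsearch-es-http
#       port: 9200
#       serversTransport: eck-insecure-transport
```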
With this setup, despite setting xpack.fleet.outputs.ca_trusted_fingerprint and setting the default output to https://elasticsearch.example.com, the fleet server enrolls and is "healthy" but fails to send any data to Elasticsearch.
However, if I add both the kubernetes hostname and the public URL hostname, fleet server will send data successfully. What's weird is that without me changing any of the output settings, the logs show the agents connect to the public Elasticsearch endpoint successfully:

```
2023-03-28T08:34:54.693289364+09:00 stderr F {"log.level":"info","@timestamp":"2023-03-27T23:34:54.692Z","message":"Connection to backoff(elasticsearch(https://elasticsearch.example.com:443)) established","component":{"binary":"metricbeat","dataset":"elastic_agent.metricbeat","id":"beat/metrics-monitoring","type":"beat/metrics"},"log.logger":"publisher_pipeline_output","log.origin":{"file.line":147,"file.name":"pipeline/client_worker.go"},"service.name":"metricbeat","ecs.version":"1.6.0","ecs.version":"1.6.0"}
```

proving that my output settings and fingerprint are indeed correct. Yet the moment I remove https://elasticsearch-es-http.elastic-stack.svc:9200 from xpack.fleet.agents.elasticsearch.hosts, the agents will show "healthy" but won't output any data to Elasticsearch, and will show no logs.
```yaml
eck-kibana:
  enabled: true
  annotations:
    eck.k8s.elastic.co/license: basic
  metadata:
    annotations:
      eck.k8s.elastic.co/license: basic
  # Name of the Kibana instance.
  #
  fullnameOverride: kibana
  spec:
    # Reference to ECK-managed Elasticsearch instance, ideally from {{ "elasticsearch.fullname" }}
    #
    elasticsearchRef:
      name: elasticsearch
    enterpriseSearchRef:
      name: enterprise-search
    http:
      service:
        spec:
          # Type of service to deploy for Kibana.
          # This deploys a load balancer in a cloud service provider, where supported.
          #
          type: LoadBalancer
    config:
      # Note that these are specific to the namespace into which this example is installed, and are
      # using `elastic-stack` as configured here and detailed in the README when installing:
      #
      # `helm install es-kb-quickstart elastic/eck-stack -n elastic-stack`
      #
      # If installed outside of the `elastic-stack` namespace, the following 2 lines need modification.
      xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch-es-http.elastic-stack.svc:9200", "https://elasticsearch.example.com"]
      xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server.example.com"]
      xpack.fleet.outputs:
        - id: fleet-default-output
          name: default
          type: elasticsearch
          hosts: [ "https://elasticsearch.example.com" ]
          is_default: true
          is_default_monitoring: true
          # openssl x509 -fingerprint -sha256 -noout -in tls/kibana/elasticsearch-ca.pem (colons removed)
          ca_trusted_fingerprint: "<my fingerprint>"
      xpack.fleet.packages:
        - name: system
          version: latest
        - name: elastic_agent
          version: latest
        - name: fleet_server
          version: latest
        - name: kubernetes
          version: latest
        - name: apm
          version: latest
      xpack.fleet.agentPolicies:
        - name: Fleet Server on ECK policy
          id: fleet-server
          namespace: default
          monitoring_enabled:
            - logs
            - metrics
          is_default_fleet_server: true
          package_policies:
            - name: fleet_server-1
              id: fleet_server-1
              package:
                name: fleet_server
        - name: Elastic Agent on ECK policy
          id: eck-agent
          namespace: default
          monitoring_enabled:
            - logs
            - metrics
          unenroll_timeout: 900
          is_default: true
          package_policies:
            - package:
                name: system
              name: system-1
            - package:
                name: kubernetes
              name: kubernetes-1
            - package:
                name: apm
              name: apm-1
              inputs:
                - type: apm
                  enabled: true
                  vars:
                    - name: host
                      value: 0.0.0.0:8200
```
I created a new policy for agents outside the K8s cluster and installed an agent on my Windows PC, and it also successfully sends data to Elasticsearch. I think this is a bug? I'm not sure what is causing this behavior.
@pebrc
Never mind... When I used

```yaml
ssl:
  certificate_authorities:
    - |
      -----BEGIN CERTIFICATE-----
```

in the xpack.fleet.outputs instead of xpack.fleet.outputs.ca_trusted_fingerprint, it worked. I don't know why my fingerprint didn't work, though? I deleted all the colons and used the same certificate that I used in the above snippet as the target for `openssl x509 -fingerprint -sha256 -noout -in`.
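Putting that together with the output definition above, the working output entry presumably looks roughly like this (certificate body elided):

```yaml
xpack.fleet.outputs:
  - id: fleet-default-output
    name: default
    type: elasticsearch
    hosts: [ "https://elasticsearch.example.com" ]
    is_default: true
    is_default_monitoring: true
    # CA provided inline instead of ca_trusted_fingerprint;
    # same elasticsearch-ca.pem as in the openssl command above
    ssl:
      certificate_authorities:
        - |
          -----BEGIN CERTIFICATE-----
          ...
          -----END CERTIFICATE-----
```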
Proposal
I have been searching for days looking for a solution to my issue and am convinced that the current ECK helm charts either do not support my use case or at least do not document it.
I am trying to deploy a self-managed ECK stack with fleet server and agents on my k8s cluster via the ECK helm charts. The only documentation I can find regarding setting up custom TLS certs for fleet-server and agents is here, which assumes the user is installing / deploying the fleet server manually through the command line, not via the operator. My current kibana config is the one posted above.
I added a template to create a Traefik IngressRoute that does TLS passthrough to fleet-server-agent-http, as sketched below.
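A minimal sketch of such a passthrough route, assuming Traefik v2 CRDs, the default Fleet Server port 8220, and the elastic-stack namespace; note that TLS passthrough in Traefik is expressed with the IngressRouteTCP kind and an SNI match:

```yaml
# Hypothetical IngressRouteTCP passing TLS through to the Fleet Server service.
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: fleet-server
  namespace: elastic-stack  # assumed namespace
spec:
  entryPoints:
    - websecure
  routes:
    - match: HostSNI(`fleet-server.example.com`)
      services:
        - name: fleet-server-agent-http
          port: 8220  # assumed default Fleet Server port
  tls:
    passthrough: true  # hand the TLS session to fleet-server unterminated
```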
The IngressRoute works; the problem is the certs. Fleet-server generates its own self-signed certs when I deploy the helm chart (and the values.yaml of fleet-server in the current repository shows no way of customizing TLS certs). Therefore the current fleet-agent certs do not contain the fleet-server.example.com domain, and the TLS handshake fails. I have tried to manually update the certs by updating the secret fleet-server-agent-http-certs-internal, which is mounted as a volume at /usr/share/fleet-server/config/http-certs, but the secret seems to be managed by the operator and is instantly regenerated to the self-signed certs when I attempt to delete it or update it in any way. The use case of exposing the fleet server endpoint is obvious: it should have a publicly accessible endpoint so that I can enroll agents on machines outside the cluster.
If it's currently not possible to set custom certs for fleet-server via values.yaml, this issue is a feature request for that. If it's already possible, this issue is a request to update the current values.yaml to show how that can be done, and to update the documentation as well.