Open leehinman opened 11 months ago
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Adding my initial thoughts here. For sidecar containers, the initial reference doc is https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/
Currently we deploy agents as a DaemonSet and use leader election so that only one agent collects cluster-wide metrics.
So with a sidecar installation we should introduce the following:
spec:
  containers:
    - name: app1
      image: test:v1
      command: ['sh', '-c', 'echo test']
      volumeMounts:
        - name: data
          mountPath: /opt
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
      env:
        ...
      volumeMounts:
        ....
(We should include all the info from the manifest in the elastic-agent container part.)
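For reference, the linked Kubernetes blog post declares a native sidecar as an init container with `restartPolicy: Always` rather than as a regular entry under `containers`. A minimal sketch of the same pod in that form, assuming a cluster where the SidecarContainers feature is available (alpha in 1.28, enabled by default from 1.29). The pod name is hypothetical, and the env, volumes and volume mounts are left out here and would still have to come from the agent manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app1-with-agent   # hypothetical name
spec:
  initContainers:
    # restartPolicy: Always is what marks this init container as a native sidecar
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      restartPolicy: Always
      args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
  containers:
    - name: app1
      image: test:v1
      command: ['sh', '-c', 'echo test']
```

Whether the agent behaves correctly when started this way has not been verified in this thread.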
There are some initial considerations:
The CPU, memory, device, and topology manager are unaware of the sidecar container lifetime and additional resource usage, and will operate as if the Pod had lower resource requests than it actually does.
This probably means that the resource that defines the sidecar container (Pod, Deployment, etc.) should request enough resources for elastic-agent. This is one of the main points raised in our discussion in Slack. This needs thorough testing.
I think the original request for this was simply to use it to monitor Elasticsearch and not the entire k8s cluster, which narrows the scope considerably.
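On the resource point above: explicit requests and limits on the agent container are what the scheduler takes into account for the pod. A sketch of that fragment, where every number is a placeholder and not a tested or recommended value:

```yaml
# Fragment: the agent container entry inside the pod spec
- name: elastic-agent-standalone
  image: docker.elastic.co/beats/elastic-agent:8.12.0
  resources:
    requests:
      cpu: 100m       # placeholder, size after testing
      memory: 500Mi   # placeholder, size after testing
    limits:
      memory: 1Gi     # placeholder, size after testing
```

The right values depend on which integrations the agent runs, which is part of what needs testing.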
There was also an error encountered which might be the only blocker to using this. We would need more information to reproduce it, such as the configuration that was originally used.
there is an exception that the /elastic-agent/state is read only, we tried to run elastic container and the agent container with other permissions but the elastic is getting a fatal exception and get locks on logs.
If I remember correctly, the state is a hostPath volume in the DaemonSet configuration. If we want the agent to run as a sidecar, we can just use an ephemeral volume with the same lifetime as the pod, like emptyDir, and it should work without code changes to the agent.
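A minimal sketch of the emptyDir suggestion. The volume name `agent-state` is hypothetical, and the mount path is an assumption based on the default container layout, so it should be checked against the path used in the DaemonSet manifest:

```yaml
spec:
  containers:
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      volumeMounts:
        - name: agent-state                          # hypothetical volume name
          mountPath: /usr/share/elastic-agent/state  # assumed state path, verify against the manifest
  volumes:
    - name: agent-state
      emptyDir: {}   # created with the pod and deleted with it
```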
an ephemeral volume with the same lifetime as the pod like emptyDir
This means the agent.id and state like the filebeat registry get deleted when the pod is recreated, so it appears as a new agent in Fleet and rereads log files from the beginning.
It would get past this error, but the state path is persisted outside the pod for a reason. If for whatever reason we can't access the node file system, it could be put onto a PVC, but I don't think that simplifies the deployment at all.
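For comparison, the PVC variant would look roughly like this. The claim name, access mode and size are all placeholders chosen for illustration:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-state   # hypothetical claim name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi    # placeholder size
---
# Fragment of the pod spec: the state volume now points at the claim
volumes:
  - name: agent-state
    persistentVolumeClaim:
      claimName: agent-state
```

With more than one replica, each pod would presumably need its own claim so that agents do not share a state directory, which is consistent with the point that this does not simplify the deployment.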
This is where the ability for Elastic Agent to use Kubernetes natively to store state comes into effect, and ideas like a KV store through the control protocol for state would solve these use cases.
Describe the enhancement:
Document how to run elastic-agent as a sidecar in ECK
Describe a specific use case for the enhancement or feature:
What is the definition of done?
External facing documentation exists that can be given to customers