[doc] document how to run elastic-agent as a sidecar in ECK

leehinman commented 11 months ago

Describe the enhancement:

Document how to run elastic-agent as a sidecar in ECK

Describe a specific use case for the enhancement or feature:

air gapped environments

What is the definition of done?

External facing documentation exists that can be given to customers

elasticmachine commented 11 months ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

gizas commented 11 months ago

Adding my initial thoughts here: For sidecar containers the initial doc of reference is https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Currently we deploy agents as a DaemonSet and perform leader election so that only one agent collects cluster-wide metrics.

So with a sidecar installation we should add the elastic-agent container to the pod spec:

spec:
  containers:
    - name: app1
      image: test:v1
      command: ['sh', '-c', 'echo test']
      volumeMounts:
        - name: data
          mountPath: /opt
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
      env:
        ...
      volumeMounts:
        ...

(We should include all the info from the manifest in the elastic-agent container section.)
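
The Kubernetes blog post linked above describes native sidecars as entries under initContainers with restartPolicy: Always, rather than as a second entry under containers. A minimal sketch of the same pod in that form, assuming a cluster where the SidecarContainers feature is enabled. The pod name and the emptyDir backing the data volume are hypothetical, and the elided env and volumeMounts are left out rather than guessed, so this is not a complete manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app1-with-agent          # hypothetical name
spec:
  initContainers:
    # restartPolicy: Always is what marks this init container as a native sidecar
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      restartPolicy: Always
      args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
      # env and volumeMounts from the standalone manifest would go here
  containers:
    - name: app1
      image: test:v1
      command: ['sh', '-c', 'echo test']
      volumeMounts:
        - name: data
          mountPath: /opt
  volumes:
    - name: data
      emptyDir: {}               # assumption: the original does not say what backs "data"
```

In this form the sidecar is started before the main containers and keeps running alongside them, which may matter for the start-up ordering concerns.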

There are some initial considerations:

How should we define the initial Deployment/Pod into which we are going to inject the elastic-agent container?
Are cluster-wide metrics going to be supported with sidecar containers? One option is to inject a dedicated elastic-agent for them, separate from the rest of the metrics. Would we then have two groups of sidecar containers, one for pod logs and metrics and one for cluster-wide metrics?
There are also performance considerations for using elastic-agent as a sidecar. According to the doc: "The CPU, memory, device, and topology manager are unaware of the sidecar container lifetime and additional resource usage, and will operate as if the Pod had lower resource requests than it actually does." This means that the resource defining the sidecar container (pod, deployment, etc.) should probably reserve enough resources for elastic-agent. This is one of the main points raised in our discussion in Slack. This needs thorough testing.
The timing of when the sidecar container starts, compared to the other containers in the same pod, cannot be guaranteed. This means that if we see scenarios where elastic-agent needs, e.g., specific mount points to be ready in advance, we should consider initContainers or a start delay.
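
As a sketch of the resource consideration above, the agent container can declare its own requests and limits so that scheduling accounts for it explicitly. The numbers below are placeholders for illustration only, not tested recommendations:

```yaml
spec:
  containers:
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
      resources:
        requests:
          cpu: 100m        # placeholder value, needs testing
          memory: 400Mi    # placeholder value, needs testing
        limits:
          memory: 700Mi    # placeholder value, needs testing
```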

cmacknz commented 11 months ago

I think the original request for this was simply to use it to monitor Elasticsearch and not the entire k8s cluster, which narrows the scope considerably.
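
For that narrower scope, one possible shape is to add the agent container through the podTemplate of the ECK Elasticsearch resource. This is only a sketch: the cluster name, nodeSet name, and node count are hypothetical, and the agent configuration, env, and volume mounts are omitted:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart               # hypothetical cluster name
spec:
  version: 8.12.0
  nodeSets:
    - name: default              # hypothetical nodeSet name
      count: 1
      podTemplate:
        spec:
          containers:
            # extra container added next to the Elasticsearch container
            - name: elastic-agent-standalone
              image: docker.elastic.co/beats/elastic-agent:8.12.0
              args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
              # agent.yml, env and volumeMounts still need to be provided
```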

There was also an error encountered, which might be the only blocker to using this. We would need more information to reproduce it, such as the configuration that was originally used.

there is an exception that the /elastic-agent/state is read only, we tried to run elastic container and the agent container with other permissions but the elastic is getting a fatal exception and get locks on logs.

pchila commented 11 months ago

If I remember correctly, the state is a hostPath volume in the DaemonSet configuration. If we want the agent to run as a sidecar we can just use an ephemeral volume with the same lifetime as the pod like emptyDir and it should work without code changes to the agent
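
A minimal sketch of that suggestion, replacing the hostPath state volume with an emptyDir. The volume name and the mount path /usr/share/elastic-agent/state are assumptions about the standalone manifest and should be checked against the manifest actually in use:

```yaml
spec:
  containers:
    - name: elastic-agent-standalone
      image: docker.elastic.co/beats/elastic-agent:8.12.0
      args: ["-c", "/etc/elastic-agent/agent.yml", "-e"]
      volumeMounts:
        - name: elastic-agent-state
          # assumption: state directory of the agent inside the container
          mountPath: /usr/share/elastic-agent/state
  volumes:
    # emptyDir lives and dies with the pod, so no access to the node file system is needed
    - name: elastic-agent-state
      emptyDir: {}
```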

cmacknz commented 11 months ago

an ephemeral volume with the same lifetime as the pod like

This means the agent.id and state like the filebeat registry get deleted when the pod is recreated, so it appears as a new agent in Fleet and rereads log files from the beginning.

It would get past this error, but the state path is persisted outside the pod for a reason. If for whatever reason we can't access the node file system, it could be put onto a PVC, but I don't think that simplifies the deployment at all.

blakerouse commented 11 months ago

This is where the ability for Elastic Agent to use Kubernetes natively to store state comes into effect, and ideas like a KV store for state through the control protocol would solve these use cases.

elastic / elastic-agent

[doc] document how to run elastic-agent as a sidecar in ECK #3775