ChrsMark opened this issue 4 months ago (Open)
Thanks for checking, @ChrsMark - this looks helpful. Another bit to consider here: this won't allow us to somehow tie the process back to Kubernetes concepts, right? E.g. telling which container the process belongs to, or something like this.
> this won't allow us to somehow tie the process back to Kubernetes concepts, right? E.g. telling which container the process belongs to, or something like this.
Maybe it's useful to note that we have the plumbing in place for this in ebpf-k8s-agent, except that we're not collecting/enriching every process on the target host, only those associated with network flows.
@flash1293 I don't think this is supported by Metricbeat today. I could potentially see it being handled by https://www.elastic.co/guide/en/beats/metricbeat/current/add-kubernetes-metadata.html, but that would require some research to check whether it's doable. The idea here is that we want the process-related metrics to be associated with containers and Pods. Maybe that's possible by leveraging the cgroup information, but I'm only guessing here :).
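To illustrate the cgroup idea, here is a minimal sketch (in Go, since that is the Beats/Agent ecosystem) of how a PID could be tied back to a container by parsing `/proc/<pid>/cgroup`. The regex and path handling are assumptions about common runtime layouts, not an existing Beats API:

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// containerIDRe matches the 64-hex-character container ID that runtimes
// (Docker, containerd) embed in cgroup paths. This is an assumption about
// common path layouts, not an exhaustive parser.
var containerIDRe = regexp.MustCompile(`[0-9a-f]{64}`)

// containerIDForPID reads /proc/<pid>/cgroup and returns the container ID
// the process belongs to, or an empty string for host-level processes.
func containerIDForPID(hostfs string, pid int) (string, error) {
	data, err := os.ReadFile(fmt.Sprintf("%s/proc/%d/cgroup", hostfs, pid))
	if err != nil {
		return "", err
	}
	for _, line := range strings.Split(string(data), "\n") {
		// cgroup v1: "12:memory:/kubepods/burstable/pod<uid>/<container-id>"
		// cgroup v2: "0::/kubepods.slice/.../cri-containerd-<container-id>.scope"
		if id := containerIDRe.FindString(line); id != "" {
			return id, nil
		}
	}
	return "", nil
}

func main() {
	// With the host's /proc mounted under /hostfs (as discussed below),
	// this would resolve the container ID for an arbitrary host PID.
	id, err := containerIDForPID("/hostfs", 1234) // 1234 is a placeholder PID
	if err != nil {
		fmt.Fprintln(os.Stderr, "lookup failed:", err)
		return
	}
	fmt.Println("container id:", id)
}
```

A processor could then resolve such a container ID to Pod/container metadata via the Kubernetes API, similar in spirit to what add_kubernetes_metadata already does for logs.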
If what @christos68k suggests (or something similar) can cover the case, then that would also be great.
Thanks @christos68k and @ChrsMark - this seems like somewhat high-hanging fruit for now. I think without this capability we can't produce good suggestions, as we can't tell the user which containers to annotate and also (probably even more importantly) won't be able to tell whether they have been instrumented already.
+1 to this being the default; simply seeing the processes running inside the Metricbeat or Elastic Agent container is not useful at all. Almost everyone turning this metricset on will want to see the set of processes on the node.
Additionally the processes should be correlated to their relevant Kubernetes resource types. There is some additional context on the state of this in an internal issue from our cloud SRE team. That issue shows that this correlation does not work when the cluster uses the containerd runtime, which is increasingly the default. It might work when the runtime is Docker.
Thanks for this link @cmacknz - am I understanding correctly that there are two parts missing here to enable this:
If this is the case, I think we should go for it, as it will be a very nice feature in general and also help the auto-detection part of onboarding a lot as processes are very good signals to tell what kind of workload is running.
FYI @thomheymann @akhileshpok
This is also important for the OTel Collector; we should do it for both.
Hello, summarising the issue:
cc @thomheymann
Do we have an issue that tracks any work for the comment https://github.com/elastic/elastic-agent/issues/5256#issuecomment-2270370733?
@gizas I don't think so, could you create that one?
@flash1293 https://github.com/elastic/beats/issues/40495 is the issue for the processor enhancement. As already said, https://github.com/elastic/elastic-agent/issues/4670 is a prerequisite.
@ChrsMark the above issue will track the work on agent side for the integrations.
For OTel we will now need to track the same effort and analysis with the host receiver and enrichment there (with k8s attributes). Do we have something relevant for the OTel Elastic Agents? I think we need a new issue in opentelemetry-dev.
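For reference, a rough sketch of what the collector side could look like, assuming the hostmetrics receiver's process scraper plus the k8sattributes processor from OpenTelemetry Collector contrib. Note that k8sattributes associates telemetry by Pod IP/connection today, so the process-to-Pod mapping discussed above would still need extra work:

```yaml
receivers:
  hostmetrics:
    # Point the scrapers at the host filesystem when running in a container.
    root_path: /hostfs
    collection_interval: 30s
    scrapers:
      process:
        # Avoid noisy errors for short-lived or unreadable processes.
        mute_process_name_error: true

processors:
  # k8sattributes currently enriches based on Pod IP / connection info,
  # so associating host processes with Pods would still need extra work
  # (e.g. a cgroup-based lookup as discussed above).
  k8sattributes:
    auth_type: serviceAccount

exporters:
  debug: {}

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [k8sattributes]
      exporters: [debug]
```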
@graphaelli FYI we have added this story in the backlog.
https://github.com/elastic/elastic-agent/issues/4670 is a prerequisite for the story to happen. That is why we have not prioritised it in this iteration.
Mainly we will need a) to collect the host processes and b) to enhance them with k8s metadata.
So for a), the collection side, we will need the standalone agent templates to include the fixes (we have this story and https://github.com/elastic/elastic-agent/issues/5289 to track it and not miss it), and on the managed agent side the system integration will need to be updated (see comment). For b), the metadata enrichment, https://github.com/elastic/beats/issues/40495 is the issue to track the work.
In Agent standalone on K8s the `process` datastream is enabled by default: https://github.com/elastic/elastic-agent/blob/6aa581cbec8e6f8063571048e52a3b9f0b352c80/deploy/kubernetes/elastic-agent-standalone/elastic-agent-standalone-daemonset-configmap.yaml#L492 However, it does not collect the underlying host's processes.
Would that make sense to collect the underlying system's processes (and possibly metrics) instead of those of the Agent container's scope?
I tried the following (note the `hostfs: "/hostfs"` part) to get the desired result: after adding the `hostfs: "/hostfs"` setting I could see the processes of the underlying host, like `kubelet` etc. We can consider whether this should be the default, or at least make the switch easier for users, e.g. with commented-out sections.

/cc @flash1293 @gizas
ref: https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-docker.html#monitoring-host
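For context, a rough sketch of where the setting goes in the standalone DaemonSet ConfigMap; the exact stream schema can differ between Agent versions, so treat this as an illustration rather than the final diff. It assumes the host's `/proc` is mounted under `/hostfs` in the DaemonSet, as the default manifests already do for proc and cgroup:

```yaml
# Excerpt from the system/metrics input in
# elastic-agent-standalone-daemonset-configmap.yaml (illustrative only)
- data_stream:
    dataset: system.process
    type: metrics
  metricsets:
    - process
  # Point the metricset at the host filesystem mounted into the Agent Pod,
  # so that host processes (kubelet, containerd, ...) are collected instead
  # of only the Agent container's own processes.
  hostfs: "/hostfs"
  period: 10s
  process.include_top_n:
    by_cpu: 5
    by_memory: 5
```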