elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Meta] Enhance input Health reporting from agent to better convey issues related to installation of `unprivileged` agent #39604

Closed nimarezainia closed 1 week ago

nimarezainia commented 3 months ago

If the agent is provisioned in unprivileged mode, there may be data sources that the agent cannot read because they require higher privileges. This will cause the agent to go into a degraded state and show the integration as unhealthy.

Since the agent knows that it is running in unprivileged mode AND can recognize that there's an issue with reading the input, it would be great to have this information propagated back to Fleet. Ideally the user has enough information to know that their input is unhealthy because the agent is in unprivileged mode.

Filebeat health reporting implementation: https://github.com/elastic/beats/pull/39209

- [ ] https://github.com/elastic/beats/issues/39733
- [ ] https://github.com/elastic/beats/issues/39734
- [ ] https://github.com/elastic/beats/issues/39735
- [ ] https://github.com/elastic/beats/issues/39736
- [ ] https://github.com/elastic/beats/issues/39737
- [ ] https://github.com/elastic/elastic-agent/issues/4683

blakerouse commented 3 months ago

This should probably be filed more as a meta issue, with a list of beats or inputs that have actually implemented proper health reporting back to the Elastic Agent. The Elastic Agent itself already has all the mechanisms for this to be a great experience.

  1. Runtime protections that allow an input to declare that it cannot be run unless the agent is root (or, conversely, unless it is non-root). This prevents the input from running, and the reason is reported and propagated back to Fleet Server.
  2. Health reporting of an individual input back to Elastic Agent, which is then propagated back to Fleet Server. The issue here is that most inputs do not do that at all.

In quick summation, adding health reporting at the input level will provide this information.
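The two mechanisms listed above can be sketched roughly as follows. This is an illustration under assumptions, not the Elastic Agent implementation: `Requirement`, `checkRuntime`, `HealthReporter`, and `startInput` are hypothetical names, and the state strings are placeholders.

```go
package main

import "fmt"

// Requirement is a hypothetical declaration of the privilege an input needs.
type Requirement int

const (
	AnyUser Requirement = iota
	RequiresRoot
	RequiresNonRoot
)

// checkRuntime returns a non-empty reason when the input must not be started.
func checkRuntime(req Requirement, isRoot bool) string {
	switch {
	case req == RequiresRoot && !isRoot:
		return "input requires root, but the agent is running unprivileged"
	case req == RequiresNonRoot && isRoot:
		return "input must not run as root"
	}
	return ""
}

// HealthReporter is a hypothetical interface for sending per-input status
// upstream (input -> agent -> Fleet Server).
type HealthReporter interface {
	UpdateStatus(state, msg string)
}

// printReporter records and prints the last status it received.
type printReporter struct{ last string }

func (p *printReporter) UpdateStatus(state, msg string) {
	p.last = state + ": " + msg
	fmt.Println(p.last)
}

// startInput applies the runtime protection first and reports the outcome.
// It returns false when the input was prevented from running.
func startInput(req Requirement, isRoot bool, r HealthReporter) bool {
	if reason := checkRuntime(req, isRoot); reason != "" {
		r.UpdateStatus("FAILED", reason)
		return false
	}
	r.UpdateStatus("HEALTHY", "")
	return true
}

func main() {
	r := &printReporter{}
	startInput(RequiresRoot, false, r) // blocked, reason is reported
	startInput(AnyUser, false, r)      // allowed to run
}
```

In this sketch the first mechanism is a check before start, while the second is whatever the input itself reports after it is running; the gap described in the comment is that most inputs never make the second kind of call.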

cmacknz commented 3 months ago

Yes this is more of a Beats/input issue. We may want agent to explicitly tell inputs when agent is running as unprivileged so that they do not have to duplicate the detection logic.
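A rough sketch of that idea, assuming the agent detects its privilege level once and hands the result to each input at construction time. `AgentInfo`, `Input`, and `NewInput` are hypothetical names, and the `os.Geteuid` check is a Unix-only approximation used just for the demo.

```go
package main

import (
	"fmt"
	"os"
)

// AgentInfo is a hypothetical bundle of facts the agent passes to inputs.
type AgentInfo struct {
	Unprivileged bool
}

// Input is a hypothetical input that stores what the agent told it instead
// of detecting the privilege level on its own.
type Input struct {
	name         string
	unprivileged bool
}

// NewInput builds an input from the agent-provided information.
func NewInput(name string, info AgentInfo) *Input {
	return &Input{name: name, unprivileged: info.Unprivileged}
}

// Unprivileged reports whether the agent said it is running unprivileged.
func (i *Input) Unprivileged() bool { return i.unprivileged }

func main() {
	// Detection happens once, on the agent side. On Unix, a non-zero
	// effective UID is used here as a simple approximation.
	info := AgentInfo{Unprivileged: os.Geteuid() != 0}

	for _, name := range []string{"filestream", "journald"} {
		in := NewInput(name, info)
		fmt.Printf("%s: unprivileged=%v\n", in.name, in.Unprivileged())
	}
}
```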

ycombinator commented 3 months ago

Transferring to Beats repo per discussion in the issue.

elasticmachine commented 3 months ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

ycombinator commented 3 months ago

> We may want agent to explicitly tell inputs when agent is running as unprivileged so that they do not have to duplicate the detection logic.

@blakerouse @cmacknz Is this being done already? Or do we need a separate issue to track this as an enhancement that this issue here would then depend on?

cmacknz commented 3 months ago

I had tacked it on to https://github.com/elastic/elastic-agent/issues/4683 which is needed to support the user agent changes we want as well.

cmacknz commented 3 months ago

I have updated this to be a meta issue and added a task list to update the inputs that our team owns or that are part of the system integration.

CC @pierrehilbert as all of these are work for the data plane team.

nimarezainia commented 3 months ago

Ideally we would have this done in sp30 and sp31 so that we have the desired Fleet user experience, especially on the System Integration. If system was not installed by default, I would say we could delay these to the follow-on release. But as it stands, all users installing in `unprivileged` mode will hit this issue.

@pierrehilbert is it possible to get https://github.com/elastic/beats/issues/39736 and https://github.com/elastic/beats/issues/39737 completed in sp30/sp31 so we keep our Q2 deliverable?

jlind23 commented 1 week ago

@pierrehilbert @nimarezainia all subtasks are done, shall we consider this issue complete?

pierrehilbert commented 1 week ago

I'm in favor of closing it: the remaining work is not related to unprivileged mode. It is about fixing issues that we are now reporting and that cause some inputs to be degraded.