elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Provide Kibana Alerting functionality for Fleet #79310

Closed: jeffvestal closed this issue 1 month ago

jeffvestal commented 3 years ago

Describe the feature: As an administrator responsible for managing Elastic Agents with Fleet, I would like an easy way to enable alerts that notify me when an agent goes offline. Ideally, I would be able to configure this with a Kibana Alerting flyout within the Fleet UI.

Describe a specific use case for the feature: With Elastic Agent being centrally managed within Kibana, it is important for end users to know when one or more agents go offline. We have other indicators further down the line (e.g., log-rate drop-offs), but operators and administrators need to be able to configure alerting for offline agents regardless of the modules they are running.

Currently there is only a visual indication when you navigate to Ingest Manager -> Fleet (screenshot below). Without OOTB alerting functionality, we risk missing data, affecting other Solutions, and causing business disruptions for use cases that rely on timely delivery of data through our pipeline.

[Screenshot: Fleet UI - Agent offline]

cc: @mukeshelastic

elasticmachine commented 3 years ago

Pinging @elastic/ingest-management (Team:Ingest Management)

mostlyjason commented 3 years ago

@mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?

renzedj commented 11 months ago

> @mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?

The agent status isn't an ongoing record of agent status, though. It's just the current status (e.g., I can't look at it and see that an agent was offline for 30 minutes 12 hours ago), and IIRC from evaluating this as an option, it doesn't write an entry when an agent goes offline or misses a check-in.
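One way to work around the missing history is to sample the current status on a schedule and store only the transitions. The sketch below is a minimal illustration under stated assumptions: each snapshot is a hypothetical dict mapping an agent ID to a status string such as `"online"` or `"offline"`, fetched however you poll Fleet. The functions `record_transitions` and `offline_intervals` are hypothetical helpers, not part of Kibana or Fleet.

```python
from datetime import datetime, timedelta

# Hypothetical snapshot shape: {agent_id: status_string}. How snapshots are
# fetched (e.g. by polling Fleet on a schedule) is out of scope for this sketch.


def record_transitions(prev, curr, ts):
    """Return (timestamp, agent_id, old_status, new_status) tuples for every
    agent whose status differs between two consecutive snapshots."""
    events = []
    for agent_id in sorted(set(prev) | set(curr)):
        old = prev.get(agent_id)
        new = curr.get(agent_id)
        if old != new:
            events.append((ts, agent_id, old, new))
    return events


def offline_intervals(events, agent_id):
    """Rebuild (start, end) offline intervals for one agent from the event log.
    An interval still open at the end of the log has end=None."""
    intervals = []
    start = None
    for ts, aid, _old, new in events:
        if aid != agent_id:
            continue
        if new == "offline" and start is None:
            start = ts
        elif new != "offline" and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals


# Usage: three hypothetical snapshots taken 30 minutes apart.
t0 = datetime(2024, 1, 1, 0, 0)
snapshots = [
    (t0, {"agent-a": "online", "agent-b": "online"}),
    (t0 + timedelta(minutes=30), {"agent-a": "offline", "agent-b": "online"}),
    (t0 + timedelta(minutes=60), {"agent-a": "online", "agent-b": "online"}),
]
log = []
prev = {}
for ts, snap in snapshots:
    log.extend(record_transitions(prev, snap, ts))
    prev = snap
print(offline_intervals(log, "agent-a"))
```

Storing only transitions keeps the log small, so questions like "was this agent offline for 30 minutes 12 hours ago?" become a lookup. The trade-off is that an outage shorter than the polling interval can be missed entirely.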

zez3 commented 8 months ago

This would indeed be very useful.

zez3 commented 6 months ago

@jamiehynds any update on this?

leandrojmp commented 6 months ago

Hello, is this still planned?

Having an OOTB alert for when an Agent is offline should be a core feature.

mikefrommars commented 3 months ago

When using Elastic as a security and compliance tool, I need to know when an Agent goes offline, since that means I am no longer collecting logs via the agent.

Any updates on this? Has the feature been approved?

X-Dean commented 3 months ago

I would also like alerting capabilities for when an agent's resource usage is very high.

Not long ago I had an issue when Azure made some changes on the network side: the agent could not connect to the event hub, but its CPU usage was at 100%. Even after Azure restored connectivity, the agent's CPU usage stayed very high and very few events could be ingested. A restart of the agent service was needed to bring things back to normal.

nyp-cgranata commented 2 months ago

+1 on this issue. In a similar situation as @mikefrommars.

farbod-sec commented 1 month ago

This is a bedrock/foundational feature for SIEM/security and some o11y use cases. It needs to be turnkey and OOTB. I have customers regularly asking how to accomplish this.

To add to what Vestal posted earlier: SIEMs also typically offer silent-log alerting alongside agent heartbeat alerting, as the two go hand in hand for operators. It would be nice to have a single page to configure and monitor both heartbeat and log rate.
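A combined heartbeat and silent-log check of the kind described here could classify each agent from two signals: time since its last heartbeat (check-in) and time since it last shipped a log event, used as a simple proxy for log rate. The sketch below is illustrative only; the input dicts, the status labels, and the 5- and 15-minute windows are hypothetical choices, not Fleet or SIEM defaults.

```python
from datetime import datetime, timedelta

HEARTBEAT_WINDOW = timedelta(minutes=5)   # hypothetical threshold
LOG_WINDOW = timedelta(minutes=15)        # hypothetical threshold


def classify(last_heartbeat, last_event, now):
    """Return a status per agent:
    'offline' - no heartbeat within HEARTBEAT_WINDOW,
    'silent'  - heartbeat is recent but no log event within LOG_WINDOW,
    'healthy' - both signals are recent."""
    status = {}
    for aid, hb in last_heartbeat.items():
        if now - hb > HEARTBEAT_WINDOW:
            status[aid] = "offline"
            continue
        ev = last_event.get(aid)  # None if the agent never shipped an event
        if ev is None or now - ev > LOG_WINDOW:
            status[aid] = "silent"
        else:
            status[aid] = "healthy"
    return status


# Usage with hypothetical agents "a", "b", "c".
now = datetime(2024, 1, 1, 12, 0)
heartbeats = {
    "a": now - timedelta(minutes=1),
    "b": now - timedelta(minutes=1),
    "c": now - timedelta(minutes=30),
}
events = {"a": now - timedelta(minutes=2), "b": now - timedelta(hours=2)}
print(classify(heartbeats, events, now))
```

Checking the heartbeat first means an agent that is down is reported once as offline rather than also as silent, which avoids duplicate alerts for the same root cause.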

jen-huang commented 1 month ago

cc @nimarezainia

nimarezainia commented 1 month ago

@farbod-sec please open an enhancement request (ER) for the SIEM-related enhancements you are referring to.

Regarding alerts on agents: please refer to the agent documentation: https://www.elastic.co/guide/en/fleet/current/monitor-elastic-agent.html#fleet-alerting

We are now exposing the various agent statuses required to build alerts. These were previously hidden, which prevented us from building any ML or alerts on top of them. I'm closing this request for now; if there are follow-on enhancements to be made, I would be happy to consider them.

carlosaya commented 1 week ago

@nimarezainia am I missing something, or does the link you provided only state that a COUNT of the agents in each state is provided? This is a start (I guess), but we really need to know WHICH agents are offline so that we can raise an individual alert for each agent when it goes offline.

nimarezainia commented 1 week ago

@carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.
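To make the count-versus-individual distinction concrete: a rule that fires when the count changes only knows that the number of offline agents moved, while a per-agent rule needs the identity of each agent that crossed the threshold. The sketch below assumes a hypothetical input mapping each agent ID to its last check-in time and a hypothetical 5-minute offline threshold; Fleet's actual offline criteria and data model may differ, and none of these functions are Kibana APIs.

```python
from datetime import datetime, timedelta

# Assumed threshold for this sketch only; not Fleet's actual setting.
OFFLINE_AFTER = timedelta(minutes=5)


def offline_agents(last_checkin, now, threshold=OFFLINE_AFTER):
    """Return the set of agent IDs whose last check-in is older than threshold."""
    return {aid for aid, ts in last_checkin.items() if now - ts > threshold}


def count_alert(prev_offline, curr_offline):
    """Count-based rule: fire only when the number of offline agents changes.
    It cannot say which agents are affected."""
    return len(prev_offline) != len(curr_offline)


def per_agent_alerts(prev_offline, curr_offline):
    """Per-agent rule: one alert for each agent that newly went offline."""
    return sorted(curr_offline - prev_offline)


# Usage: agent-a recovers while agent-c goes offline in the same interval.
now = datetime(2024, 1, 1, 12, 0)
prev = {"agent-a"}  # offline at the previous evaluation
checkins = {
    "agent-a": now - timedelta(minutes=1),
    "agent-b": now - timedelta(minutes=1),
    "agent-c": now - timedelta(minutes=10),
}
curr = offline_agents(checkins, now)
print(count_alert(prev, curr), per_agent_alerts(prev, curr))
```

In this scenario the offline count stays at one, so the count-based rule stays quiet even though a different agent is now down; only the per-agent rule reports `agent-c`.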

carlosaya commented 1 week ago

> @carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.

@nimarezainia Thanks for the confirmation. Is there an issue I can keep an eye on for that feature?