elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Provide Kibana Alerting functionality for Fleet #79310

Closed jeffvestal closed 7 months ago

jeffvestal commented 4 years ago

Describe the feature: As an administrator responsible for managing Elastic Agents with Fleet, I would like to easily enable alerting to be notified when an agent goes offline. Ideally I would be able to configure this with a Kibana Alerting flyout within the Fleet UI.

Describe a specific use case for the feature: With Elastic Agent being centrally managed within Kibana, it is important for end users to know when one or more agents go offline. We have other indicators down the line (e.g., log rate drop-offs), but operators/administrators need to be able to configure alerting for offline agents regardless of the modules they are running.

Currently there is only a visual indication when you navigate to Ingest Manager -> Fleet (screenshot below). Without OOTB alerting functionality we risk missing data, affecting other Solutions, and causing business disruptions for use cases that rely on timely delivery of data through our pipeline.

[Screenshot: Fleet UI - Agent offline]

cc: @mukeshelastic

elasticmachine commented 4 years ago

Pinging @elastic/ingest-management (Team:Ingest Management)

mostlyjason commented 4 years ago

@mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?

renzedj commented 1 year ago

> @mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?

The agent status isn't an ongoing record of agent status, though. It's just the current status (e.g., I can't look at it and see that 12h ago an agent was offline for 30m), and IIRC from checking this out as an option, nothing is written when an agent goes offline or misses a check-in.
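To make the "offline for 30m, 12h ago" case concrete, here is a minimal sketch of how offline periods could be derived if a history of check-in timestamps were kept somewhere. That history is an assumption: the thread says Fleet only stores the current status, and the function name `offline_gaps` and the 10-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta


def offline_gaps(checkins, max_gap):
    """Return (last_seen, next_seen) pairs where the gap exceeds max_gap.

    `checkins` is a hypothetical list of check-in timestamps for one agent.
    """
    ordered = sorted(checkins)
    gaps = []
    # Compare each check-in with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_gap:
            gaps.append((earlier, later))
    return gaps


# Illustrative data: check-ins every 5 minutes, with one 30-minute hole.
start = datetime(2024, 1, 1, 0, 0)
checkins = [start + timedelta(minutes=m) for m in (0, 5, 10, 40, 45)]

gaps = offline_gaps(checkins, max_gap=timedelta(minutes=10))
for last_seen, next_seen in gaps:
    print(f"offline from {last_seen} to {next_seen} ({next_seen - last_seen})")
```

With only the current status available, none of these gaps can be reconstructed after the fact, which is the limitation described above.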

zez3 commented 1 year ago

This would indeed be very useful.

zez3 commented 1 year ago

@jamiehynds any update on this?

leandrojmp commented 1 year ago

Hello, is this still planned?

Having an OOTB alert for when an agent goes offline should be a core feature.

mikefrommars commented 9 months ago

When using Elastic as a Security and Compliance tool I need to know when an Agent goes offline since that means I am no longer collecting logs via the agent.

Any updates on this? Has the feature been approved?

x-dean commented 9 months ago

I would also like to have alerting capabilities for when an agent's resource usage is very high.

Not long ago I had an issue when Azure made some changes on the network side: the agent could not connect to the event hub, and its CPU usage was at 100%. Even after Azure restored connectivity, the agent stayed at very high CPU usage and very few events could be ingested. A restart of the agent service was needed to bring things back to normal.

nyp-cgranata commented 8 months ago

+1 on this issue. In a similar situation as @mikefrommars.

farbod-sec commented 7 months ago

This is a bedrock / foundational feature for SIEM/security and some o11y. It needs to be turnkey and OOTB. I have customers regularly asking about how to accomplish this.

To add on to what Vestal posted earlier, SIEMs also tend to place silent-log alerting next to agent heartbeat alerting, since the two go hand in hand for operators. It would be nice to have a single page to configure and monitor heartbeat + log rate.

jen-huang commented 7 months ago

cc @nimarezainia

nimarezainia commented 7 months ago

@farbod-sec please open an ER for the SIEM related enhancements you are referring to.

Regarding alerts on agents: please refer to the agent documentation: https://www.elastic.co/guide/en/fleet/current/monitor-elastic-agent.html#fleet-alerting

We are now exposing the various agent statuses required to build alerts. These were previously hidden, which prevented us from building any ML or alerts on top of them. I'm closing this request for now; if there are enhancements to be made as a follow-on, I would be happy to consider them.

carlosaya commented 6 months ago

@nimarezainia am I missing something, or does the link you provided only state that a COUNT of the agents in each state is provided? This is a start (I guess), but we really need to know WHICH agents are offline so that we can raise individual alerts for each agent when it goes offline.

nimarezainia commented 6 months ago

@carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.
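The gap between a count-based alert and a per-agent alert can be shown with a minimal sketch. It assumes two hypothetical snapshots mapping agent ID to status; the agent names and the `compare` helper are illustrative and are not part of Fleet or Kibana.

```python
def offline_ids(snapshot):
    """Return the set of agent IDs whose status is 'offline'."""
    return {agent_id for agent_id, status in snapshot.items() if status == "offline"}


def compare(previous, current):
    """Compare two hypothetical {agent_id: status} snapshots."""
    before, after = offline_ids(previous), offline_ids(current)
    return {
        # What a count-based rule can see.
        "count_changed": len(before) != len(after),
        # What a per-agent rule would need to report.
        "newly_offline": sorted(after - before),
        "recovered": sorted(before - after),
    }


# Illustrative data: one agent recovers while another goes offline.
previous = {"web-01": "online", "web-02": "offline", "db-01": "online"}
current = {"web-01": "online", "web-02": "online", "db-01": "offline"}

result = compare(previous, current)
print(result)
```

In this example the offline count stays at one, so a rule that fires only when the count changes would stay silent even though a different agent is now offline.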

carlosaya commented 6 months ago

> @carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.

@nimarezainia Thanks for the confirmation. Is there an issue I can keep an eye on for that feature?

krol3 commented 2 months ago

> we really need to know WHICH agents are offline so that we can raise individual alerts for each agent when it goes offline.

> @farbod-sec please open an ER for the SIEM related enhancements you are referring to.
>
> Regarding alerts on agents: please refer to the agent documentation: https://www.elastic.co/guide/en/fleet/current/monitor-elastic-agent.html#fleet-alerting
>
> We are now exposing the various agent statuses required to build alerts. These were previously hidden which prohibited us building any ML or Alerts on top of them. I'm closing this request for now and if there are enhancements to be made as a follow-on would be happy to consider them.

Can you advise how to enable alerting to be notified when an agent goes offline? Is this issue resolved?

nimarezainia commented 2 months ago

> Can you advise how to enable alerting to be notified when an agent goes offline? Is this issue resolved?

We currently don't have the ability to alert on an individual agent. You can only alert when the status changes (that is, when the total number of offline agents changes), and the user then has to investigate. Alerting on individual agents is on the roadmap but not yet prioritized.

@krol3 please open an ER and let me know.

zez3 commented 2 months ago

There should now be an Elastic-internal ER 22485 for this. I'm not sure how to reference this internal #ER22485 or whether it belongs in the Fleet repo.

leandrojmp commented 2 months ago

Just some context on our use case.

We have thousands of agents on our cluster, on servers and desktops/laptops. We use multiple policies, and servers and desktops/laptops have different policies.

Desktops/laptops are expected to be offline frequently, which is one of the reasons we cannot rely on the count of offline agents: it fluctuates a lot through the day, and especially on weekends.

For servers, though, we need to pay closer attention to whether any are offline, so native alerting functionality would need to let us filter by policy or by tags, for example.

It would be nice if the alerting feature for the agents could be defined in the policy or use the policy as a filter.
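A minimal sketch of the kind of policy or tag filter described here, so that only offline servers raise alerts while laptops are ignored. The record fields (`id`, `policy_id`, `tags`, `status`), the policy names, and the `offline_in_scope` helper are all assumptions for illustration, not an existing Fleet or Kibana API.

```python
def offline_in_scope(agents, policy_ids=(), tags=()):
    """Return IDs of offline agents that match a wanted policy or tag.

    `agents` is a hypothetical list of dicts with 'id', 'policy_id',
    'tags', and 'status' keys.
    """
    wanted_policies = set(policy_ids)
    wanted_tags = set(tags)
    selected = []
    for agent in agents:
        if agent["status"] != "offline":
            continue
        in_policy = agent["policy_id"] in wanted_policies
        has_tag = bool(wanted_tags & set(agent.get("tags", [])))
        if in_policy or has_tag:
            selected.append(agent["id"])
    return selected


# Illustrative data: one server and one laptop are offline.
agents = [
    {"id": "srv-01", "policy_id": "servers", "tags": ["prod"], "status": "offline"},
    {"id": "srv-02", "policy_id": "servers", "tags": ["prod"], "status": "online"},
    {"id": "lap-17", "policy_id": "laptops", "tags": [], "status": "offline"},
]

alerts = offline_in_scope(agents, policy_ids=["servers"])
print(alerts)
```

Scoping by policy keeps the expected churn of laptops out of the alert, and returning agent IDs instead of a count gives the responder the context of which machines are affected.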

Erikg346 commented 2 months ago

Just adding my two cents:

We deployed Elastic Agent to 5,000 devices, both clients and servers. It's crucial to know which agents in which policy are offline.

This also applies to Kibana Alerting in general. Customers are tired of "count" or "matches" alerting. We want alerts with contextual information.