Closed jeffvestal closed 7 months ago
Pinging @elastic/ingest-management (Team:Ingest Management)
@mukeshelastic FYI for agent observability. I think we already write agent status to ES, but could use docs and maybe out of the box alerts?
The agent status isn't an ongoing track of Agent status though. It's just the current status (e.g., I can't look at it and see that 12h ago an agent was offline for 30m), and IIRC from checking this out as an option, it doesn't write when it goes offline or when something misses a check-in.
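To illustrate what an ongoing track of status would allow, here is a minimal sketch that reconstructs offline periods from a history of check-in timestamps. It assumes such a history is stored somewhere, which is exactly what the comment above says is missing today; the function name and the five-minute threshold are hypothetical and not part of Fleet.

```python
from datetime import datetime, timedelta


def offline_intervals(checkins, max_gap):
    """Return (last_seen, next_seen) pairs where the gap between
    consecutive check-ins is longer than max_gap."""
    ordered = sorted(checkins)
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_gap:
            gaps.append((prev, curr))
    return gaps


# Hypothetical history: the agent misses check-ins for 30 minutes.
start = datetime(2024, 1, 1, 0, 0)
history = [
    start,
    start + timedelta(minutes=1),
    start + timedelta(minutes=31),
    start + timedelta(minutes=32),
]
for last_seen, next_seen in offline_intervals(history, timedelta(minutes=5)):
    print(f"offline from {last_seen} to {next_seen}")
```

With a history like this, a question such as "was this agent offline for 30 minutes, 12 hours ago?" becomes a simple lookup over the returned intervals.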
This would be indeed very useful
@jamiehynds any update on this?
Hello, is this still planned?
Having an OOTB alert for when an Agent is offline should be a Core Feature.
When using Elastic as a Security and Compliance tool I need to know when an Agent goes offline since that means I am no longer collecting logs via the agent.
Any updates on this? Has the feature been approved?
I would also like alerting capabilities for when an Agent's resource usage is very high.
Not long ago I had an issue when Azure made some changes on the network side: the agent could not connect to the event hub, but its CPU usage was at 100%. Even after Azure restored connectivity, the agent stayed at very high CPU usage and very few events could be ingested. A restart of the agent service was needed to bring things back to normal.
+1 on this issue. In a similar situation as @mikefrommars.
This is a bedrock / foundational feature for SIEM/security and some o11y. It needs to be turnkey and OOTB. I have customers regularly asking about how to accomplish this.
To add to what Vestal posted earlier: SIEMs typically offer silent-log alerting alongside agent heartbeat alerting, since the two go hand in hand for operators. It would be nice to have a single page to configure and monitor heartbeat + log rate.
cc @nimarezainia
@farbod-sec please open an ER for the SIEM related enhancements you are referring to.
Regarding alerts on agents: please refer to the agent documentation: https://www.elastic.co/guide/en/fleet/current/monitor-elastic-agent.html#fleet-alerting
We are now exposing the various agent statuses required to build alerts. These were previously hidden, which prevented us from building any ML or alerts on top of them. I'm closing this request for now; if there are enhancements to be made as a follow-on, I would be happy to consider them.
@nimarezainia am I missing something, or does the link you provided only state that a COUNT of the agents in each state is provided? This is a start (I guess), but we really need to know WHICH agents are offline so that we can raise individual alerts for each agent when it goes offline.
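As a rough sketch of the per-agent view being requested here, the snippet below lists which agents are offline rather than how many. It works on plain dictionaries; the `id` and `last_checkin` field names and the ten-minute threshold are assumptions for illustration, not the actual Fleet data model.

```python
from datetime import datetime, timedelta, timezone


def find_offline_agents(agents, now, offline_after):
    """Return the IDs of agents whose last check-in is older than offline_after."""
    return sorted(
        agent["id"]
        for agent in agents
        if now - agent["last_checkin"] > offline_after
    )


# Hypothetical agent records.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
agents = [
    {"id": "web-01", "last_checkin": now - timedelta(minutes=1)},
    {"id": "db-01", "last_checkin": now - timedelta(hours=2)},
]
print(find_offline_agents(agents, now, timedelta(minutes=10)))
```

An alert built on this kind of result could name each affected agent instead of reporting only a total.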
@carlosaya you are right, it will give an alert when the count changes. We currently don't have the ability to create an alert on an individual agent basis. Something we plan to address.
@nimarezainia Thanks for the confirmation. Is there an issue I can keep an eye on for that feature?
We really need to know WHICH agents are offline so that we can raise individual alerts for each agent when it goes offline. Can you advise me?
Can you advise how to enable alerting to be notified when an agent goes offline? Is this issue resolved?
We currently don't have the ability to alert on an individual agent. You can only alert if the status changes (as in total number of offline agents changes) and the user has to investigate. Alerting on individual agents is on the roadmap but not yet prioritized.
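To make the limitation concrete, here is a small sketch comparing two snapshots of offline agent IDs. It shows a case where the count stays the same while the set of offline agents changes, so a count-based alert would not fire. The function and agent names are hypothetical.

```python
def diff_offline(previous, current):
    """Compare two snapshots of offline agent IDs and report what changed."""
    prev, curr = set(previous), set(current)
    return {
        "went_offline": sorted(curr - prev),
        "recovered": sorted(prev - curr),
    }


# One agent recovers while another goes offline: the count is 1 both times.
earlier = ["laptop-17"]
later = ["server-03"]
print(len(earlier) == len(later))
print(diff_offline(earlier, later))
```

Tracking membership rather than the total is what alerting on individual agents would require.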
@krol3 please open an ER and let me know.
There should now be an Elastic internal ER 22485 for this. Not sure how to format this internal #ER22485 or if it belongs in the Fleet repo.
Just some context on our use case.
We have thousands of agents on our cluster, both servers and desktops/laptops. We use multiple policies, and servers and desktops/laptops have different policies.
Desktops/laptops are expected to be offline frequently. This is one of the reasons that we can't rely on the count of offline agents, since it fluctuates a lot through the day and especially on weekends.
But for servers we need to pay closer attention to whether any are offline, so a native alerting feature would need to let us filter by policy or by tags, for example.
It would be nice if the alerting feature for the agents could be defined in the policy or use the policy as a filter.
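The policy-based filtering described above could look roughly like the following sketch, which reports offline agents only for the policies an operator cares about. The record fields (`id`, `policy`, `status`) and the policy names are assumptions made for the example, not Fleet's schema.

```python
from collections import defaultdict


def offline_by_policy(agents, watched_policies):
    """Group offline agent IDs by policy, keeping only the watched policies."""
    grouped = defaultdict(list)
    for agent in agents:
        if agent["status"] == "offline" and agent["policy"] in watched_policies:
            grouped[agent["policy"]].append(agent["id"])
    return {policy: sorted(ids) for policy, ids in grouped.items()}


# Hypothetical fleet: laptops going offline is expected, servers are not.
agents = [
    {"id": "laptop-17", "policy": "desktops", "status": "offline"},
    {"id": "server-03", "policy": "servers", "status": "offline"},
    {"id": "server-04", "policy": "servers", "status": "online"},
]
print(offline_by_policy(agents, {"servers"}))
```

Filtering this way keeps the noisy desktop/laptop policies out of the alert while still naming each offline server.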
Just adding my two cents,
We deployed the elastic agent to 5,000 devices, clients, and servers. It's crucial to know which agents in which policy are offline.
This also applies to Kibana Alerting in general. Customers are tired of "count" or "matches" alerting. We want alerts with contextual information.
Describe the feature: As an administrator responsible for managing Elastic Agents with Fleet, I would like to easily enable alerting to be notified when an agent goes offline. Ideally I would be able to configure this with a Kibana Alerting flyout within the Fleet UI.
Describe a specific use case for the feature: With the elastic agent being centrally managed within Kibana, it is important for end users to know when one or more of the agents goes offline. We have other indicators down the line (eg. log rate drop offs) but operators/administrators need to be able to configure alerting for offline agents regardless of the modules they are running.
Currently there is only a visual indication when you navigate to Ingest Manager -> Fleet (screenshot below). Without OOTB alerting functionality we risk missing data, affecting other Solutions, and causing business disruptions for use cases that rely on timely delivery of data through our pipeline.
cc: @mukeshelastic