elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[APM] Alerting use cases and examples #103785

Open sorenlouv opened 3 years ago

sorenlouv commented 3 years ago

1. Alerting on Garbage collection

Ability to create alerts for garbage collection metrics.

Another similar request:

I would like to add that it would also be useful to be able to create latency threshold & error count alerts at the individual transaction granularity.

For example, we have a requirement that each request must be processed server-side in less than 0.5 sec. Very often an application has 50+ different API endpoints (transactions). If only 1 of them is slow (avg 1 sec) while the remaining 49 are fast (avg 0.2 sec), the service-level average will still look healthy.
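
To make the arithmetic in that request concrete, here is a minimal Python sketch. It assumes every endpoint receives the same number of requests, and the endpoint names are hypothetical. It only illustrates why a service-level average hides one slow transaction; it is not how APM rules are evaluated.

```python
from statistics import mean

THRESHOLD_SEC = 0.5  # requirement from the example: each request under 0.5 sec

# Hypothetical endpoint names: 49 fast transactions and 1 slow one.
avg_latency_by_transaction = {f"GET /api/endpoint-{i}": 0.2 for i in range(49)}
avg_latency_by_transaction["GET /api/slow-endpoint"] = 1.0


def service_average(latencies):
    """Average across transactions, assuming equal request volume per endpoint."""
    return mean(latencies.values())


def transactions_over_threshold(latencies, threshold):
    """Names of the transactions whose own average exceeds the threshold."""
    return sorted(name for name, avg in latencies.items() if avg > threshold)


overall = service_average(avg_latency_by_transaction)
slow = transactions_over_threshold(avg_latency_by_transaction, THRESHOLD_SEC)

print(f"service average: {overall:.3f} sec")  # (49 * 0.2 + 1.0) / 50 = 0.216
print(f"over threshold: {slow}")
```

The service average comes out at 0.216 sec, well under the 0.5 sec requirement, so a service-level rule stays quiet even though one transaction is twice the limit. Checking each transaction separately surfaces it.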

Also requested in https://github.com/elastic/kibana/issues/86108, https://github.com/elastic/kibana/issues/134481

2. Ability to create rules for multiple services (but not all) at a time

It would be nice if we could create and manage alerts for multiple services (but not every service).

https://github.com/elastic/kibana/issues/104886

3. Alerts on dependencies (#16724, https://github.com/elastic/kibana/issues/166309)

The customer wants to create an APM alert based on a dependency's latency (like Redis or Elasticsearch itself) instead of the entire service's latency.
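
As a rough illustration of what such a rule would need to evaluate, the sketch below builds an Elasticsearch query body that averages span duration for one dependency of one service. It assumes APM span documents carry the fields `service.name`, `span.destination.service.resource` and `span.duration.us`. The function names, the example values and the threshold check are hypothetical, not an existing Kibana rule type.

```python
def build_dependency_latency_query(service_name, dependency, window="5m"):
    """Build a query body for the average span latency of one dependency.

    Assumption: spans to the dependency are identified by
    `span.destination.service.resource` and timed in `span.duration.us`.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service.name": service_name}},
                    {"term": {"span.destination.service.resource": dependency}},
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ]
            }
        },
        "aggs": {"avg_latency_us": {"avg": {"field": "span.duration.us"}}},
    }


def exceeds_threshold(avg_latency_us, threshold_ms):
    """Compare an average in microseconds with a threshold in milliseconds."""
    # The avg aggregation returns null (None) when no spans matched.
    return avg_latency_us is not None and avg_latency_us / 1000 > threshold_ms


# Hypothetical service and dependency names.
query = build_dependency_latency_query("checkout-service", "redis")
```

The body could be sent to the search API against the APM span indices, and the returned `avg_latency_us` value passed to `exceeds_threshold`.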

4. Alerts for throughput and failure rate anomalies (https://github.com/elastic/kibana/issues/159288)

Customer thinks the request to have Alerts on an ML job looking for increased error rates makes perfect sense. They say it should become part of an out-of-the-box experience.

https://github.com/elastic/enhancements/issues/12409

5. Add KQL filtering to APM rules

It should be possible to add custom filtering using KQL. This has been requested by users and will align APM rules with other Observability rules.

botelastic[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

acrewdson commented 2 years ago

+1 for

  1. Alert for specific transaction name

I think this could be a valuable feature for many teams using APM. As noted above, in a typical API, response times for certain endpoints can be more critical than others, and it would be nice for latency-based alerts in APM to be capable of representing this. Being able to alert on latency in a more granular way, by targeting specific transactions, would be really helpful.

sorenlouv commented 2 years ago

Thanks for chiming in @acrewdson! I'll make sure to include your feedback and will try to get this on the roadmap.

bradleydamato commented 2 years ago

Hey team, has this item been roadmapped? If not, are there any workarounds to enable this functionality? Specifically, I'm looking to create latency threshold alerts for specific transactions (rather than at the service level).

chrisdistasio commented 1 year ago

Alert on Dependency metrics

e-parth-pathak commented 1 year ago

@sqren Adding the use case of a customer by anonymising the customer data. I have replaced the names.

USE CASE:

  1. Customer should be able to configure an Anomaly Detection alert on a single transaction of an APM service. For example, in the customer's environment, a Java APM agent collects metrics and writes the data into the service some-java-service-name.
  2. In this service we have a transaction: sometransaction#name
  3. They can configure an alert on an ML Anomaly job for the entire service.

OBSERVATIONS:

  1. Right now, Machine Learning jobs take into consideration data points from the apm-* data view in Kibana. The data is fetched from the entire collection of apm-* indices.
  2. We need more fine-tuned and granular filters, so that, within a certain service, only certain transactions are monitored for anomaly detection.
  3. We tried creating a custom data view selecting only apm-7.13.2-transaction-* indices, but this does not fetch the data per transaction.

chrisdistasio commented 1 year ago

For discussion, from Slack thread:

Anna Maria Modée Hi team! Got a question today which highlights how we’ve been working in silos (that was discussed in the call previously): In the Security roadmap, we are planning an alert (probably EQL based) to identify missing events. This can also be useful in an observability use case and it is actually being asked by NNIT. https://github.com/elastic/security-team/issues/2835 Wouldn’t we also be able to use this in Observability?

sorenlouv commented 1 year ago

Removing the following items from the list:

These were implemented in https://github.com/elastic/kibana/pull/154241, https://github.com/elastic/kibana/pull/155405 and https://github.com/elastic/kibana/pull/155410 and shipped in 8.8 🎉

hp0620 commented 1 year ago

Hello, @sqren,

Do we have anything on the roadmap for:

  1. Alerts on dependencies (https://github.com/elastic/enhancements/issues/16724) The customer wants to create an APM alert based on a dependency's latency (like Redis or Elasticsearch itself) instead of the entire service's latency.

I have another customer asking about the availability of this feature and was wondering if there are any updates we can share.

Thank you.

sorenlouv commented 1 year ago

@hp0620 I've created a dedicated issue to track this: https://github.com/elastic/kibana/issues/166309. Do you have any more details around the use case that I can add?

hp0620 commented 1 year ago

Thanks @sqren for creating a separate issue to track.

Customer shared the use case with us:

One of the application development teams would like to track/monitor the latency on an mssql dependency for their service rather than monitoring latency at the service or transaction level. This would provide value by allowing them to track poorly performing queries.

Hope this helps you better understand the use case behind the request. Let me know if you need anything else.