[EPIC] Support for Anomaly Detection

vanakema commented 2 years ago

Is your feature request related to a problem?

When you have a small team, you want to know when your app is misbehaving, with as little intervention as possible.

Describe the solution you'd like

SigNoz integrates an open source anomaly detection library to alert users if anything goes outside the "normal" range.

Some use cases:

Abnormal latency (latency spiking) on certain DB queries
Abnormal latency (latency spiking) on certain Flask endpoints
Abnormal error rate on certain endpoints
Abnormal requests/s

Describe alternatives you've considered

Really the only alternative would be manually creating alerts in Prometheus or feeding SigNoz metrics into an anomaly detection library ourselves.

Additional context

The DataDog WatchDog feature is great because it automatically detects anomalous behavior, which is really helpful when you have a small team, or a team without a dedicated SRE person, since you no longer necessarily have to know what to look for.

Thank you for your feature request – we love each and every one!

vanakema commented 2 years ago

Figured this might be a helpful repo for reference: https://github.com/rob-med/awesome-TS-anomaly-detection

pranay01 commented 2 years ago

Thanks @vanakema for detailing the use cases. Anomaly detection IS on our roadmap, but a few months down the line.

Curious, what sort of algos worked best for you for detecting "abnormal" values? Does a simple threshold on a rolling average work well enough, or are more advanced algos like seasonal pattern detection needed?
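For illustration only, here is a minimal sketch of what a "threshold on a rolling average" check could look like. The function name, window size, tolerance, and sample data are all assumptions made up for this example; this is not an existing SigNoz feature or API.

```python
from collections import deque
from statistics import mean


def rolling_mean_anomalies(values, window=5, tolerance=0.5):
    """Return indices of points that deviate from the mean of the previous
    `window` points by more than `tolerance` (a fraction of that mean).

    Hypothetical helper for illustration; not part of SigNoz.
    """
    history = deque(maxlen=window)  # keeps only the last `window` points
    anomalies = []
    for i, value in enumerate(values):
        # Only judge a point once a full window of history is available.
        if len(history) == window:
            baseline = mean(history)
            if abs(value - baseline) > tolerance * abs(baseline):
                anomalies.append(i)
        history.append(value)
    return anomalies


# Made-up latency samples (ms) with one spike at index 6.
latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100]
print(rolling_mean_anomalies(latencies_ms))  # [6]
```

Note that the spike itself enters the window and inflates the baseline for the next few points, which is one reason a plain rolling average can miss back-to-back anomalies.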

ankitnayan commented 2 years ago

GitLab has written about basic anomaly detection with Prometheus rules, based on z-scores and seasonality: https://about.gitlab.com/blog/2019/07/23/anomaly-detection-using-prometheus/

This sort of thing should also be possible with SigNoz, as we plan to make SigNoz compatible with Prometheus rules and Alertmanager.
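As a rough sketch of the z-score idea in plain Python (not the Prometheus rules from the blog post, and not SigNoz code): a value is flagged when it lies more than a chosen number of standard deviations from the mean of recent history. The function names, the threshold of 3, and the sample data are assumptions for illustration, and seasonality is not handled here.

```python
from statistics import mean, pstdev


def z_score(value, history):
    """Number of population standard deviations `value` is from the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A flat history has no spread; treat the score as 0 to avoid dividing by zero.
        return 0.0
    return (value - mu) / sigma


def is_anomalous(value, history, threshold=3.0):
    """Flag `value` if its absolute z-score exceeds `threshold` (an assumed default)."""
    return abs(z_score(value, history)) > threshold


# Made-up requests/s samples used as the baseline.
requests_per_s = [50, 52, 48, 51, 49, 50, 53, 47]
print(is_anomalous(51, requests_per_s))  # False
print(is_anomalous(70, requests_per_s))  # True
```

The z-score assumes the metric is roughly normally distributed over the history window, so it tends to work better on aggregated series (for example request rate per service) than on very spiky ones.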

pranay01 commented 1 year ago

We can also leverage Third Eye

This is built for Apache Pinot, which is an OLAP database similar to ClickHouse.

nwmcsween commented 11 months ago

Might be worthwhile asking the Netdata team about lessons learnt applying ML to time series.

pranay01 commented 11 months ago

Thanks for the note, @nwmcsween. Do you think Netdata does a good job applying ML to time series data? Any blogs/issues where they share more about it?

StefanSa commented 11 months ago

@pranay01 Namaste. ML and alarms in particular are Netdata's specialty. It's worth having a look at it. I speak from 30 years of experience with Nagios, Zabbix, Elastic, OpenSearch, Influx, and many more, including Netdata. Netdata leans more heavily toward *nix than Windows and lacks OTel integration. That's why I'm looking at you guys right now. 😃

pranay01 commented 11 months ago

Thanks @StefanSa - do you have relevant Netdata docs I should look at?

StefanSa commented 11 months ago

@pranay01 Certainly, not a problem. There is a lot of reading material here; as I said, alerting is also well done there.

ML: https://learn.netdata.cloud/docs/ml-and-troubleshooting/machine-learning-ml-powered-anomaly-detection

https://learn.netdata.cloud/docs/ml-and-troubleshooting/anomaly-advisor

https://learn.netdata.cloud/docs/visualizations/netdata-charts#anomaly-rate-ribbon

https://learn.netdata.cloud/docs/ml-and-troubleshooting/metric-correlations

https://www.youtube.com/watch?v=2gJ36YuW6Ko

Alerting: https://learn.netdata.cloud/docs/alerting/

Live Demo: Live-Demo
