bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License

Only process some metrics when OpenTSDB is enabled #2503

Closed harudark closed 1 year ago

harudark commented 2 years ago

Description

When OpenTSDB is not enabled, the time spent processing metrics that would be sent to OpenTSDB is wasted.

The underlying reason for this change is to make the scheduler run more accurately.

In production, it takes about 100-300 ms to process these metrics. Suppose processing always takes 200 ms and one alert is scheduled to run every minute. Each cycle then lasts 60.2 s, so the actual number of alert executions per day becomes 60 * 60 * 24 / 60.2 ≈ 1435.2, fewer than the expected 1440. Whether losing about 5 executions per day matters depends on the use case, and people may have different opinions.

The real problem for us is that one important per-minute SLO metric, bosun_uptime, relies on the accuracy of the scheduler. Currently, because of this extra processing time, the start time of the per-minute alert slips by 1 s every few minutes, which causes missing metric data points.

Ideally, we could introduce jitter to reduce the impact of the metrics processing time, or optimize the processing itself, but both are tricky to implement. This change is not very elegant, but it is straightforward.

Type of change

How has this been tested?

Tested in production.

Checklist:

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.