bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License

Project status and call for new maintainer(s) #2517

Open neilfordyce opened 3 months ago


Skyscanner is in the process of migrating from an open source monitoring solution to a SaaS vendor: we want to focus on our core business of building better travel comparison products to delight travellers. As we won't be using Bosun after that vendor migration, it no longer makes sense for us to be Bosun maintainers. However, we know many of you are Bosun fans, so we're looking for Bosun users or contributors who would be interested in becoming maintainers instead. If that’s you, please reach out on this issue by 1st December 2024. Regrettably, if we cannot find a new maintainer, we will need to archive Bosun.

Bosun has heaps of potential, and we’ve pulled together a list of areas you might be interested in working on, should the idea of being a Bosun maintainer appeal.

Plugin architecture

Refactor Bosun so that its core contains only the common functionality that everyone needs. Maintaining code for integrations you don't use is challenging. Integration with other notification systems or infrequently used time series databases can be introduced through plugins delivered from other repositories.
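
To make the idea a little more concrete, here is a minimal Go sketch of how a core could discover notification integrations through a small interface and a registry. The `Notifier` interface, `Register`, `Lookup` and `stdoutNotifier` are all hypothetical names invented for this illustration; none of them exist in Bosun today. The sketch also registers plugins in-process, whereas delivering them from other repositories would need a real loading mechanism on top.

```go
package main

import "fmt"

// Notifier is a hypothetical plugin interface (an assumption for this
// sketch, not an existing Bosun type). The core would only know this contract.
type Notifier interface {
	Name() string
	Notify(subject, body string) error
}

// registry maps a plugin name to its implementation.
var registry = map[string]Notifier{}

// Register adds a notifier under its own name and rejects duplicates,
// so two plugins cannot silently shadow each other.
func Register(n Notifier) error {
	if _, exists := registry[n.Name()]; exists {
		return fmt.Errorf("notifier %q already registered", n.Name())
	}
	registry[n.Name()] = n
	return nil
}

// Lookup returns the notifier registered under name, if any.
func Lookup(name string) (Notifier, bool) {
	n, ok := registry[name]
	return n, ok
}

// stdoutNotifier is a toy plugin that prints the notification.
type stdoutNotifier struct{}

func (stdoutNotifier) Name() string { return "stdout" }

func (stdoutNotifier) Notify(subject, body string) error {
	fmt.Printf("%s: %s\n", subject, body)
	return nil
}

func main() {
	if err := Register(stdoutNotifier{}); err != nil {
		fmt.Println("register:", err)
		return
	}
	if n, ok := Lookup("stdout"); ok {
		_ = n.Notify("cpu.high", "host web01 above 90%")
	}
}
```

With a contract like this, an integration nobody on the maintainer team uses could live in its own repository and only needs to satisfy the interface.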

Integration with incident management systems

Bosun should make it easier to integrate with an incident management system (IMS) such as PagerDuty, Splunk On-call or Incident.io. The current method of integrating through a generic HTTP request template leads to a lot of repeated, error-prone config.

Retry failed notifications

Bosun allows creating and making arbitrary HTTP requests through notifications. These requests can fail, and in the current implementation Bosun doesn't try sending the request again, which causes missed notifications.
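
One possible shape for this is a retry loop with exponential backoff around the send step. The Go sketch below is only an illustration under assumptions: `sendFunc`, `sendWithRetry` and `errTemporary` are hypothetical names, and the sender is simulated rather than a real HTTP request or any existing Bosun code path.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// sendFunc is a hypothetical stand-in for whatever delivers one
// notification (for example, an HTTP POST). It is not a Bosun type.
type sendFunc func() error

// errTemporary simulates a transient delivery failure.
var errTemporary = errors.New("temporary failure")

// sendWithRetry calls send up to maxAttempts times, doubling the wait
// after each failure. It returns nil on the first success, otherwise the
// last error wrapped with the attempt count.
func sendWithRetry(send sendFunc, maxAttempts int, initialDelay time.Duration) error {
	if maxAttempts < 1 {
		maxAttempts = 1 // always try at least once
	}
	var lastErr error
	delay := initialDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = send(); lastErr == nil {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("notification failed after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulated sender that fails twice, then succeeds.
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTemporary
		}
		return nil
	}
	err := sendWithRetry(flaky, 5, 10*time.Millisecond)
	fmt.Println(calls, err)
}
```

A real implementation would probably also want to cap the total delay and distinguish retryable failures (timeouts, 5xx responses) from permanent ones (4xx responses), which this sketch does not attempt.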

Alert notifications for repeated alert failures

If an alert causes an error (like a 404 in OpenTSDB) or a notification fails, it is logged. However, the log entry is easy to miss, particularly if you rarely use the Bosun UI and rely on Bosun to send notifications to other incident management systems. Adding a default fallback notification mechanism would prevent such failures from going unnoticed.
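
As a rough sketch of the fallback idea, the Go snippet below routes a failure report through a second channel when the primary one fails. `notifyFunc`, `notifyWithFallback` and `errDelivery` are hypothetical names made up for this example, and both channels are simulated; how a default fallback would actually be configured in Bosun is left open.

```go
package main

import (
	"errors"
	"fmt"
)

// notifyFunc is a hypothetical delivery function for one channel.
type notifyFunc func(msg string) error

// errDelivery simulates a failed delivery.
var errDelivery = errors.New("delivery failed")

// notifyWithFallback tries the primary channel first. If it fails, a
// report describing the failure is sent through the fallback channel.
// It returns an error only when both channels fail, that is, when
// nobody was told about the problem.
func notifyWithFallback(primary, fallback notifyFunc, msg string) error {
	err := primary(msg)
	if err == nil {
		return nil
	}
	report := fmt.Sprintf("notification failed (%v); original message: %s", err, msg)
	if fbErr := fallback(report); fbErr != nil {
		return fmt.Errorf("primary failed: %w; fallback also failed: %v", err, fbErr)
	}
	return nil
}

func main() {
	// Simulated primary channel that always fails.
	primary := func(string) error { return errDelivery }
	// Simulated fallback channel that prints what it receives.
	fallback := func(msg string) error {
		fmt.Println("fallback received:", msg)
		return nil
	}
	if err := notifyWithFallback(primary, fallback, "cpu.high on web01"); err != nil {
		fmt.Println("unreported failure:", err)
	}
}
```

The same wrapper could be applied to alert evaluation errors, so that repeated failures reach a person instead of only the log.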

Support SQL backend

Instead of using Redis for storing state, use a relational database to allow reporting. Future enhancements to do things like collect "was this alert useful" information would be easier to implement and report on. Adding TTLs to reduce storage requirements without breaking foreign keys becomes easier.