bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License

Bug: Duplication of chained notifications after API call to reload config #2444

Closed: svagner closed this issue 4 years ago

svagner commented 4 years ago

## Expected behavior

A POST to `/api/reload` with the body `{ "Reload": true }` should reread the config and fully restart the scheduler.
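A "full restart" implies that the old notification dispatcher stops before a new one starts. The sketch below is only a Python analogy using threads: Bosun itself is written in Go and uses goroutines, and `Dispatcher` and `reload` are hypothetical names, not Bosun's code. It shows the stop-then-start order that leaves exactly one dispatcher running.

```python
import threading


class Dispatcher:
    """Hypothetical analogy of a scheduler whose run() starts a background
    notification dispatcher (not Bosun's actual implementation)."""

    def __init__(self):
        self._stop = None
        self._thread = None

    def run(self):
        # Start a dispatcher loop that exits once its own stop event is set.
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=self._dispatch, args=(self._stop,), daemon=True
        )
        self._thread.start()

    def _dispatch(self, stop):
        # Event.wait returns True once the event is set, ending the loop.
        while not stop.wait(0.01):
            pass  # a real dispatcher would send due notifications here

    def close(self):
        # Signal the loop and wait for it to exit, so no stale dispatcher survives.
        if self._stop is not None:
            self._stop.set()
            self._thread.join()


def reload(old):
    """'Full restart': stop the old dispatcher, then start a fresh one."""
    old.close()
    new = Dispatcher()
    new.run()
    return new
```

Starting a second `Dispatcher` without calling `close()` on the first leaves two loops running at once, which mirrors the situation described under Current behavior.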

## Current behavior

When Bosun starts for the first time, there is exactly one `dispatchNotifications` goroutine:

```
goroutine 72 [select]:
bosun.org/cmd/bosun/sched.(*Schedule).dispatchNotifications(0x22d9500)
    bosun.org/cmd/bosun/sched/notify.go:26 +0x140
created by bosun.org/cmd/bosun/sched.(*Schedule).Run
    bosun.org/cmd/bosun/sched/alertRunner.go:18 +0xa4
```

After a reload through the API (`POST /api/reload -d '{ "Reload": true }'`), there are two:

```
goroutine 72 [select]:
bosun.org/cmd/bosun/sched.(*Schedule).dispatchNotifications(0x22d9500)
    bosun.org/cmd/bosun/sched/notify.go:26 +0x140
created by bosun.org/cmd/bosun/sched.(*Schedule).Run
    bosun.org/cmd/bosun/sched/alertRunner.go:18 +0xa4
...
goroutine 177 [select]:
bosun.org/cmd/bosun/sched.(*Schedule).dispatchNotifications(0xc00032c000)
    bosun.org/cmd/bosun/sched/notify.go:26 +0x140
created by bosun.org/cmd/bosun/sched.(*Schedule).Run
    bosun.org/cmd/bosun/sched/alertRunner.go:18 +0xa4
```

As a result, chained notifications are duplicated.

## Steps to reproduce


1. Run the Bosun daemon with this system config:

    ```
    ...
    HTTPListen = ":8071"
    EnableReload = true
    ...
    ```

    and this rule definition:

    ```
    alert test {
        template = default_template
        $flapper = ((epoch() % 120) >= 60)
        crit = $flapper
        warn = !$flapper
        runEvery = 1

        critNotification = default_notification
        warnNotification = default_notification
    }

    template default_template {
        subject = Test
        body = ''
        postBody = { "body": "{{.Subject}}" }
    }

    notification default_notification {
        post = http://127.0.0.1:8000/
        bodyTemplate = body
        next = default_notification
        timeout = 1m
    }
    ```

2. Run a notification service that accepts the POSTs:

    ```python
    #!/usr/bin/env python3

    from http.server import HTTPServer, BaseHTTPRequestHandler


    class S(BaseHTTPRequestHandler):
        def _set_headers(self):
            self.send_response(200)
            self.end_headers()

        def do_POST(self):
            # Doesn't do anything with posted data; BaseHTTPRequestHandler
            # logs each request to stderr, so every call shows up in the output.
            self._set_headers()
            self.wfile.write(b"ok")  # wfile expects bytes in Python 3


    def run(server_class=HTTPServer, handler_class=S, addr="localhost", port=8000):
        server_address = (addr, port)
        httpd = server_class(server_address, handler_class)
        print(f"Starting httpd server on {addr}:{port}")
        httpd.serve_forever()


    if __name__ == "__main__":
        run()
    ```


3. Wait for the first notification, then reload the config: `curl -XPOST 127.0.0.1:8071/api/reload -d '{ "Reload": true }'`
4. Check the notification service output: each chained notification now arrives twice.
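Spotting the duplicate by eye in the server log works, but the check can be automated. With `timeout = 1m` on the chained notification, legitimate repeats should be spaced roughly a minute apart, so arrivals within a fraction of a second of each other are a hint (not proof) of a second dispatcher. A minimal sketch; `close_pairs` is a hypothetical helper and the 1-second window is an arbitrary assumption.

```python
def close_pairs(arrivals, window=1.0):
    """Return consecutive (earlier, later) arrival times less than `window` seconds apart.

    `arrivals` are POST arrival times in seconds, e.g. values of time.monotonic()
    recorded in the handler's do_POST.
    """
    times = sorted(arrivals)
    # Compare each arrival with the next one; near-simultaneous pairs are suspicious.
    return [(a, b) for a, b in zip(times, times[1:]) if b - a < window]
```

In the notification service above, one option is to append `time.monotonic()` to a module-level list inside `do_POST` and print `close_pairs(...)` on that list after each request.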

## Context

Running commit: 81e993e07efb7cf610b9101bca90db7244b7dff2
OS: linux_x64, Linux 5.3.14-300.fc31.x86_64

## Logs

Nothing interesting in the logs.