rhuddleston opened 8 years ago
Do you get this on every state change or just occasionally? Does it matter what you set the threshold to? Are you using a custom notifier? Would you mind posting your config?
Just occasionally. Here is the config:
[u'consul-alerts/config/', 0, u'null']
[u'consul-alerts/config/checks/change-threshold', 0, u'20']
[u'consul-alerts/config/checks/enabled', 0, u'true']
[u'consul-alerts/config/notif-profiles/', 0, u'null']
[u'consul-alerts/config/notif-profiles/default', 0, u'{\n "Interval": 60,\n}']
[u'consul-alerts/config/notifiers/log/enabled', 0, u'false']
[u'consul-alerts/config/notifiers/log/path', 0, u'/tmp/consul-notifications.log']
[u'consul-alerts/config/notifiers/slack/channel', 0, u'stage-cluster']
[u'consul-alerts/config/notifiers/slack/cluster-name', 0, u'stage']
[u'consul-alerts/config/notifiers/slack/detailed', 0, u'true']
[u'consul-alerts/config/notifiers/slack/enabled', 0, u'true']
[u'consul-alerts/config/notifiers/slack/icon-emoji', 0, u':rage:']
[u'consul-alerts/config/notifiers/slack/url', 0, u'https://hooks.slack.com/services/ABC123/ABC123/ABC123']
[u'consul-alerts/leader', 3304740253564472344, None]
[u'consul-alerts/notif-profiles/', 0, u'null']
I'm also getting the same panic ~once a day.
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x502520]
goroutine 53 [running]:
panic(0x99b5c0, 0xc820010090)
/usr/lib/go/src/runtime/panic.go:481 +0x3e6
github.com/AcalephStorage/consul-alerts/consul.(*ConsulAlertClient).updateHealthCheck(0xc8201ca020, 0xc820482dc0, 0x3f, 0xc820410b80)
/go/src/github.com/AcalephStorage/consul-alerts/consul/client.go:462 +0x13c0
github.com/AcalephStorage/consul-alerts/consul.(*ConsulAlertClient).UpdateCheckData(0xc8201ca020)
/go/src/github.com/AcalephStorage/consul-alerts/consul/client.go:277 +0x718
main.(*CheckProcessor).handleChecks(0xc820070ed0, 0xc82030f500, 0x24, 0x2a)
/go/src/github.com/AcalephStorage/consul-alerts/check-handler.go:96 +0x3b7
main.(*CheckProcessor).start(0xc820070ed0)
/go/src/github.com/AcalephStorage/consul-alerts/check-handler.go:28 +0xf9
created by main.startCheckProcessor
/go/src/github.com/AcalephStorage/consul-alerts/check-handler.go:152 +0x111
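The trace points at a nil dereference inside `updateHealthCheck` (client.go:462). Without the exact source at that line I can't say which pointer is nil, but one plausible class of cause is dereferencing a KV pair that came back `nil` — Consul KV reads return a nil pair when the key is absent or the read fails mid-election. A defensive sketch of that pattern (the `KVPair` type here is a stand-in, not the project's actual code):

```go
package main

import "fmt"

// KVPair mimics the shape of the pair a Consul KV Get returns.
type KVPair struct {
	Key   string
	Value []byte
}

// checkStatus reads the stored status for a check, tolerating a nil
// pair -- the situation a read can hit while the cluster has no leader
// or the key has not been written yet.
func checkStatus(kv *KVPair) string {
	if kv == nil || kv.Value == nil {
		// Treat a missing pair as "unknown" instead of dereferencing nil.
		return "unknown"
	}
	return string(kv.Value)
}

func main() {
	fmt.Println(checkStatus(nil))
	fmt.Println(checkStatus(&KVPair{Key: "k", Value: []byte("passing")}))
}
```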
time="2016-09-15T14:14:41Z" level=info msg="Checking consul agent connection..."
time="2016-09-15T14:14:41Z" level=info msg="Unable to load custom config, using default instead: Unexpected response code: 500"
time="2016-09-15T14:14:41Z" level=info msg="Consul ACL Token: \"\""
time="2016-09-15T14:14:41Z" level=info msg="Consul Alerts daemon started"
time="2016-09-15T14:14:41Z" level=info msg="Consul Alerts Host: ops2"
time="2016-09-15T14:14:41Z" level=info msg="Consul Agent: 10.50.2.30:8500"
time="2016-09-15T14:14:41Z" level=info msg="Consul Datacenter: f1"
time="2016-09-15T14:14:41Z" level=info msg="Started Consul-Alerts API"
time="2016-09-15T14:14:41Z" level=info msg="Running for leader election..."
2016/09/15 14:14:41 consul.watch: Watch (type: checks) errored: Unexpected response code: 500 (rpc error: No cluster leader), retry in 5s
time="2016-09-15T14:14:41Z" level=info msg="Unable to load custom config, using default instead: Unexpected response code: 500"
time="2016-09-15T14:14:41Z" level=info msg="Now watching for events."
time="2016-09-15T14:14:46Z" level=info msg="Now watching for health changes."
time="2016-09-15T14:14:51Z" level=info msg="Running for leader election..."
This is connected with changes in consul cluster leadership. Notice 'Unexpected response code: 500 (rpc error: No cluster leader)' in the log above, logged when consul-alerts restarts after the panic. I'm running a cluster of 3 consul servers which occasionally (because of consul's sensitivity to network conditions) changes leader. While the cluster is electing a new leader, the application is unable to acquire the lock on its key in the consul KV store.
I'm seeing the same as the above:
2016/09/23 12:47:58 [ERR] http: Request GET /v1/kv/consul-alerts/config/checks/blacklist/single/ecs-53301235/ecs-53301235.aor1.centricient.prod:ecs-dashboard-2-dashboard-90b6c3a5d4dcad991f00:41100/service:ecs-53301235.aor1.centricient.prod:ecs-dashboard-2-dashboard-90b6c3a5d4dcad991f00:41100?dc=us-west-2&token=%22%22, error: rpc error: No cluster leader from=127.0.0.1:46952
2016/09/23 12:47:58 [ERR] http: Request GET /v1/kv/consul-alerts/checks/ecs-53301235/ecs-53301235.aor1.centricient.prod:ecs-dashboard-2-dashboard-90b6c3a5d4dcad991f00:41100/service:ecs-53301235.aor1.centricient.prod:ecs-dashboard-2-dashboard-90b6c3a5d4dcad991f00:41100?dc=us-west-2&token=%22%22, error: rpc error: No cluster leader from=127.0.0.1:48054
When it gets the "No cluster leader" error, consul-alerts crashes.
Any update here? We're getting into this state very frequently on our servers.
Getting this panic on several different instances of consul-alerts:
I checked check-handler.go and it has had no updates since the build I'm currently on.