coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

Prometheus Stable (0.14.0) segfault in local/storage.go #1309

Closed by mwitkow 9 years ago

mwitkow commented 9 years ago

We're running:

prometheus, version 0.14.0 (branch: stable, revision: 67e7741)
  build user:       root
  build date:       20150603-06:20:33
  go version:       1.4.2

Prometheus is segfaulting periodically on multiple instances. We periodically send SIGHUP to reload configs, and we see warnings like this one

time="2015-07-15T16:25:49Z" level=warning msg="Error expanding alert template OurMagicalAlert with data '{map[deployment_name:Foobar metric:Load severity:page] 13}': error parsing template __alert_OurMagicalAlert: template: __alert_OurMagicalAlert:1: function \"labels\" not defined" file=manager.go line=201

a couple of times over a window of 2-3 minutes before:

unexpected fault address 0xc2172e87b0
fatal error: fault
[signal 0xb code=0x2 addr=0xc2172e87b0 pc=0xc2172e87b0]

goroutine 81 [running]:
runtime.gothrow(0xb13be0, 0x5)
    /usr/lib/go/src/runtime/panic.go:503 +0x8e fp=0xc2172e85c0 sp=0xc2172e85a8
runtime.sigpanic()
    /usr/lib/go/src/runtime/sigpanic_unix.go:29 +0x265 fp=0xc2172e8610 sp=0xc2172e85c0
created by github.com/prometheus/prometheus/storage/local.(*memorySeriesStorage).Start
    /go/src/github.com/prometheus/prometheus/storage/local/storage.go:240 +0x502

goroutine 1 [select, 19 minutes]:

We have the full set of stack traces and can share them privately.

Is this a known issue?

mischief commented 9 years ago

@mwitkow-io i'm unsure how prometheus is related to fleet. wrong issue tracker?

mwitkow commented 9 years ago

Oh, apologies, we filed this while firefighting. Will repost to Prometheus.

mischief commented 9 years ago

@mwitkow-io good luck :-)