buildkite / lifecycled

A daemon for responding to AWS AutoScaling Lifecycle Hooks
MIT License

Seg fault when executing spot termination script #62

lox closed this issue 5 years ago

lox commented 5 years ago
lifecycled: time="2018-12-11T21:57:55Z" level=info msg="Looking up instance id from metadata service"
lifecycled: time="2018-12-11T21:57:55Z" level=info msg="Starting listener" instanceId=i-xxx listener=spot
lifecycled: time="2018-12-11T21:57:55Z" level=info msg="Waiting for termination notices" instanceId=i-xxx
lifecycled: time="2018-12-11T22:00:30Z" level=info msg="Stopped listener" instanceId=i-xxx listener=spot
lifecycled: time="2018-12-11T22:00:30Z" level=info msg="Received termination notice" instanceId=i-xxx notice=spot
lifecycled: time="2018-12-11T22:00:30Z" level=info msg="Executing handler" instanceId=i-xxx notice=spot
lifecycled: panic: runtime error: invalid memory address or nil pointer dereference
lifecycled: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x878249]
lifecycled: goroutine 1 [running]:
lifecycled: os.(*File).Name(...)
lifecycled: /usr/local/Cellar/go/1.11.1/libexec/src/os/file.go:50
lifecycled: github.com/buildkite/lifecycled.(*FileHandler).Execute(0xc00000c030, 0xab5ac0, 0xc0000521c0, 0xc00030cc00, 0x3, 0x3, 0x14, 0x5c10337e)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/daemon.go:163 +0x29
lifecycled: github.com/buildkite/lifecycled.(*spotTerminationNotice).Handle(0xc0000d7ae0, 0xab5ac0, 0xc0000521c0, 0xab09e0, 0xc00000c030, 0xc000204f00, 0x0, 0xc0001458b0)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/spot.go:91 +0x11e
lifecycled: main.main.func1(0xc000196e10, 0x0, 0x0)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/cmd/lifecycled/main.go:150 +0x7be
lifecycled: github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin.(*actionMixin).applyActions(0xc0000e86a8, 0xc000196e10, 0x0, 0x0)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin/actions.go:28 +0x6d
lifecycled: github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin.(*Application).applyActions(0xc0000e8690, 0xc000196e10, 0x0, 0x0)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin/app.go:551 +0x3f
lifecycled: github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin.(*Application).execute(0xc0000e8690, 0xc000196e10, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0000d66e0, 0x0)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin/app.go:390 +0x8f
lifecycled: github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin.(*Application).Parse(0xc0000e8690, 0xc00001c1b0, 0x0, 0x0, 0x1, 0xc0000a4198, 0x0, 0x1)
lifecycled: /Users/lachlan/go/src/github.com/buildkite/lifecycled/vendor/github.com/alecthomas/kingpin/app.go:222 +0x1fa
lifecycled: main.main()

lox commented 5 years ago

This is a perplexing one. It looks like the handler object being passed in is nil. Any ideas @itsdalmo?

itsdalmo commented 5 years ago

Assuming that this is version 3.0.0?

Based on these log entries:

lifecycled: os.(*File).Name(...)
lifecycled: /usr/local/Cellar/go/1.11.1/libexec/src/os/file.go:50

And os.(*File).Name() being defined as:

func (f *File) Name() string { return f.name }

It doesn't seem like Handler itself is nil, but rather Handler.file. Perhaps lifecycled was started without --handler being specified? Since the flag is neither required nor validated to be non-nil, that would cause this crash. Either way, with the current implementation in main.go, --handler should be a required argument. I can submit a PR.
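
To illustrate the suspected failure mode, here is a minimal sketch using only the standard library. The FileHandler type, NewFileHandler, and NamePanics below are hypothetical stand-ins written for this example, not the actual lifecycled code; the only assumption about the real handler is that it holds a *os.File that can end up nil.

```go
package main

import (
	"errors"
	"os"
)

// FileHandler is a hypothetical stand-in for lifecycled's handler type;
// the real struct may have different fields.
type FileHandler struct {
	file *os.File
}

// NewFileHandler rejects a nil file up front, so a missing --handler
// would fail at startup rather than when a termination notice arrives.
func NewFileHandler(file *os.File) (*FileHandler, error) {
	if file == nil {
		return nil, errors.New("handler file is required")
	}
	return &FileHandler{file: file}, nil
}

// NamePanics reports whether calling Name on the handler's file panics.
// The handler itself is non-nil here; only its file field may be nil.
func NamePanics(h *FileHandler) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = h.file.Name() // with a nil *os.File, reading f.name dereferences nil
	return false
}
```

Calling NamePanics(&FileHandler{}) exercises the same nil dereference that the trace shows inside os.(*File).Name, while NewFileHandler(nil) returns an error instead. Marking the flag as required in the CLI parser would be another way to get the same early failure.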

itsdalmo commented 5 years ago

Side note: I'm wondering if more of the logic that currently lives in cmd/lifecycled/main.go should be moved into the module so it can be thoroughly tested (this would also be a good idea if we want to support more handlers, e.g. #54).

lox commented 5 years ago

Yup, I think it makes a lot of sense to move as much as possible of main.go into something testable.

itsdalmo commented 5 years ago

Ref this PR: https://github.com/buildkite/elastic-ci-stack-for-aws/pull/507

Seems like handler not being set might have been the issue here? Sorry about that!

lox commented 5 years ago

Yeah, I suspect that was the cause, as we weren't restarting lifecycled after setting its config. No problem, I appreciate your help fixing it!

lox commented 5 years ago

Closed via #63.