Closed: itsdalmo closed this PR 6 years ago.
If we managed to merge together the best parts of this PR and your #44, I think it would be best to implement any logging changes on top of that. Also, I think your monitor structs would make it easier to follow the code in `daemon.Start()` of this branch.
@lox - Still a work in progress, but I've added a `Listener` (much like your Monitors) and a `Handler` interface, as well as a `Notice` interface, which gives a lot of flexibility but also makes the flow of the program a lot easier to reason about (in my opinion).
Some outstanding work:

- ~~Decide on a flag for the spot listener (currently `--spot`).~~
- ~~Use `ec2metadata` to get the Instance ID as well.~~
- ~~Move `main.go` to `cmd/main.go` so that the rest of the code can live in a `lifecycled` library?~~ (Won't be done)
- ~~Add an `init()` function to `main.go` to set up logging and parse CLI flags.~~ (Won't be done)
- ~~`autoscaling.AutoScaling` should be masked and mocked like `SNSClient` etc.~~

And in the end, we should be able to add some basic unit tests and integration tests. Unit tests will hopefully be easier to add since we are using interfaces, so we can basically fully mock listeners, notices and handlers.
Take a look and let me know what you think!
This is looking really good. I'm going to close #44 and we will focus on this one. Some comments:
I don't think we need an `init()` function; that's what `main()` is for. If anything, we should move some code out of `main()` and into `Daemon`. Aside from that, I'll make some comments as you go!
Also made you a contributor, thanks for all your contributions so far 🙇🏼‍♂️
Tested the latest commit on this PR and everything seems to work fine/as expected. Going to clean up log messages now and then ask that you review, and then I think we should make issues for any "outstanding work" we can think of that is not critical to the refactoring but should be done at some point.
PS: I think it would be nice to use the "squash and merge" button in GitHub to merge this, so that we get one commit on the master branch but can still look at the PR to see all the commits that were squashed?
@lox - I've removed the WIP tag and think this is pretty much ready to merge if you are happy with it also (after you have had a chance to review and test it). Have tested it myself and everything seems to be working as intended; I have not tested actual spot terminations however.
As mentioned above, I think it should be merged with a "squash and merge", but if you'd rather have me squash the commits in the PR itself, just let me know (in that case we should probably rename the branch as well).
edit: I just stuck with `--spot` as the flag for the spot listener. Let me know if you want it changed.
I'll do a proper review of this next week and we can look at merging it for a 3.0.0 release?
Sounds good @lox! I've started a new branch (based on this one) to add some testing facilities and basic tests: https://github.com/itsdalmo/lifecycled/tree/add-basic-tests. Thinking that should perhaps also be a part of a 3.0.0 release (can always rewrite the tests later etc, but it's nice to have something)?
Merge at your leisure @itsdalmo and then we can look at the tests.
@lox - I'll merge this now, but I think we should hold off on 3.0 until we have some tests in place (which might require a little more refactoring to get right).
This PR (work in progress) implements the changes proposed in #42.
Things remaining to be done/solved:
- `pollSpotTermination` continues to run after `pollTerminationNotice` has exited and the handler has started running (and vice versa). To solve this we'll need to supply a "sub" context that can be cancelled when either of the polling functions returns, which then triggers the other to exit. In addition, we'll need to use buffered channels so that the reader (`daemon.Start()`) can stop reading and start executing the handler without deadlocking the writer. I don't think this issue will cause any problems, but just for "correctness" it would be nice to fix it.
- `log.Fatal()`: In my opinion, Fatal should only be used inside `main.go` when we want a non-zero exit code, or because continuing execution would result in a panic. Ideally, we wouldn't use Fatal at all, but in this case I think it's worth doing since we then get a log entry with the error in CloudWatch.
- `log.Error()` and `log.Warn()`: Error should be reserved for `err`s that cannot be recovered from. E.g., we don't get a second try if we fail to delete the queue when shutting down, whereas if we fail to query the EC2 metadata, we get to try again in 5 seconds.
- `log.Info()` and `log.Debug()`: I'd like it if Info just gave enough information to verify that the control flow has executed as expected, whereas Debug would output more chatty stuff, like a log entry every time it hits one of the endpoints.
- `WithError(err)`.
- Set up logging in `main.go` and pass a `logrus.Entry` (which is the output of `log.WithField()`) to the functions instead of using a global logger. This way we can e.g. `log.WithField("instanceId", instanceID)` in one place and have it be a fixed field for all log entries.
- Implement the `Monitor` structs from #44 on top of the changes in this PR. Could either be done in the branch, on top of the branch, or after merging.

Have a look and let me know what you think @lox 👐