Closed: itsdalmo closed this PR 6 years ago.
If we managed to merge together the best parts of this PR and your #44, I think it would be best to implement any logging changes on top of that. Also, I think your monitor structs would make it easier to follow the code in `daemon.Start()` of this branch.
@lox - Still a work in progress, but I've added a `Listener` (much like your Monitors) and a `Handler` interface, as well as a `Notice` interface, which gives a lot of flexibility but also makes the flow of the program a lot easier to reason about (in my opinion).
Some outstanding work:

- ~~Decide on a flag for the spot listener (currently `--spot`).~~
- ~~Use `ec2metadata` to get the Instance ID as well.~~
- ~~Move `main.go` to `cmd/main.go` so that the rest of the code can live in a `lifecycled` library?~~ (Won't be done)
- ~~Add an `init()` function to `main.go` to set up logging and parse CLI flags.~~ (Won't be done)
- ~~`autoscaling.AutoScaling` should be masked and mocked like `SNSClient` etc.~~

And in the end, we should be able to add some basic unit tests and integration tests. Unit tests will hopefully be easier to add since we are using interfaces, so we can basically fully mock listeners, notices and handlers.
Take a look and let me know what you think!
This is looking really good. I'm going to close #44 and we will focus on this one. Some comments:
I don't think we need an `init()` function; that's what `main()` is for. If anything, we should move some code out of `main()` and into `Daemon`. Aside from that, I'll make some comments as you go!
Also made you a contributor, thanks for all your contributions so far 🙇🏼‍♂️
Tested the latest commit on this PR and everything seems to work fine/as expected. Going to clean up log messages now and then ask that you review, and then I think we should make issues for any "outstanding work" we can think of that is not critical to the refactoring but should be done at some point.
PS: I think it would be nice to use the "squash and merge" button in GitHub to merge this, so that we get one commit on the master branch but can still look at the PR to see all the commits that were squashed?
@lox - I've removed the WIP tag and think this is pretty much ready to merge if you are happy with it also (after you have had a chance to review and test it). Have tested it myself and everything seems to be working as intended; I have not tested actual spot terminations however.
As mentioned above, I think it should be merged with a "squash and merge", but if you'd rather have me squash the commits in the PR itself, just let me know (in that case we should probably rename the branch as well).
edit: I just stuck with `--spot` as the flag for the spot listener. Let me know if you want it changed.
I'll do a proper review of this next week and we can look at merging it for a 3.0.0 release?
Sounds good @lox! I've started a new branch (based on this one) to add some testing facilities and basic tests: https://github.com/itsdalmo/lifecycled/tree/add-basic-tests. Thinking that should perhaps also be a part of a 3.0.0 release (can always rewrite the tests later etc, but it's nice to have something)?
Merge at your leisure @itsdalmo and then we can look at the tests.
@lox - I'll merge this now, but I think we should hold off on 3.0 until we have some tests in place (which might require a little more refactoring to get right).
This PR (work in progress) implements the changes proposed in #42.
Things remaining to be done/solved:
- `pollSpotTermination` continues to run after `pollTerminationNotice` has exited and the handler has started running (and vice versa). To solve this we'll need to supply a "sub" context that can be cancelled when either of the polling functions returns, which then triggers the other to exit. In addition, we'll need to use buffered channels so that the reader (`daemon.Start()`) can stop reading and start executing the handler without deadlocking the writer. I don't think this issue will cause any problems, but just for "correctness" it would be nice to fix it.
- `log.Fatal()`: In my opinion, Fatal should only be used inside `main.go` when we want a non-zero exit code, or because continuing execution would result in a panic. Ideally, we wouldn't use Fatal at all, but in this case I think it's worth doing since we then get a log entry with the error in CloudWatch.
- `log.Error()` and `log.Warn()`: Error should be reserved for `err`s that cannot be recovered from. E.g., we don't get a second try if we fail to delete the queue when shutting down, whereas if we fail to query the EC2 metadata, we get to try again in 5 seconds.
- `log.Info()` and `log.Debug()`: I'd like it if Info just gave enough information to verify that the control flow has executed as expected, whereas Debug would output more chatty stuff, like a log entry every time it hits one of the endpoints.
- `WithError(err)`.
- Set up logging in `main.go` and pass a `logrus.Entry` (which is the output of `log.WithField()`) to the functions instead of using a global logger. This way we can e.g. `log.WithField("instanceId", instanceID)` in one place and have it be a fixed field for all log entries.
- Implement the `Monitor` structs from #44 on top of the changes in this PR. Could either be done in the branch, on top of the branch, or after merging.

Have a look and let me know what you think @lox 👐