CiscoCloud / marathon-consul

bridge Marathon information to Consul KV
Apache License 2.0
85 stars 18 forks source link

Add event endpoint support #15

Closed tdooner closed 8 years ago

tdooner commented 8 years ago

Hello! Thanks for creating a nice simple tool which glues together marathon and consul. We've been running it for a bit already and it looks promising for the future.

This branch adds support to autodetect if the version of Marathon is >= 0.9.0 (the version which introduces the /v2/events endpoint) and in that case, use the /v2/events endpoint rather than starting an HTTP server to receive the eventSubscription callbacks.

I'm a golang newcomer, so let me know if I've done anything super weird. Thanks.

stevendborrelli commented 8 years ago

@tdooner thanks for this pull request! The main author is out of the country today, so the review will take a couple of days.

BrianHicks commented 8 years ago

@tdooner thanks for the PR! This is a feature we've been wanting to add for a while now. It looks good, you haven't done anything completely weird for Go. I've left some comments on lines for minor improvements.

One bigger thing I noticed is that this does not handle errors in the connection to the event stream. Meaning, if Marathon goes away or the connection dies, this will not restart the connection (it probably should, with a connection timeout). If you've added that and I just missed it, please let me know.

Once we've got these all taken care of, I'm looking forward to shipping this. Thanks again for the time you took to make this project better!

tdooner commented 8 years ago

Thanks for the review. I'll follow up with fixes for all your points soon.

As for handling errors in the event stream connection, you haven't missed anything. I can see two competing ways to implement that:

I'm leaning towards some basic reconnection logic, but not relying too much on them to handle every edge case, and instead preferring good error messages and process supervisors.

BrianHicks commented 8 years ago

I'd rather not crash the whole process on an internal error. We should try to reconnect. I would say the simplest logic would look like:

But when the program detects that a reconnect has successfully gone through, we should trigger a resync.

tdooner commented 8 years ago

All right! I just pushed some more changes; let me know how those look. I brought in go-version, tidied up some of the error processing and logging as you suggested, and put the /v2/events connection logic in an infinite loop with a 10 second timeout between reconnection attempts. This worked pretty well in my basic testing - restarting marathon and verifying that marathon-consul reconnected.

stevendborrelli commented 8 years ago

@tdooner thanks! Brian is currently out of the office for the holiday, I'd like him to do the review.

BrianHicks commented 8 years ago

@tdooner looks pretty great, thanks for adding failure handling. Just those two notes and we can get this merged.

tdooner commented 8 years ago

Got it!

BrianHicks commented 8 years ago

Woohoo!