betfair / cougar

Cougar is a framework for making building network exposed service interfaces easy.
http://betfair.github.io/cougar
Apache License 2.0

Implement Zipkin Support #74

Closed andredasilvapinto closed 9 years ago

andredasilvapinto commented 10 years ago

Implement Zipkin ( http://twitter.github.io/zipkin/ ) support so we can have distributed tracing across all Cougar services inside the same infrastructure.

The implementation should follow Dapper's paper: http://research.google.com/pubs/pub36356.html

i.e. receive the trace headers and propagate them to the underlying services when necessary, and emit the relevant spans encapsulating those calls (server received, [client sent, client received]*, server sent).
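The header handling described above could be sketched roughly as follows. This is only an illustration, not Cougar's implementation: the `TraceContext` class and its methods are hypothetical, and it assumes the B3 header names used by Zipkin (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`) with ids encoded as 64-bit hex strings. It also assumes the caller has already normalized header-name casing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical trace context; illustrative only, not part of Cougar. */
final class TraceContext {
    final long traceId;
    final long spanId;
    final Long parentSpanId; // null for a root span

    TraceContext(long traceId, long spanId, Long parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    /** Server receive: continue the caller's trace if headers are present, else start a new trace. */
    static TraceContext fromHeaders(Map<String, String> headers) {
        String trace = headers.get("X-B3-TraceId");
        String span = headers.get("X-B3-SpanId");
        if (trace == null || span == null) {
            return new TraceContext(newId(), newId(), null);
        }
        String parent = headers.get("X-B3-ParentSpanId");
        return new TraceContext(
                Long.parseUnsignedLong(trace, 16),
                Long.parseUnsignedLong(span, 16),
                parent == null ? null : Long.parseUnsignedLong(parent, 16));
    }

    /** Client send: same trace, fresh span id, current span becomes the parent. */
    TraceContext newChild() {
        return new TraceContext(traceId, newId(), spanId);
    }

    /** Headers to attach to the outgoing request for this span. */
    Map<String, String> toHeaders() {
        Map<String, String> headers = new HashMap<>();
        headers.put("X-B3-TraceId", Long.toHexString(traceId));
        headers.put("X-B3-SpanId", Long.toHexString(spanId));
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", Long.toHexString(parentSpanId));
        }
        return headers;
    }

    private static long newId() {
        return ThreadLocalRandom.current().nextLong();
    }
}
```

A server would call `fromHeaders` on each incoming request, and each downstream call would send `newChild().toHeaders()`, so every hop shares the trace id while getting its own span id.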

andredasilvapinto commented 9 years ago

Even though I've already successfully tested the implementation in a Cougar service, I haven't yet been able to write the unit tests, as I've been integrating the new Cougar clients into one of our closed-source platforms (Mantis) in order to have the client send/receive annotations. This is proving more difficult than I thought because I need to migrate all the clients and dependencies from Cougar 2 to 3: several tests broke (with -> set on DTOs, collections now need to be explicitly declared, packages have changed, Jackson has changed), some IDDs are not Cougar 3 compatible (errors such as having a field named message on an Exception), and we also have 2 Cougar helper libraries whose compatibility with Cougar 3 I will need to verify.

As this is a closed-source platform, to which I will lose access after leaving, I'm trying to do as much as possible there. I'm planning on finishing the Cougar implementation afterwards. Some of my free time is also going to meetings, interviews and recruiter calls, so I've not been able to do as much with it.

On Tue, Jan 27, 2015 at 7:45 AM, Simon Matic Langford <notifications@github.com> wrote:

Hiya. How're you getting on? I would like to do a release candidate soon. Are you nearly ready to merge?


andredasilvapinto commented 9 years ago

Still regarding the above message:

Is there an alternative in Cougar 3 to the Cougar 2 method of obtaining access to the transport metrics of an HTTP service client by using the execution venue object and an operation key of a service:

((AbstractHttpExecutable) serviceRegisterableExecutionVenue.getExecutable(key)).getTransportMetrics()

In Cougar 3 there is no getExecutable method.

This is needed for our Circuit Breaker implementation.

andredasilvapinto commented 9 years ago

So, I've tried a different approach in order to have this feature on Mantis. It is uglier, but much easier and quicker than upgrading everything to Cougar 3 (there is also some code and other libraries that depend on Jetty 7 constructs). I've used AOP around the ExecutionVenue.execute method of Cougar 2 clients, wrapped the Observer for client receive emissions (I thought about using the pre/post processors but, as we know, there is no easy way of propagating the Zipkin data from the pre to the post processor) and used the IdentityTokens in order to append the headers to the HTTP request (this is probably the ugliest part, as the IdentityChainImpl has protected methods and I had to create a proxy class in its package).
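The observer-wrapping idea can be sketched as below. This is not the Mantis code and does not use Cougar's real types: `ResultObserver`, `AnnotationSink` and `TracingObserver` are hypothetical stand-ins, and `"cs"`/`"cr"` are the Zipkin annotation values for client send and client receive. The point is that the wrapper itself carries the span id from the send side to the receive side, which is what the pre/post processors could not easily do.

```java
/** Hypothetical stand-in for a client result callback; not Cougar's observer interface. */
interface ResultObserver {
    void onResult(Object result);
}

/** Hypothetical sink that records an annotation against a span. */
interface AnnotationSink {
    void annotate(long spanId, String value);
}

/** Wraps the caller's observer so that "client receive" is emitted before the result is delivered. */
final class TracingObserver implements ResultObserver {
    private final ResultObserver delegate;
    private final AnnotationSink sink;
    private final long spanId;

    TracingObserver(ResultObserver delegate, AnnotationSink sink, long spanId) {
        this.delegate = delegate;
        this.sink = sink;
        this.spanId = spanId;
    }

    @Override
    public void onResult(Object result) {
        try {
            sink.annotate(spanId, "cr"); // client receive
        } catch (RuntimeException e) {
            // A tracing failure must not prevent delivery of the actual result.
        }
        delegate.onResult(result);
    }

    /**
     * What an around-advice would do before proceeding with the call:
     * emit "client send" and swap the observer argument for the wrapper.
     */
    static ResultObserver traceCall(long spanId, AnnotationSink sink, ResultObserver original) {
        sink.annotate(spanId, "cs"); // client send
        return new TracingObserver(original, sink, spanId);
    }
}
```

In an around-advice, the observer returned by `traceCall` would replace the original observer argument before the intercepted call proceeds.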

So basically it is working, but it is a hack. If Betfair wants to keep evolving Mantis, the best solution would be to upgrade the clients to Cougar 3, with all the work that would imply. It would take a while, but that kind of work would be necessary in any case when upgrading Jetty or Jackson to their current versions. Jetty 7 is a blatant case, as it was already deprecated by its maintainers at the end of last year. There is simply no open-source support now, and if Betfair doesn't want to be in a fragile position when the next bugs or security problems come out, it should start to seriously consider upgrading to Jetty 9 (8 is also deprecated!).

https://webtide.com/jetty-7-and-jetty-8-end-of-life/

On a different note, I've just noticed that even though we discussed how we could obtain the port of an HTTP request in the protocol's resolver class, we still haven't solved that same question for the Socket protocol. Should we just use the cougar.socket.serverport property?

eswdd commented 9 years ago

Yuck, but I understand. Yes, for socket it's safe, as secure and insecure run on the same port (we use a starttls mechanism), so you don't have to work out at runtime which one it arrived on.

eswdd commented 9 years ago

Just noticed this. Will look into why it was removed and see about reinstating.

On 27 Jan 2015 09:50, "André Pinto" notifications@github.com wrote:

In Cougar 3 there is no getExecutable method.

andredasilvapinto commented 9 years ago

So this is SSW (with the Mantis hack) running on my machine and pointing to a Cougar 3 SEAS instance with Zipkin, also running on my machine:

[image: Zipkin trace] https://camo.githubusercontent.com/90b92d0bb7ed427b9447e973a6c8db96218dc611/687474703a2f2f692e696d6775722e636f6d2f79747a796541712e706e67

The light blue lines are part of the SEAS RPCs (in this case there is just one, to Facet). If all the services were using Zipkin, we would have something like that for all the lines (and for the RPCs of those endpoints, and the RPCs of the endpoints of those RPCs...).

eswdd commented 9 years ago

Looks great. Zipkin porn!:)

eswdd commented 9 years ago

So this is all now merged in. I'm still waiting (I think) for gist doco.

Still not massively happy with the module structure. Currently, using Zipkin in a client forces inclusion of the server-side transports. Also, config deps mean you can't include it without the socket or Jetty transports. I think I'll break these out into separate issues to be resolved before release.