betfair / cougar

Cougar is a framework for making building network exposed service interfaces easy.
http://betfair.github.io/cougar
Apache License 2.0
27 stars 18 forks source link

Implement Zipkin Support #74

Closed andredasilvapinto closed 9 years ago

andredasilvapinto commented 10 years ago

Implement Zipkin ( http://twitter.github.io/zipkin/ ) support so we can have distributed tracing across all Cougar services inside the same infrastructure.

The implementation should follow Dapper's paper: http://research.google.com/pubs/pub36356.html

i.e. receive and propagate the trace headers to the underling services when necessary, and emit the relevant spans encapsulating those calls (server received, [client sent, client received]*, server sent).

eswdd commented 10 years ago

Depends on #59

Also approach discussed on the mailing list: https://groups.google.com/forum/#!topic/betfair-cougar/iwmtGIn4CPo

andredasilvapinto commented 10 years ago

I've just read both. Those discussions date from January and meanwhile Pedro Vilaça and Paulo Santos left Betfair. Have you made any progress with it yet? Do you plan on doing so? Do you have an estimate about when this would be implemented?

eswdd commented 10 years ago

Ah, I'd been waiting for them to provide, I was unaware that they'd left - a shame for Betfair.

ETA - no. #59 depends on #60 which I have started working on and anticipate having that done by the end of this week barring any probs. I would imagine if I do it all it would be done within a month or so, which given our next release is likely to be July/August would fit.

Will happily consider/review pull requests for any/all of #59 & #74, just let me know if you (or anyone else) is going to look at and then we won't duplicate work.

pmvilaca commented 10 years ago

Hi Simon,

Sorry for for the lack of communication but in fact we left Betfair. I don't have a lot of time to continue working on this but if you (Betfair) want to proceed with the development using public branches I can follow and comment it (based on the discussions that we had in the past).

Cheers, Pedro

eswdd commented 10 years ago

No worries. I might continue in my own time even if BF doesn't as I think it's a good feature to have and I'd like to gain some familiarity with Zipkin. I've also been playing with statsd and the opentsdb backend and will probably integrate that into Tornjak too.

andredasilvapinto commented 10 years ago

I'll be implementing Zipkin on Cougar 2 as that's the primary target right now (there is still no Cougar 3 service at Betfair as far as I know). Depending on how quickly that goes, how much time I have, how much relevant Cougar 3 becomes for Betfair, I might be able to help here after.

eswdd commented 10 years ago

If what you say is true then I'm somewhat surprise and slightly saddened. I was under the impression that the internal Cougar 2.x versions were only for bug fixing and that new services would be written on Cougar 3 as well as existing ones migrated - I know of at least one core Exchange service which was upgraded to a Cougar 3 release, but not sure if it has made it into production.

I would urge you to consider putting your efforts into Cougar 3 and urging others to do the same. The longer that everyone waits to upgrade, the harder that upgrade will become. There are many benefits to Cougar 3 over and above 2, and I fully documented the upgrade path before I left.

I will continue to implement #59 and #60 and hope that you will be able to contribute on top of that.

andredasilvapinto commented 10 years ago

I think this is not the right place to discuss the internal technological strategy for Betfair, but as you probably should understand, the upgrade decision for each service is not depending on me. And no, Cougar 2 updates are, de facto, not only for "bug fixes". There are new features being implemented on it, some of them are even already in production. I would obviously not be considering implementing Zipkin there if that was not the case.

eswdd commented 10 years ago

Agreed, thanks for the info.

andredasilvapinto commented 9 years ago

I've just committed what I've done till now, based on the design suggested on: https://gist.github.com/eswdd/44aba96d4ce46e89f79a

maybe that will help clarify some of the problems and questions I've reported on the gist.

eswdd commented 9 years ago

Thanks @andredasilvapinto - will probably be a little while before I can look at this as personal stuff is taking a lot of my time at the moment.

eswdd commented 9 years ago

Comments added on the commit, maybe a little disjoint as they were made over a week on 3 or 4 seperate train journeys..

andredasilvapinto commented 9 years ago

thanks for all the comments. I was not expecting such a detailed review as all the code is still in development. the intent was more to give an idea of how I'm trying to implement Zipkin by following your approach and more clearly explain some of the problems I'm facing in order to discuss how to solve them. In any case, the feedback is more than welcome. I'll review all of the items one by one on the commit page.

eswdd commented 9 years ago

I wasn't expecting to make a detailed review, but found myself thinking stuff as i read the code and thought it would be better to provide the feedback early. it helped me think through the code more by studying in enough detail to comment on it. in that case, ignore comments on unit tests and other things that would be done in a full impl.

andredasilvapinto commented 9 years ago

I have already fixed some of the issues you pointed out, here: https://github.com/andredasilvapinto/cougar/commit/65c981ac75578822f692b8a81f98aaf772757516

andredasilvapinto commented 9 years ago

I've just committed some more review fixes. https://github.com/andredasilvapinto/cougar/commit/cd9ac381fde1476378acbe0eed0f3d6768eec973

andredasilvapinto commented 9 years ago

I've finished reviewing your comments. If you don't mind taking a look at my replies, please do, as that will make us advance in reaching a final solution for the implementation.

eswdd commented 9 years ago

Ok, have added comments to most stuff where i feel there's something to be done, I think (assuming my example is self-explanatory and you agree) there is only one real decision/issue to make/resolve which will definittely require some change in cougar, namely around the span name:

There’s a choice to be made here, namely whether the name for batch calls should incorporate all the calls together (pipe delimited or somesuch) with a potential to blow any limits on the length of this data, or whether in fact the container call be treated as a separate request to the individual underlying service calls (might be quite nice). I’m guessing ZK doesn’t have any guidance in this regard?

eswdd commented 9 years ago

@richardqd something's come up in the process of implementing zipkin regarding batch calls (for now just json-rpc) and want to get your opinion on how to proceed.

Zipkin has been integrated by changing the implementation of RequestUUID resolved into a ZipkinRequestUUID which contains the raw cougar uuid and also some zipkin uuids. We've decided (and had tentatively confirmed on the zipkin mailing list as a good idea) that when a batch request is received we will emit a zipkin span (which each have a unique span id and reference to their parent) for the batch call itself and then a seperate one for each call within the batch (with the batch call as the parents). This allows us to see each call and it's effects seperately whilst still being able to see that they all came from one request.

In Cougar currently, we ensure that every request has a RequestUUID assigned (either by reading one from the request or by constructing a new one) and the same instance is used for all calls within the batch.

In Cougar 3.2+, RequestUUID are now compound uuids, so a request will now either have a uuid with root/parent/local components (because a complete uuid was passed in) or with root/local components (because an old uuid or no uuid was passed in and this is treated as a new root). The request/other logs contain the full compound request uuid.

Which leads to a decision on what to do with JSON-RPC requests. We can either:

I prefer the second option, it seems more representative of what's going on and it makes the code cleaner.

richardqd commented 9 years ago

I like the second option too - the universally applied hierarchy offers information in its own right

On 2 December 2014 at 09:30, Simon Matic Langford notifications@github.com wrote:

@richardqd https://github.com/richardqd something's come up in the process of implementing zipkin regarding batch calls (for now just json-rpc) and want to get your opinion on how to proceed.

Zipkin has been integrated by changing the implementation of RequestUUID resolved into a ZipkinRequestUUID which contains the raw cougar uuid and also some zipkin uuids. We've decided (and had tentatively confirmed on the zipkin mailing list as a good idea) that when a batch request is received we will emit a zipkin span (which each have a unique span id and reference to their parent) for the batch call itself and then a seperate one for each call within the batch (with the batch call as the parents). This allows us to see each call and it's effects seperately whilst still being able to see that they all came from one request.

In Cougar currently, we ensure that every request has a RequestUUID assigned (either by reading one from the request or by constructing a new one) and the same instance is used for all calls within the batch.

In Cougar 3.2+, RequestUUID are now compound uuids, so a request will now either have a uuid with root/parent/local components (because a complete uuid was passed in) or with root/local components (because an old uuid or no uuid was passed in and this is treated as a new root). The request/other logs contain the full compound request uuid.

Which leads to a decision on what to do with JSON-RPC requests. We can either:

  • Maintain the current functionality wrt core cougar RequestUUIDs, in which case we would have a difference in behaviour between zipkin and core uuids.
  • Make batch calls act the same as the plan for zipkin in which case the individual calls would have the uuid from the request as their parent, rather than as their local component.

I prefer the second option, it seems more representative of what's going on and it makes the code cleaner.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-65203490.

andredasilvapinto commented 9 years ago

@eswdd I've seen that you committed the client emission stuff some days ago. Do you want me to integrate Zipkin with it already, or are you still working on it?

eswdd commented 9 years ago

No, its good to go. The tracer changes for batches and also the span name are in. Just need to do the in process calls but that wont change the api. Do you think we should have an explicit hook for zipkin in the trace spi when a client call is made? On 14 Dec 2014 17:18, "André Pinto" notifications@github.com wrote:

@eswdd https://github.com/eswdd I've seen that you committed the client emission stuff some days ago. Do you want me to integrate Zipkin with it already, or are you still working on it?

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-66920346.

andredasilvapinto commented 9 years ago

Hi Simon, sorry for the delay, but, for the reasons we know, and as I said before, I'm now working on Cougar 3 only on my spare time. I'm now on holidays, so I've got some time for finishing this.

What do you mean by a specific hook? Zipkin surely needs to be notified when a request is made, but I think that your new ContextEmitter might be able to do so. Were you thinking about anything else?

I've committed the update to ZipkinTracer that implements the small change in the Tracer interface ( https://github.com/andredasilvapinto/cougar/commit/086228b09993f0356232afa26c1465b52acd5a88 ). I went with operation.toString() for the zipkinSpanName, is that ok?

After finishing the client implementation, I'm planning on testing this locally with a Cougar 3 service, perform any needed bug fixing and then make the tests for the entire implementation.

eswdd commented 9 years ago

I thought that the caller needs to provide the span name to zipkin as well and it needs to match the one the server emits? Also the in-process calls im doing right now dont call a context emitter, they just pass on a new ececution context which wraps the old except it contains a sub uuid. So i think it might need an extra method "subCall(subUuid, opkey)" or similar so it all ties up? On 21 Dec 2014 17:40, "André Pinto" notifications@github.com wrote:

Hi Simon, sorry for the delay, but, for the reasons we know, and as I said before, I'm now working on Cougar 3 only on my spare time. I'm now on holidays, so I've got some time for finishing this.

What do you mean by a specific hook? Zipkin surely needs to be notified when a request is made, but I think that your new ContextEmitter might be able to do so. Were you thinking about anything else?

I've committed the update to ZipkinTracer that implements the small change in the Tracer interface ( andredasilvapinto@086228b https://github.com/andredasilvapinto/cougar/commit/086228b09993f0356232afa26c1465b52acd5a88 ). I went with operation.toString() for the zipkinSpanName, is that ok?

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-67777836.

eswdd commented 9 years ago

Oh and tostring on opkey sounds fine. On 21 Dec 2014 22:20, "Simon Matic Langford" simon@exemel.co.uk wrote:

I thought that the caller needs to provide the span name to zipkin as well and it needs to match the one the server emits? Also the in-process calls im doing right now dont call a context emitter, they just pass on a new ececution context which wraps the old except it contains a sub uuid. So i think it might need an extra method "subCall(subUuid, opkey)" or similar so it all ties up? On 21 Dec 2014 17:40, "André Pinto" notifications@github.com wrote:

Hi Simon, sorry for the delay, but, for the reasons we know, and as I said before, I'm now working on Cougar 3 only on my spare time. I'm now on holidays, so I've got some time for finishing this.

What do you mean by a specific hook? Zipkin surely needs to be notified when a request is made, but I think that your new ContextEmitter might be able to do so. Were you thinking about anything else?

I've committed the update to ZipkinTracer that implements the small change in the Tracer interface ( andredasilvapinto@086228b https://github.com/andredasilvapinto/cougar/commit/086228b09993f0356232afa26c1465b52acd5a88 ). I went with operation.toString() for the zipkinSpanName, is that ok?

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-67777836.

andredasilvapinto commented 9 years ago

You are right, the emitter needs to have access to (or create from what it receives) the span name. I thought you were thinking about using some property of the second argument to create the span name. But passing the operation key is cleaner, I agree.

Also, afaik, the span name doesn't need to be exactly the same as the server, because the match should certainly be done by the ids, but ideally I agree it should be coherent across the infrastructure (whenever possible).

What is your idea for adding the operation key then? Changing the emitter signature?

andredasilvapinto commented 9 years ago

So I've just committed a small diff in order to get this started

We need to pick a way to obtain the Zipkin span name ( https://github.com/andredasilvapinto/cougar/commit/b6b82805220f9ee72830225edb3acebc0233b540#diff-af2ffad6565bfb1ed9d44687a022948aR48 ) and we will also need a pointcut when we receive the response from the service we are calling in order to emit the client received annotation.

@eswdd how do you plan on doing this?

eswdd commented 9 years ago

i've just committed some changes which include a proper hook in the tracer spi for when a remote call has started (although it's not called from the http/socket clients yet). do we need one for when we get the response too?

andredasilvapinto commented 9 years ago

Great and Yes, that is the one to emit the client received annotation. On Jan 7, 2015 5:07 PM, "Simon Matic Langford" notifications@github.com wrote:

i've just committed some changes which include a proper hook in the tracer spi for when a remote call has started. do we need one for when we get the response too?

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-69054173.

eswdd commented 9 years ago

Ok will look at that when i do the client calls. On 7 Jan 2015 17:12, "André Pinto" notifications@github.com wrote:

Great and Yes, that is the one to emit the client received annotation. On Jan 7, 2015 5:07 PM, "Simon Matic Langford" notifications@github.com wrote:

i've just committed some changes which include a proper hook in the tracer spi for when a remote call has started. do we need one for when we get the response too?

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-69054173.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-69055100.

andredasilvapinto commented 9 years ago

Just committed the update to the ZipkinTracer and the HttpEmitter. Is this the kind of implementation you are expecting?

andredasilvapinto commented 9 years ago

How do you think should be the integration of Zipkin on the Cougar Clients used by non-Cougar applications?

I think those applications should probably start using the ZipkinRequestUUID when they create their own ExecutionContext instead of the traditional RequestUUID. As ZipkinRequestUUID is compatible with RequestUUID, I guess we can use this approach for every Cougar Client (even though they may refer to old services - e.g. Cougar 2 - with no Zipkin support). Right?

I'm also expecting the generated Cougar client to create and emit the sub span/execution context along with the rpc span name every time it is used.

eswdd commented 9 years ago

The update to the ZipkinTracer/Emitter looks good, just one thing - would be best to reference the defaultHttpEmitter via the alias exposed in the cougar-client spring config - so we can change the internals if we want.

eswdd commented 9 years ago

The clients are interesting. Now that the client automatically creates a sub-uuid from the one passed in (on the assumption that noone ever called in with a new uuid - relatively safe assumption from what I used to see in the logs when I worked at Betfair), we need to make sure this behaviour is obvious when calling a client embedded in non-Cougar apps. Not sure how best to go about this.

Starting with a ZipkinRequestUUID seems sane, if the client has been configured with zipkin (ie the zipkin client module has been loaded) then the client will emit the headers, it would only emit the actual spans if there is a tracer implementation registered.

Clients can receive extra data just fine, so yes, calls to a Cougar 2 service from a Cougar 3 client is perfectly safe.

andredasilvapinto commented 9 years ago

The update to the ZipkinTracer/Emitter looks good, just one thing - would be best to reference the defaultHttpEmitter via the alias exposed in the cougar-client spring config - so we can change the internals if we want.

Fixed.

So now we just need to:

I will start by testing the current annotations (server receive and server send). If possible please reply here when you have the pointcuts for the client send and receive, so I can start implementing and testing those too.

andredasilvapinto commented 9 years ago

@eswdd I was trying to test Zipkin with a Cougar snapshot version on a Betfair service and I got several compiling errors related with the removed CougarLogger: https://github.com/betfair/cougar/commit/3af0b414ed31e9d7b116adae51c6d4d4cce9a463

is there a replacement in Cougar for it?

eswdd commented 9 years ago

Ok will do. Is nearly there on my laptop, just not had a seat on the train this week.. :( On 13 Jan 2015 18:10, "André Pinto" notifications@github.com wrote:

The update to the ZipkinTracer/Emitter looks good, just one thing - would be best to reference the defaultHttpEmitter via the alias exposed in the cougar-client spring config - so we can change the internals if we want.

Fixed.

So now we just need to:

  • wire the client send call
  • create and wire the new pointcut for the client receive
  • test and fix possible bugs
  • create unit tests for all of this

I will start by testing the current annotations (server receive and server send). If possible please reply here when you have the pointcuts for the client send and receive, so I can start implementing and testing those too.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-69790489.

andredasilvapinto commented 9 years ago

Cool.

I've just remembered that I think we have not decided on the way to mark an entire request chain as "Do Not Trace":

Basically what we discussed here: https://gist.github.com/eswdd/44aba96d4ce46e89f79a#comment-1284296 https://gist.github.com/eswdd/44aba96d4ce46e89f79a#comment-1297131

Will we pick the new header approach or reuse the old one "X-Trace-Me"?

andredasilvapinto commented 9 years ago

@eswdd , not sure if you have seen this, so I'm bumping this again. Should we use "X-Trace-Me" or have a different header to mark an entire request chain as "Do Not Trace"?

bfmike commented 9 years ago

Personally I would use something new like X-Trace-Options and then you have somewhere to put other things that might come along in the future..

From: André Pinto [mailto:notifications@github.com] Sent: 16 January 2015 10:38 To: betfair/cougar Subject: Re: [cougar] Implement Zipkin Support (#74)

@eswddhttps://github.com/eswdd , not sure if you have seen this, so I'm bumping this again. Should we use "X-Trace-Me" or have a different header to mark an entire request chain as "Do Not Trace"?

— Reply to this email directly or view it on GitHubhttps://github.com/betfair/cougar/issues/74#issuecomment-70236092.


In order to protect our email recipients, Betfair Group use SkyScan from MessageLabs to scan all Incoming and Outgoing mail for viruses.


andredasilvapinto commented 9 years ago

So, I found some official documentation from Zipkin (added 10 days ago) which defines the standard headers to be used.

https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information

We can use other names if we want, as long as they are consisting across the entire infrastructure, but it would be advisable to use those, as they are now standard and used across several other Zipkin implementations (like Finagle, Django, Tryfer-node.js, Go-zipkin...), so I'm changing our headers to the default ones and adding the new:

The ids should also be hexadecimally encoded.

eswdd commented 9 years ago

Fine with me. The only niggle is the X-Trace-Me which we already use on older Cougar’s.. i suspect we should send both out, or at least send the X-B3-Sampled from the Zipkin context emitter. On the server, again, I suspect we should handle both to drive the ExecutionContext for compatibility with non-Zipkin tracing.

On 16 Jan 2015, at 15:26, André Pinto notifications@github.com wrote:

So, I found some official documentation from Zipkin (added 10 days ago) which defines the standard headers to be used.

https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information We can use other names if we want, as long as they are consisting across the entire infrastructure, but it would be advisable to use those, as they are now standard and used across several other Zipkin implementations (like Finagle, Django, Tryfer-node.js, Go-zipkin...), so I'm changing our headers to the default ones and adding the new:

X-B3-Sampled: Boolean (either "1" or "0") - "Is Sampled" lets the downstream service know if it should record trace information for the request.

X-B3-Flags: a Long - "Flags" provide the ability to create and communicate feature flags. This is how we can tell downstream services that this is a "debug" request.

The ids should also be hexadecimally encoded.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-70269134.

eswdd commented 9 years ago

BTW have just pushed the latest client code. Unit tests not done yet, but I wanted to expose the api changes so you can update your code. Simon

On 16 Jan 2015, at 22:10, Simon Matic Langford eswddd@gmail.com wrote:

Fine with me. The only niggle is the X-Trace-Me which we already use on older Cougar’s.. i suspect we should send both out, or at least send the X-B3-Sampled from the Zipkin context emitter. On the server, again, I suspect we should handle both to drive the ExecutionContext for compatibility with non-Zipkin tracing.

On 16 Jan 2015, at 15:26, André Pinto <notifications@github.com mailto:notifications@github.com> wrote:

So, I found some official documentation from Zipkin (added 10 days ago) which defines the standard headers to be used.

https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information We can use other names if we want, as long as they are consisting across the entire infrastructure, but it would be advisable to use those, as they are now standard and used across several other Zipkin implementations (like Finagle, Django, Tryfer-node.js, Go-zipkin...), so I'm changing our headers to the default ones and adding the new:

X-B3-Sampled: Boolean (either "1" or "0") - "Is Sampled" lets the downstream service know if it should record trace information for the request.

X-B3-Flags: a Long - "Flags" provide the ability to create and communicate feature flags. This is how we can tell downstream services that this is a "debug" request.

The ids should also be hexadecimally encoded.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-70269134.

andredasilvapinto commented 9 years ago

OK. Will do tomorrow. Thanks

On Fri, Jan 16, 2015 at 10:23 PM, Simon Matic Langford < notifications@github.com> wrote:

BTW have just pushed the latest client code. Unit tests not done yet, but I wanted to expose the api changes so you can update your code. Simon

On 16 Jan 2015, at 22:10, Simon Matic Langford eswddd@gmail.com wrote:

Fine with me. The only niggle is the X-Trace-Me which we already use on older Cougar’s.. i suspect we should send both out, or at least send the X-B3-Sampled from the Zipkin context emitter. On the server, again, I suspect we should handle both to drive the ExecutionContext for compatibility with non-Zipkin tracing.

On 16 Jan 2015, at 15:26, André Pinto <notifications@github.com mailto:notifications@github.com> wrote:

So, I found some official documentation from Zipkin (added 10 days ago) which defines the standard headers to be used.

https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information < https://github.com/twitter/zipkin/blob/620d4f943fcaa053319e84053cd1a7e1f6332df6/doc/src/sphinx/Instrumenting.rst#communicating-trace-information>

We can use other names if we want, as long as they are consisting across the entire infrastructure, but it would be advisable to use those, as they are now standard and used across several other Zipkin implementations (like Finagle, Django, Tryfer-node.js, Go-zipkin...), so I'm changing our headers to the default ones and adding the new:

X-B3-Sampled: Boolean (either "1" or "0") - "Is Sampled" lets the downstream service know if it should record trace information for the request.

X-B3-Flags: a Long - "Flags" provide the ability to create and communicate feature flags. This is how we can tell downstream services that this is a "debug" request.

The ids should also be hexadecimally encoded.

— Reply to this email directly or view it on GitHub < https://github.com/betfair/cougar/issues/74#issuecomment-70269134>.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-70332391.

andredasilvapinto commented 9 years ago

Regarding the X-Trace-Me, I understand you want to keep it, but should it be independent from Zipkin? I think in our current architecture it probably makes sense to be so, as it is created and dealt with at a different layer than Zipkin. For instance, HttpContextEmitter already adds the X-Trace-Me header by itself, I don't think Zipkin should interfere with that.

andredasilvapinto commented 9 years ago

I've changed the ZipkinEmitter a bit in order to support binary annotations emission (i.e. key-value pairs). I've created methods for the Zipkin supported types of values: boolean, short, int, long, double, String.

I've also added the wiring for the client receive emissions.

Today, I will continue integrating the SNAPSHOT version in a Betfair service.

eswdd commented 9 years ago

Cool On 22 Jan 2015 09:24, "André Pinto" notifications@github.com wrote:

I've changed the ZipkinEmitter a bit in order to support binary annotations emission (i.e. key-value pairs). I've created methods for the Zipkin supported types of values: boolean, short, int, long, double, String.

I've also added the wiring for the client receive emissions.

Today, I will continue integrating the SNAPSHOT version into a Betfair service.

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-70992622.

andredasilvapinto commented 9 years ago

Just wanted to share some of the first Zipkin spans emitted from a Cougar service: a

:)

During the tests I've noticed that exceptions in the Zipkin classes break the request. I don't think this is desirable. We should just lose the Zipkin functionality. My question is, should the exception handling be done on the "generic Cougar" side (and apply e.g. to all ContextEmitters) or inside Zipkin. I think it should probably be done inside each Emitter/ComponentResolver, as some of those concrete classes may really want to break the entire request if they fail.

Also, in order to avoid exceptions on RequestUUIDs created manually by non-RPC originated tasks (e.g. background jobs) that use Cougar clients, I have replaced the exceptions that were being triggered on ZipkinHttpContextEmitter and ZipkinTracer when we received a non ZipkinRequestUUID, by a null return or simply ignoring the call. If we want to keep the exceptions we will have to force all background jobs to use ZipkinRequestUUIDs.

Finally I've also moved the localhost ip definition from ZipkinEmitter static initializer to the XML bean construction so the class becomes entirely Cougar independent (it is just a wrapper around Brave).

eswdd commented 9 years ago

Agree with your comments entirely. Looks good!

On 22 Jan 2015, at 16:29, André Pinto <notifications@github.com mailto:notifications@github.com> wrote:

Just wanted to share some of the first Zipkin spans emitted from a Cougar service: https://camo.githubusercontent.com/9c29ff4c49a6dbd11969846a4f6f84f2bb9a1843/687474703a2f2f692e696d6775722e636f6d2f52596576495a302e706e67 :)

During the tests I've noticed that exceptions in the Zipkin classes break the request. I don't think this is desirable. We should just lose the Zipkin functionality. My question is, should the exception handling be done on the "generic Cougar" side (and apply e.g. to all ContextEmitters) or inside Zipkin. I think it should probably be done inside each Emitter/ComponentResolver, as some of those concrete classes may really want to break the entire request if they fail.

Also, in order to avoid exceptions on RequestUUIDs created manually by non-RPC originated tasks (e.g. background jobs) that use Cougar clients, I have replaced the exceptions that were being triggered on ZipkinHttpContextEmitter https://github.com/andredasilvapinto/cougar/commit/6752e720fbf511e49bbdc59d08ea0a44c5bae645#diff-af2ffad6565bfb1ed9d44687a022948aL49 and ZipkinTracer https://github.com/andredasilvapinto/cougar/commit/6752e720fbf511e49bbdc59d08ea0a44c5bae645#diff-fc0b7318bf729cc3ce7768f77bf54915L102 when we received a non ZipkinRequestUUID, by a null return or simply ignoring the call. If we want to keep the exceptions we will have to force all background jobs to use ZipkinRequestUUIDs.

Finally I've also moved the localhost ip definition from ZipkinEmitter static initializer to the XML bean construction https://github.com/andredasilvapinto/cougar/commit/6752e720fbf511e49bbdc59d08ea0a44c5bae645#diff-6f76f58127cb3fe09ad6dac6c4b46ff6R21 so the class becomes entirely Cougar independent (it is just a wrapper around Brave).

— Reply to this email directly or view it on GitHub https://github.com/betfair/cougar/issues/74#issuecomment-71049080.

eswdd commented 9 years ago

Hiya How're you getting on? I would like to do a release candidate soon. Are you nearly ready to merge?