Open rajkiranrbala opened 8 years ago
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/120862389
The labels on this github issue will be updated when the story is started.
Hi @rajkiranrbala,
Tracing is definitely something we need to look into and implement at some point in time, especially since Abacus is continuously growing in complexity and number of microservices.
Still, there are some problems and things to be considered.
I took a look at what's available as open source, and Zipkin and Jaeger seem to be the prominent choices.
After some experimentation with both, Jaeger seems to be the better option. The Zipkin UI seems outdated and less user-friendly: it regularly forgets filter preferences and has bugs. The development experience with Jaeger is also better; because it is based on the OpenTracing standard, it allows for much cleaner code.
I did a quick PoC to see what it would look like. You can find the code here: https://github.com/SAP/cf-abacus/tree/tracing
Note that not everything is wired in that PoC, but the main usage flow (if you run npm run demo) produces a trace through collector, meter, accumulator, aggregator with some custom spans inside.
I found the following to be a bit problematic:
- A context object has to be passed all around Abacus in order to support full tracing. I did try continuation-local-storage and similar solutions out there, but they fail when it comes to generator-based async flows, and Abacus makes use of those. In the future we may also move to async/await, which is also not supported.
- [...] fan-in to work correctly in Jaeger.
- [...] fan-in flows.
I have not had a chance to look into this topic.
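To make the context-passing point concrete, here is a minimal sketch of what threading a context object through generator-based flows could look like. It is not the PoC code: the startSpan helper, the in-memory finished list, and the small co-style run function are all hypothetical stand-ins, and the stage names are only borrowed from the pipeline.

```javascript
'use strict';

// Hypothetical in-memory recorder: finished spans are pushed here for inspection.
const finished = [];

// Hypothetical span factory; a child span inherits the trace id of its parent.
const startSpan = (name, parent) => {
  const span = {
    name,
    traceId: parent ? parent.traceId : Math.random().toString(16).slice(2),
    parentName: parent ? parent.name : undefined,
    finish: () => finished.push(span)
  };
  return span;
};

// Minimal co-style runner: drives a generator that yields promises.
const run = (genFn, ...args) => new Promise((resolve, reject) => {
  const gen = genFn(...args);
  const step = (method, value) => {
    let result;
    try {
      result = gen[method](value);
    } catch (err) {
      return reject(err);
    }
    if (result.done) return resolve(result.value);
    return Promise.resolve(result.value).then(
      (v) => step('next', v),
      (e) => step('throw', e)
    );
  };
  step('next');
});

// Every stage takes the context as an explicit first argument.
const meter = function* (context, usage) {
  const span = startSpan('meter', context.span);
  try {
    return yield Promise.resolve(Object.assign({}, usage, { metered: true }));
  } finally {
    span.finish();
  }
};

// The caller creates its own span and hands it on as the child's context.
const collector = function* (context, usage) {
  const span = startSpan('collector', context.span);
  try {
    return yield run(meter, { span }, usage);
  } finally {
    span.finish();
  }
};

const done = run(collector, {}, { quantity: 1 });
```

The cost is visible in the signatures: every function on the path needs the extra context parameter, which is exactly the intrusiveness described above.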
Nevertheless, Jaeger requires an Agent running alongside each of the Abacus microservices. We need to see how this can most easily be achieved in the scope of Cloud Foundry.
If you have any remarks or experience on the topic, your feedback would be appreciated.
Another option here might be to use commercial solutions like Dynatrace / AppDynamics that can instrument the code during staging. This will give us the minimum tracing and we can later build custom spans on top.
We might want to add an abstraction layer so that we can switch between tracing solutions.
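A rough sketch of what such an abstraction layer might look like, assuming a deliberately tiny interface (startSpan returning an object with setTag and finish). All names here are hypothetical; the in-memory tracer only stands in for an adapter around a real client such as Jaeger or a commercial agent.

```javascript
'use strict';

// Hypothetical interface: tracer.startSpan(name, parent) -> { name, setTag, finish }.

// No-op backend: used when tracing is switched off.
const noopTracer = {
  startSpan: (name) => ({ name, setTag: () => {}, finish: () => {} })
};

// In-memory backend: a stand-in for an adapter around a real tracing client.
const createMemoryTracer = () => {
  const spans = [];
  return {
    spans,
    startSpan: (name, parent) => {
      const record = { name, parent: parent ? parent.name : null, tags: {}, finished: false };
      spans.push(record);
      return {
        name,
        setTag: (key, value) => { record.tags[key] = value; },
        finish: () => { record.finished = true; }
      };
    }
  };
};

// Wraps a function so that it runs inside a span; callers see only the interface.
const traced = (tracer, name, fn) => (parent, ...args) => {
  const span = tracer.startSpan(name, parent);
  try {
    return fn(span, ...args);
  } catch (err) {
    span.setTag('error', true);
    throw err;
  } finally {
    span.finish();
  }
};

// The backend is chosen in one place, e.g. from configuration.
const selectTracer = (kind) => (kind === 'memory' ? createMemoryTracer() : noopTracer);

const tracer = selectTracer('memory');
const accumulate = traced(tracer, 'accumulator', (span, usage) => {
  span.setTag('quantity', usage.quantity);
  return usage.quantity * 2;
});
const total = accumulate(undefined, { quantity: 21 });
```

With this shape, moving from one tracing solution to another would mean writing one more adapter and changing selectTracer, while the instrumented functions stay untouched.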
We have six microservices (4 in the pipeline and 2 plugins) involved in processing a usage record. In order to trace a request's lifetime in the pipeline, we need to enable distributed tracing in our microservices.
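For illustration, the core of distributed tracing between services is carrying the trace identity in request headers. The sketch below assumes the W3C Trace Context traceparent layout (version, 32-hex trace id, 16-hex parent span id, 2-hex flags); the header that a particular tracing client actually uses may differ, and inject, extract and startSpan are hypothetical helpers, not part of any library.

```javascript
'use strict';

// Random lowercase hex string. Math.random is good enough for a sketch,
// but not for production-grade identifiers.
const hex = (length) => {
  let out = '';
  while (out.length < length) out += Math.floor(Math.random() * 16).toString(16);
  return out;
};

// Hypothetical span factory; a child keeps the trace id and records its parent.
const startSpan = (name, parent) => ({
  name,
  traceId: parent ? parent.traceId : hex(32),
  spanId: hex(16),
  parentId: parent ? parent.spanId : undefined
});

// Sender side: write the current span into the outgoing request headers.
const inject = (span, headers) => {
  headers.traceparent = `00-${span.traceId}-${span.spanId}-01`;
  return headers;
};

// Receiver side: read the caller's span identity, or undefined if absent or malformed.
const extract = (headers) => {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/
    .exec(headers.traceparent || '');
  return match ? { traceId: match[1], spanId: match[2] } : undefined;
};

// Simulated hop from one service to the next, without a real HTTP call.
const collectorSpan = startSpan('collector');
const outgoing = inject(collectorSpan, {});
const meterSpan = startSpan('meter', extract(outgoing));
```

Here meterSpan shares the trace id of collectorSpan and records it as parent, which is what lets a tracing backend stitch the hops of one usage record into a single trace.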