cloudfoundry-attic / cf-abacus

CF usage metering and aggregation
Apache License 2.0

Distributed Tracing #329

Open rajkiranrbala opened 8 years ago

rajkiranrbala commented 8 years ago

We have six microservices (four in the pipeline and two plugins) involved in processing a usage record. To trace a request's lifetime through the pipeline, we need to enable distributed tracing in our microservices.

cf-gitbot commented 8 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/120862389

The labels on this GitHub issue will be updated when the story is started.

ghost commented 6 years ago

Hi @rajkiranrbala ,

Tracing is definitely something we need to look into and implement at some point in time, especially since Abacus is continuously growing in complexity and number of microservices.

Still, there are some problems and open questions to consider.

Options

I looked at the open-source options available, and Zipkin and Jaeger seem to be the prominent choices.

After some experimentation with both, Jaeger seems to be the better fit. The Zipkin UI seems outdated and less user-friendly, regularly forgets filter preferences, and has bugs. Jaeger is also better from a development perspective: because it is based on the OpenTracing standard, it allows for much cleaner code.

Instrumentation

I did a quick PoC to see what it would look like. You can find the code here: https://github.com/SAP/cf-abacus/tree/tracing

Note that not everything is wired up in that PoC, but the main usage flow (run npm run demo) produces a trace through the collector, meter, accumulator, and aggregator, with some custom spans inside.
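To illustrate the idea of one trace spanning all four pipeline stages, here is a minimal sketch. It is not the PoC code: the in-memory span recorder, the helper names (startSpan, inject, extract, handle) and the header names (x-trace-id, x-parent-span-id) are all hypothetical stand-ins for what a real tracer client would provide.

```javascript
'use strict';

// Hypothetical in-memory recorder standing in for a real tracer backend.
const finishedSpans = [];

// Deterministic ids keep the sketch simple; real tracers use random ids.
let nextId = 0;
const newId = () => (++nextId).toString(16).padStart(8, '0');

// Start a span. If a parent context is given, the span joins its trace.
const startSpan = (service, operation, parent) => {
  const span = {
    traceId: parent ? parent.traceId : newId(),
    spanId: newId(),
    parentId: parent ? parent.spanId : undefined,
    service,
    operation,
    start: Date.now()
  };
  span.finish = () => {
    span.end = Date.now();
    finishedSpans.push(span);
  };
  return span;
};

// Serialize the span context into outgoing request headers
// (header names are assumptions made for this sketch).
const inject = (span, headers) => {
  headers['x-trace-id'] = span.traceId;
  headers['x-parent-span-id'] = span.spanId;
  return headers;
};

// Read the caller's context back on the receiving service.
const extract = (headers) => headers['x-trace-id'] ?
  { traceId: headers['x-trace-id'], spanId: headers['x-parent-span-id'] } :
  undefined;

// Each stage extracts the caller's context, works inside its own span,
// and forwards the context to the next stage.
const pipeline = ['collector', 'meter', 'accumulator', 'aggregator'];

const handle = (index, request) => {
  const span = startSpan(pipeline[index], 'process-usage',
    extract(request.headers));
  if (index + 1 < pipeline.length)
    handle(index + 1, { headers: inject(span, {}), body: request.body });
  span.finish();
};

handle(0, { headers: {}, body: { usage: 'example' } });

for (const s of finishedSpans)
  console.log(s.traceId, s.service, 'parent:', s.parentId);
```

All four spans share the trace id created by the collector, and each downstream span points at its caller through parentId. That parent link is what lets a tracing UI draw the request as a single tree.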

I found the following to be a bit problematic:

Deployment

I have not had a chance to look into this topic.

Nevertheless, Jaeger requires an Agent running side by side with each of the Abacus microservices. We need to find the easiest way to achieve that in the scope of Cloud Foundry.


If you have any remarks or experience on the topic, your feedback would be appreciated.

hsiliev commented 6 years ago

Another option here might be to use commercial solutions like Dynatrace / AppDynamics that can instrument the code during staging. This would give us baseline tracing, and we could later build custom spans on top.

We might want to add an abstraction layer so that we can switch between tracing solutions.
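A rough sketch of what such an abstraction layer could look like, assuming a very small facade with only startSpan, setTag and finish. The names noopTracer, createMemoryTracer, useTracer and traced are hypothetical and not part of Abacus or of any vendor SDK. An adapter for a real backend would expose the same methods and delegate to the vendor client.

```javascript
'use strict';

// Default backend: tracing disabled, every call is a no-op.
const noopTracer = {
  startSpan: () => ({ setTag: () => {}, finish: () => {} })
};

// Example adapter that records spans in memory. A Jaeger or commercial
// adapter would implement the same two-level shape (assumption).
const createMemoryTracer = () => {
  const spans = [];
  return {
    spans,
    startSpan: (name) => {
      const span = { name, tags: {}, finished: false };
      return {
        setTag: (key, value) => { span.tags[key] = value; },
        finish: () => { span.finished = true; spans.push(span); }
      };
    }
  };
};

// The single place where the backend is chosen.
let activeTracer = noopTracer;
const useTracer = (tracer) => { activeTracer = tracer; };

// Wrap a unit of work in a span, whatever backend is configured.
const traced = (name, fn) => {
  const span = activeTracer.startSpan(name);
  try {
    return fn(span);
  } catch (err) {
    span.setTag('error', true);
    throw err;
  } finally {
    span.finish();
  }
};

// Usage: application code never references the backend directly.
const memory = createMemoryTracer();
useTracer(memory);
const total = traced('accumulate-usage', (span) => {
  span.setTag('records', 3);
  return 1 + 2 + 3;
});
console.log(total, memory.spans);
```

With this shape, switching tracing solutions would mean writing one more adapter and changing the useTracer call, while the instrumented code stays untouched.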