futurewei-cloud / alcor

Alcor: Cloud native SDN platform powered by Kubernetes and Istio
MIT License
32 stars 33 forks source link

[Tracing] Support Jaeger tracing in NCM and TC #707

Closed zzxgzgz closed 2 years ago

zzxgzgz commented 2 years ago

What does this PR do:

  1. Added Jaeger tracing to NCM's gRPC server/client, so that we can use Jaeger to trace NCM's gRPC activities.
  2. Added Jaeger tracing to the Test Controller, to make the tracing more complete.
  3. Closes #708

Please refer to the following screenshot to get an idea of what to expect, when running Test Controller: GoalState Pushdown process: image

On-demand process: image The on-demand tracing was captured before merging #704 , after merging this PR, on-demand doesn't get triggered, as neigbhor states all push down before the pings.

Future improvements:

  1. Improve how the spans are organized when on-demand is triggered, the RequestGoalStates spans show be at the same level, but right now they are chained together. For the reason behind it and a possible solution, please refer to this issue for a possible solution.
  2. Jaeger tracing is also implemented in ACA, but the spans between ACA and NCM doesn't get connected, so the ACA's spans look like this: image we suspect that this is because we are using different libraries here. In ACA, jaeger tracing was adding using the opentelemetry-cpp library, however, NCM uses the opentracing library. We suspect this mismatch is causing the ACA spans not connected with the NCM spans. We may want to change to opentelemetry library in Alcor later.
lgtm-com[bot] commented 2 years ago

This pull request introduces 2 alerts when merging 4c99f65d8fb15b77c4459edb5ff2360c48415df1 into a8436227f3d3d459efbfbff0b9a5195c7ea07e5d - view on LGTM.com

new alerts:

xieus commented 2 years ago

@cj-chung Could you review this PR?

cj-chung commented 2 years ago

@cj-chung Could you review this PR?

sure