census-instrumentation / opencensus-erlang

A stats collection and distributed tracing framework
https://opencensus.io
Apache License 2.0
135 stars 40 forks source link

Modifying a span's start_time #157

Open waltfy opened 5 years ago

waltfy commented 5 years ago

Team, I have encountered a relevant issue with measuring spans for other libraries. Disclaimer: I'm a new comer to the Erlang/Elixir community, so please, if you could point me in the right direction — I'd hugely appreciate it!

Context: I want to trace the time taken to run each query Ecto has ran using :ocp.

The problem: Ecto's :telemetry usage is constrained to a single event: :query, this event is emitted upon query completion and includes details such as the query's duration. This in turn doesn't provide a feasible mechanism to create spans via telemetry events only. In other words, libraries don't always allow one to easily wrap their calls with ocp:with_child_span.

To further illustrate the problem, see the screenshot below:

Screenshot 2019-07-09 at 11 27 48

The image above shows a trace for a Phoenix request trace, that has a single query associated to it and relies on the telemetry :query event from Ecto: select pg_sleep(1); i.e. Postgres sleeps for 1 second before returning the query result.

The trace for elixir.duffel.web.air.airport.indexcontroller.index is in fact correct: ~1.076s. However, the trace for the query itself reports 54µs.

This happens because the telemetry event is emitted only at the query end, so we are forced to start a span and finish it immediately in order to get a trace.

Ideally we'd be able to properly create a span for the query, meaning that we'd have the actual query duration (~1s).

Ecto's principles suggest1 that the maintainers do not plan to emit other query related events and we can't modify a Span's "start_time".

What I'd like from you: My question would be: what is the library's standpoint for problems like this i.e. would you push back that Ecto should emit more events? Would you suggest that we pursue a more manual approach to tracing (i.e. wrapping the library's calls manually)?

Personally, I can see some value in being able to easily alter the span's start_time — but I would love any thoughts on the matter before I create a pull-request, as I can also imagine potential issues with this approach.

Thank you!


Footnotes

  1. Ecto's maintainers have mentioned provided some reasoning for emitting a single event:

    The rationale is that some libraries like Ecto only emit the "stop" event with the whole duration, which means you can track back, but not build it forward.

tsloughter commented 5 years ago

We will be supporting this and it is part of the upcoming OpenTelemetry spec (the merger of OpenCensus and OpenTracing). But I don't have an eta yet on when this will be supported, though it is probably simple enough that it could be added to OpenCensus so you don't have to wait for Opentelemetry.