c-cube / ocaml-trace

Common interface for tracing/instrumentation libraries in OCaml
https://c-cube.github.io/ocaml-trace/
22 stars 6 forks source link

add Trace_lwt.with_span for tracing synchronous code #17

Closed mattjbray closed 8 months ago

mattjbray commented 1 year ago

Add a custom Trace_lwt.with_span for synchronous code that looks up the parent from the implicit callback argument map.

The only difference from Trace_lwt.with_span_lwt is that with_span uses Fun.protect instead of Lwt.on_termination.

c-cube commented 1 year ago

Alright, so I'm not sure I like this. (I understand where it comes from ofc!)

My qualm is: in general, the hopeful outcome for trace is that many libraries get instrumented with it (just like they are with logs), even within a single organization. This instrumentation isn't collector-specific and, for some libraries, it'll use Trace.with_span as-is. This means that we must find a way to have synchronous with_span and asynchronous ones (like Trace_lwt.with_span_lwt which relies on normal Trace manual span) communicate properly.

The issue is at the boundary between async code (outer spans, living inside Lwt in this case) and sync code (inner spans, often algorithmic stuff in the cases we're interested in). For a given point, a given instant in a program's execution, there is at most one such boundary at a time (for a given thread):

[sync 3] <--- we are here, emitting a sync span
…
[sync 2] <--- first sync span
[sync 1]
[BOUNDARY HERE: regular sync call from inside a lwt-returning function]
[lwt 10]
[lwt 9] <--- last `Trace_lwt.with_span_lwt`
…
[lwt 2]
[lwt 1]
[main]

The challenge is that we may want sync 2 to be "attached" to the span from lwt 9 somehow. With the TEF collector it's mostly done implicitly via timestamps and thread ID (purely causal based on that). With other collectors such as OTEL we might need more explicit information about parentage (get span_id + trace_id from lwt 9 and attach that as a parent of sync 2). However this should be the job of the collector, and @ELLIOTTCABLE has been working hard on ocaml-ambient-context to properly use dynamic scoping/TLS/lwt storage to convey this information.

So I don't think we need this, but it's a tricky issue that's been here for a little while.

mattjbray commented 1 year ago

Ok, makes sense I think.

The challenge is that we may want sync 2 to be "attached" to the span from lwt 9 somehow.

That's what this PR is trying to solve. Without this, the sync 2 and lwt 9 spans are not related, and perfetto renders sync 2 under Thread 0.

However this should be the job of the collector

Isn't this what the wip-trace-lwt is doing though - using Lwt's internal implicit callback argument map? So is the wip-trace-lwt branch going to be replaced by the ocaml-ambient-context work?

c-cube commented 1 year ago

It might be that we don't need trace-lwt at all, following discussions with @ELLIOTTCABLE . A mere Trace.with_span would work (even though that keeps making my brain hurt a bit). Let's take a pause for now, but I'm not closing yet.

mattjbray commented 1 year ago

Ok I'm on the same page now. The problem is that if, from some Lwt code, we call a sync library that just uses Trace.with_span, then obviously the spans will not be linked. So while this works for the case where you want to trace some sync parts of your lwt application, this is not a general solution.

c-cube commented 1 year ago

Indeed. A good target to do instead is a version of trace-tef that uses either lwt storage or ambient context directly.

ELLIOTTCABLE commented 1 year ago

Ok I'm on the same page now. The problem is that if, from some Lwt code, we call a sync library that just uses Trace.with_span, then obviously the spans will not be linked. So while this works for the case where you want to trace some sync parts of your lwt application, this is not a general solution.

I'm not tracking; why does that case fail? (At least, "why does it fail, if you're using ambient-context" — it definitely won't work if your collector is trace-tef, today.)

Or, to rephrase that:

  1. either a collector uses ambient-context, in which case transdeps can, indeed, use Trace.with_span without having to themselves be concerned about Lwt or parent-span linking;
  2. … or a collector isn't using ambient-context, in which case Trace.with_span is presumably using stack-based (or wallclock-based, which is modulo threading) parent-span linking, and in which case transdeps' spans will be orphaned if they're invoked in an asynchronous context

So, based on the above messages, it feels like depending on ambient-context in pretty much every collector (unless it's subject to special constraints we haven't considered yet) is probably a good idea? I mean, I'm definitely biased, but ... :P

mattjbray commented 1 year ago

Yes, I wasn't up to speed with ambient-context and @c-cube filled me in offline.

From my understanding, this trace-lwt library from the wip-trace-lwt branch will not be merged, and trace-tef and other collects should use ambient-context to solve this.

c-cube commented 8 months ago

I don't think this is relevant anymore, let's close and revisit if there's new ideas.