Closed mattjbray closed 8 months ago
Alright, so I'm not sure I like this. (I understand where it comes from ofc!)
My qualm is: in general, the hopeful outcome for trace
is that many libraries get instrumented with it (just like they are with logs
), even within a single organization. This instrumentation isn't collector-specific and, for some libraries, it'll use Trace.with_span
as-is. This means that we must find a way to have synchronous with_span
and asynchronous ones (like Trace_lwt.with_span_lwt
which relies on normal Trace
manual span) communicate properly.
The issue is at the boundary between async code (outer spans, living inside Lwt in this case) and sync code (inner spans, often algorithmic stuff in the cases we're interested in). For a given point, a given instant in a program's execution, there is at most one such boundary at a time (for a given thread):
[sync 3] <--- we are here, emitting a sync span
…
[sync 2] <--- first sync span
[sync 1]
[BOUNDARY HERE: regular sync call from inside a lwt-returning function]
[lwt 10]
[lwt 9] <--- last `Trace_lwt.with_span_lwt`
…
[lwt 2]
[lwt 1]
[main]
The challenge is that we may want sync 2
to be "attached" to the span from lwt 9
somehow. With the TEF
collector it's mostly done implicitly via timestamps and thread ID (purely causal based on that). With other collectors such as OTEL we might need more explicit information about parentage (get span_id
+ trace_id
from lwt 9
and attach that as a parent of sync 2
). However this should be the job of the collector, and @ELLIOTTCABLE has been working hard on ocaml-ambient-context
to properly use dynamic scoping/TLS/lwt storage to convey this information.
So I don't think we need this, but it's a tricky issue that's been here for a little while.
Ok, makes sense I think.
The challenge is that we may want
sync 2
to be "attached" to the span fromlwt 9
somehow.
That's what this PR is trying to solve. Without this, the sync 2
and lwt 9
spans are not related, and perfetto renders sync 2
under Thread 0
.
However this should be the job of the collector
Isn't this what the wip-trace-lwt
is doing though - using Lwt's internal implicit callback argument map? So is the wip-trace-lwt
branch going to be replaced by the ocaml-ambient-context
work?
It might be that we don't need trace-lwt
at all, following discussions with @ELLIOTTCABLE . A mere Trace.with_span
would work (even though that keeps making my brain hurt a bit). Let's take a pause for now, but I'm not closing yet.
Ok I'm on the same page now. The problem is that if, from some Lwt code, we call a sync library that just uses Trace.with_span
, then obviously the spans will not be linked. So while this works for the case where you want to trace some sync parts of your lwt application, this is not a general solution.
Indeed. A good target to do instead is a version of trace-tef that uses either lwt storage or ambient context directly.
Ok I'm on the same page now. The problem is that if, from some Lwt code, we call a sync library that just uses
Trace.with_span
, then obviously the spans will not be linked. So while this works for the case where you want to trace some sync parts of your lwt application, this is not a general solution.
I'm not tracking; why does that case fail? (At least, "why does it fail, if you're using ambient-context
" — it definitely won't work if your collector is trace-tef
, today.)
Or, to rephrase that:
ambient-context
, in which case transdeps can, indeed, use Trace.with_span
without having to themselves be concerned about Lwt or parent-span linking;ambient-context
, in which case Trace.with_span
is presumably using stack-based (or wallclock-based, which is modulo threading) parent-span linking, and in which case transdeps' spans will be orphaned if they're invoked in an asynchronous contextSo, based on the above messages, it feels like depending on ambient-context
in pretty much every collector (unless it's subject to special constraints we haven't considered yet) is probably a good idea? I mean, I'm definitely biased, but ... :P
Yes, I wasn't up to speed with ambient-context
and @c-cube filled me in offline.
From my understanding, this trace-lwt
library from the wip-trace-lwt
branch will not be merged, and trace-tef
and other collects should use ambient-context
to solve this.
I don't think this is relevant anymore, let's close and revisit if there's new ideas.
Add a custom
Trace_lwt.with_span
for synchronous code that looks up the parent from the implicit callback argument map.The only difference from
Trace_lwt.with_span_lwt
is thatwith_span
usesFun.protect
instead ofLwt.on_termination
.