Closed JonasKunz closed 2 months ago
I think we don't need to put too much effort into edge cases. The backend should just ignore inferred spans children which are in the stack above
I think we don't need to put too much effort into edge cases. The backend should just ignore inferred spans children which are in the stack above
Unfortunately it's not an edge case but the first think that occurred when running that feature with the OTel demo.
@AlexanderWert recently stumbled upon a problem where inferred-spans would cause a cyclic parent-child relationship in the generated trace. This cycle was formed because the
child_ids
of an inferred span would refer to a span which is actually a transitive parent of said inferred span.The problem occured with GRPC due to the root span being activated multiple times nested.
However, I think this uncovered a more general flaw in our current approach of assigning
child_ids
. We basically always assume that when a span A is activated replacing span B as the current span, that A would be a child of B. This is usually the case: most of the time a span represents a single method invocation and is activated exactly for the duration of that method call. However, concept wise the span parent-child relationship is actually completely independent of any activation!Consider the following example:
In this example,
spanA
is the parent ofspanB
when looking at the trace context. However, when looking at the stackframes we have:doWithB
is the root stack frame (and has spanB activated)doStuff
is the child frame ofdoWithB
doWithA
is the child ofdoStuff
(and has spanB activated)Our current inferred spans algorithm would now generate an inferred span for
doStuff
which would be a child ofspanB
and would havespanA
as a child (viachild_ids
). This is now a cyclic reference:spanB
is the child ofspanA
(per their construction trace context)doStuff
is a child ofspanB
spanA
is a child of doStuff (viachild_ids
)Taking a step back, why do we actually have
child_ids
? The problem is that we sometimes need to insert inferred spans as parents of already sent non-inferred spans. However, we should only do this if the parent-child relationship of the non-inferred spans actually corresponds to the parent-child relationship of the activations when looking at the stack frames! Because otherwise the parent-child relationship represents something different than the callstack and therefore we should not mess with it.An easy way to fix this is to only allow a non-inferred span to be placed in the
child_id
s of an inferred-span, if the tracecontext parent of the non-inferred is actually the parent of the inferred-span (or to be more precise: the first non-inferred transitive parent of the inferred span). This way, we will only introduce "middle-men" in the existing parent-child relationship of non-inferred spans and therefore are guaranteed to not introduce cycles.Note that while this approach will guarantee us to not introduce cycles, it still doesn't guarantee that the parent-child relationships still form a tree. Instead, it may decay into a directed, acyclic graph. Consider the following example:
This code will result in two inferred spans (
doStuff1
anddoStuff2
), where both will havespanB
in their child-Ids and both havespanA
as parent. IMO the only meaningful way of solving this is to ensure that a span referenced only at most once inchild_ids
. In theory we could use aSpanValue
for this, the problem is that we do the inference delayed and don't have access to the originally activated spans anymore. However, if https://github.com/async-profiler/async-profiler/issues/913 gets implemented, then we can do a live-stream based inference instead of spilling the span activations to disk. This would allow us to useSpanValues
to preserve the tree structure.