Did you have this code panic? I don't think parallelism should matter here, as `ExtensionsMut` is a wrapper around an `RwLockWriteGuard`, which should already prevent competing thread access, although I can potentially see what you mean when we loop over the spans.
@Fishrock123 We did experience panics multiple times when testing our system under heavy load. Specifically, this assertion failed: https://tracing.rs/src/tracing_subscriber/registry/extensions.rs.html#86.
About the write guard, I don't necessarily think it protects against the issue here. What it locks is the span we are iterating on. This does not protect against having the same span in `path`.
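For reference, the assertion that fires is the one inside `ExtensionsMut::insert`. Paraphrased from the linked tracing-subscriber source (not verbatim), it looks roughly like this:

```rust
// Rough paraphrase of ExtensionsMut::insert from tracing-subscriber 0.2.x:
// `replace` stores the value and returns the previously stored one, if any,
// so inserting a second extension of the same type trips the assertion.
pub fn insert<T: Send + Sync + 'static>(&mut self, val: T) {
    assert!(self.replace(val).is_none());
}
```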
Consider the example again, where one worker has spans [1, 2, 3] and the other has [1, 2, 4]. In the code below I have added comments describing each worker's state:
// WORKER A:
// path = [1, 2]
let mut path = Vec::new();
// iter = [3]
for span_ref in iter {
    // span_ref = 3
    let mut write_guard = span_ref.extensions_mut();
    // (write_guard protects span 3)
    // elided
    for span_ref in path.into_iter() {
        // span_ref = 1
        let mut write_guard = span_ref.extensions_mut();
        write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
            // elided
        ));
    }
}
// ---------------------------
// WORKER B:
// path = [1, 2]
let mut path = Vec::new();
// iter = [4]
for span_ref in iter {
    // span_ref = 4
    let mut write_guard = span_ref.extensions_mut();
    // (write_guard protects span 4)
    // elided
    for span_ref in path.into_iter() {
        // span_ref = 1
        let mut write_guard = span_ref.extensions_mut();
        write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
            // elided
        ));
    }
}
In the example above, worker A is locking span 3 and worker B is locking span 4, so both of them can be inside the inner for-loop at the same time. It is enough for any one of the spans in `path` to be shared between the workers to cause this panic.
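To make the interleaving concrete, here is a small self-contained model of the situation (hypothetical types and names, not the actual tracing-honeycomb or tracing-subscriber code): each worker holds a write guard on its own leaf span, yet both can still reach the shared ancestor, and whichever `insert` runs second trips the assertion.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy stand-in for one span's extension slot: like ExtensionsMut::insert,
// inserting panics if a value is already present.
struct Extensions(Mutex<Option<&'static str>>);

impl Extensions {
    fn insert(&self, val: &'static str) {
        let mut slot = self.0.lock().unwrap(); // per-span write guard
        assert!(slot.is_none(), "extension already inserted");
        *slot = Some(val);
    }
}

fn main() {
    // Span 1 is shared between both workers; spans 3 and 4 are each worker's own leaf.
    let span1 = Arc::new(Extensions(Mutex::new(None)));
    let span3 = Extensions(Mutex::new(None));
    let span4 = Extensions(Mutex::new(None));

    let a1 = Arc::clone(&span1);
    let worker_a = thread::spawn(move || {
        let _guard = span3.0.lock().unwrap(); // A locks only its own span 3...
        a1.insert("trace ctx from A");        // ...then writes the shared span 1
    });

    let b1 = Arc::clone(&span1);
    let worker_b = thread::spawn(move || {
        let _guard = span4.0.lock().unwrap(); // B locks only its own span 4...
        b1.insert("trace ctx from B");        // ...whichever insert runs second panics
    });

    let _ = worker_a.join();
    let _ = worker_b.join(); // one of these joins returns Err from the panicked thread
}
```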
Hmm, seems reasonable enough (and unfortunate). I'll try to get a release out later today.
Also, cross-linking the original issue: https://github.com/inanna-malick/tracing-honeycomb/pull/14
Thank you! And thank you for the amazing work you have put into this crate!
We found another bug in libhoney that I thought you might be interested in/should be made aware of: https://github.com/nlopes/libhoney-rust/issues/74
released:
Previously `ExtensionsMut::insert` was used without a guarantee that we had exclusive access to a `SpanRef`'s extensions. This could lead to a panic in the `insert` function in the case where the extension was inserted concurrently by multiple calls to `eval_ctx`.

By using `replace`, the assertion is avoided and it no longer matters whether or not we have exclusive access to a span in `eval_ctx`.

**Example**
Suppose we have two parallel calls A and B to `eval_ctx`:

A is called with a span-tree with the following spans: `iter: [SpanId(1), SpanId(2), SpanId(3)]`

B is called with a span-tree with the following spans: `iter: [SpanId(1), SpanId(2), SpanId(4)]`

Both A and B enter the for loop, and we suppose that they both add `Span(1)` and `Span(2)` to their `path`. Then A continues to iterate over its spans and reaches `Span(3)`, which matches with the arm that eventually inserts the `LazyTraceCtx` extension for every span in `path`. Everything is OK so far.

Then B reaches the same loop once it proceeds to `SpanId(4)`. B then tries to insert the extension for `SpanId(1)`, but it is already inserted by A. This is forbidden and `insert` will panic.

**Solution**
As `LazyTraceCtx` is private, there is no way for another `Layer` implementation to conflict with our extension; this means that we should be able to ignore the assertion in `insert` and call `replace` instead. It should be OK to replace old extensions, as the replacement will be identical to what is already stored.

The `replace` function does the same thing `insert` does, but without the assertion (which we should be able to safely ignore): https://docs.rs/tracing-subscriber/0.2.15/src/tracing_subscriber/registry/extensions.rs.html#90-92