eaze / tracing-honeycomb

Fork of https://github.com/inanna-malick/tracing-honeycomb
MIT License
17 stars 19 forks source link

use replace instead of insert to add extension #3

Closed rasviitanen closed 3 years ago

rasviitanen commented 3 years ago

Previously ExtensionsMut::insert was used without a guarantee that we had exclusive access to a SpanRef's extensions. This could lead to a panic in the insert function in the case where the extension was inserted concurrently by multiple calls to eval_ctx.

By using replace the assertion is ignored and it no longer matters whether we have exclusive access or not to a span in eval_ctx.

Example

Suppose we have have two parallel calls A and B to eval_ctx:

A is called with a span-tree with the following spans: iter: [SpanId(1), SpanId(2), SpanId(3)] B is called with a span-tree with the following spans: iter: [SpanId(1), SpanId(2), SpanId(4)]

Both A and B enters the for loop and we suppose that they both adds Span(1) and Span(2) to their path.

Then A continues to iterate over its spans and reaches Span(3), which matches with the arm that eventually executes:

// path: [SpanId(1), SpanId(2)]
for span_ref in path.into_iter() {
    let mut write_guard = span_ref.extensions_mut();
    write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
        TraceCtx {
            trace_id: local_trace_root.trace_id.clone(),
            parent_span: None,
        },
    ));
}

Everything is OK so far.

Then B reaches the same loop once it proceeds to SpanId(4). B then tries insert the extension for SpanId(1), but it is already inserted by A. This is forbidden and insert will panic.

Solution

As LazyTraceCtx is private, there is no way for another Layer implementation to conflict with our extension, this means that we should be able to ignore the assertion in insert and call replace instead . It should be OK to replace old extensions as the replacement will be identical to what's already stored.

The replace function does the same thing insert does but without the assertion (which we should be able to safely ignore): https://docs.rs/tracing-subscriber/0.2.15/src/tracing_subscriber/registry/extensions.rs.html#90-92

Fishrock123 commented 3 years ago

Did you have this code panic? I don't think parallelism should matter here, as ExtensionsMut is a wrapper around an RwLockWriteGuard, which should already prevent competing thread access, although I can potentially see what you mean when we loop over the spans.

rasviitanen commented 3 years ago

@Fishrock123 We did experience panics multiple times when testing our system under heavy load. Specifically this assertion here failed: https://tracing.rs/src/tracing_subscriber/registry/extensions.rs.html#86.

About the write guard, I don't necessarily think it protects against the issue here. What it locks is the span we are iterating on. This does not protect against having the same span in path.

Consider the example again. Where one worker has spans [1, 2, 3] and one has [1, 2, 4]. In the code below I have provided comments about each worker's state:

// WORKER A:
// path = [1, 2]
let mut path = Vec::new();

// iter = [3]
for span_ref in iter {
    // span_ref = 3

    let mut write_guard = span_ref.extensions_mut();
    // (write_guard protects span 3)

    // elided

    for span_ref in path.into_iter() {
        // span_ref = 1
        let mut write_guard = span_ref.extensions_mut();
        write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
            // elided
        ));
    }
}
// ---------------------------
// WORKER B:
// path = [1, 2]
let mut path = Vec::new();

// iterm = [4]
for span_ref in iter {
    // span_ref = 4

    let mut write_guard = span_ref.extensions_mut();
     // (write_guard protects span 4)

    // elided

    for span_ref in path.into_iter() {
        // span_ref = 1
        let mut write_guard = span_ref.extensions_mut();
        write_guard.insert::<LazyTraceCtx<SpanId, TraceId>>(LazyTraceCtx(
            // elided
        ));
    }
}

In the example above, as worker A is locking span 3 and worker B is locking span 4, both of them can be in the "for-loop" at the same time. It's enough that any of the spans in the path is shared to cause this panic.

Fishrock123 commented 3 years ago

Hmm, seems reasonable enough (and unfortunate). I'll try to get a release out later today.


Also, cross linking the original issue: https://github.com/inanna-malick/tracing-honeycomb/pull/14

rasviitanen commented 3 years ago

Thank you! And thank you for the amazing work you have put into this crate!

We found another bug in libhoney that I thought you might be interested in/should be made aware of: https://github.com/nlopes/libhoney-rust/issues/74

Fishrock123 commented 3 years ago

released: