davidbarsky / tracing-tree

Apache License 2.0
123 stars 31 forks source link

Improvement: better concurrency support of interleaved futures #60

Closed ten3roberts closed 10 months ago

ten3roberts commented 1 year ago

Currently, spans are printed when they are created, exited, and when a child span is opened depending on configuration.

However, if you have concurrently running futures which are instrumented (which easily occurs in larger applications) there is no way to tell which span the event belongs to.

Printing the span again when switching span parents, I.e, when a span another span is entered would solve this issue.

Example:

use std::time::Duration;
use tokio::join;
use tracing_subscriber::prelude::*;
use tracing_tree::HierarchicalLayer;

#[tokio::main]
async fn main() {
    tracing_subscriber::registry()
        .with(
            HierarchicalLayer::new(4)
                .with_verbose_entry(true)
                .with_verbose_exit(true)
                .with_indent_lines(true),
        )
        .init();
    let task1 = tokio::spawn(foo());
    tokio::time::sleep(Duration::from_millis(1700)).await;
    let task2 = tokio::spawn(bar());

    let _ = join!(task1, task2);
}

#[tracing::instrument(level = "info")]
async fn foo() {
    loop {
        tokio::time::sleep(Duration::from_millis(500)).await;
        tracing::info!("Foo");
    }
}

#[tracing::instrument(level = "info")]
async fn bar() {
    loop {
        tokio::time::sleep(Duration::from_millis(300)).await;
        tracing::info!("Bar");
    }
}

Output:

┐foo 
├───501ms  INFO Foo
├───1004ms  INFO Foo
├───1507ms  INFO Foo
┐bar 
├───302ms  INFO Bar
├───2008ms  INFO Foo
├───604ms  INFO Bar
├───2510ms  INFO Foo
├───907ms  INFO Bar
├───1208ms  INFO Bar
├───3013ms  INFO Foo
├───1511ms  INFO Bar
├───1813ms  INFO Bar
├───3515ms  INFO Foo
├───2116ms  INFO Bar
├───4017ms  INFO Foo

The output makes it look like Foo is printed from the bar span

oli-obk commented 1 year ago

I'm guessing this is a duplicate of https://github.com/davidbarsky/tracing-tree/issues/50

There are solutions to this, but I think tracing-tree is fundamentally the wrong tool for it.

ten3roberts commented 1 year ago

Thank you. Didn't to find that. Github search is quite bad when it comes to subword matches

ten3roberts commented 1 year ago

The suggested solution of shifting the events after the fact isn't the same one, nor it is one that works well with a feed-style terminal or file.

Juts printing the span anew would be a better solution

oli-obk commented 1 year ago

Yea, I only found it by going through all closed issues by hand 😆

Yea, the problem is that you'll get random openings and closings of the span when a future gets polled but returns immediately without doing any work. We can try it out in a branch and you can look whether it's acceptable to you. Then we can figure out how to support that nicely

oli-obk commented 1 year ago

Ah found out why it is so familiar: https://github.com/davidbarsky/tracing-tree/issues/25

ten3roberts commented 1 year ago

I will be more than happy to implement it if you want to.

I think the solution for this is deferring on_open to when an event comes, so that if there is no tracing info or alike between awaits then the span is not opened and reprinted, which also prevents "leaking" internal structure of an async function, as await points may change and thus should not modify externally observable behavior such as span re-opening as yielding is itself not externally observable (outside the executor).

Regarding #25 this is something I've been wanting a lot as well. It would be great if spans aren't printed unless there is some event within the span. Entering a function shouldn't print something as my main use case for spans are to figure out from where a tracing event occurred and the context of it, not that a function was called, that's what I would use an event for.

The tracing_subscriber::fmt::Layer does it like that. Spans only exist to show what lead up to an event, but do not stand on their own, so instrumenting a function that doesn't print anything won't spam the console with useless "handle_connection". Though the issue with fmt is that the span backtrace is printed for each event like a path, which makes it very verbose, and also impossible to tell if two with the same "path" were from the same invocation, or two different functions calls one after another.

oli-obk commented 1 year ago

We can certainly experiment. I think the first step should be a test reproducing the issue in our CI (maybe with a really simple future handler that just polls a bunch of futures in a loop). This way we have an example of the issue that will show us when the behaviour changes

ten3roberts commented 1 year ago

That sounds likea good idea.

I am currently trying to move over Ambient to use tracing rather than log and would prefer tracing tree as it is easier to see the context and groupings of events rather than just a path.

davidbarsky commented 1 year ago

Printing the span again when switching span parents, I.e, when a span another span is entered would solve this issue.

That's feels like an approach worth playing with and might make this suck less for concurrent execution. That being said, you might be better off using tracing_subscriber' default formatter for async/concurrent stuff.

ten3roberts commented 1 year ago

Yes, tracing-subscriber::fmt is better for concurrency as it uses an atomic logging approach, though tracing-tree is far better to understand context and the relations between events (was it two events from the same function/span or was called twice etc).

It is thus such a shame that this formatter breaks down when multiple things run at once, which is sortof the point of using async.

I am currently wrapping up another quite big PR regarding Webtransport, but once that is done I'll more than happily see what I can do with this. 😊

ten3roberts commented 1 year ago

Do you have any preferences as to how to test this?

Is it sufficient to setup a writer to a string and compare that?

oli-obk commented 1 year ago

We can't test it with true concurrency, but a single threaded execution of multiple futures should give us something testable I think. This way we can use our ui test suite. Just adding a new example to the examples directory will make the test suite run the tests. You can generate the output file (that you need to commit to git) with cargo test -- -- --bless (yes a mouth full, I'm working on it).

ten3roberts commented 1 year ago

That is the least mouthful test bootstrapping I've seen 😅

I've begun to setup test cases for this now. Pardon for taking some time to get to it

Relevant: https://github.com/tokio-rs/tracing#in-asynchronous-code

ten3roberts commented 1 year ago

The suggestions I have for the api is

These features together would significantly improve my use case of tracing-tree and will allow me to use it rather than tracing_subscriber::fmt which I am currently using, but lacks the aforementioned contextualiazation tracing-tree offers.

If you find a better name for these things I am more than open to the suggestion 😊

Would you want to add these features? If so I'll include them in #64 if you approve of it

oli-obk commented 1 year ago

I don't know how we could do with_lazy_spans. It would at least require a major rewrite in how we print things I believe, but maybe I'm overlooking a simpler solution?

davidbarsky commented 1 year ago

I don't know how we could do .with_lazy_spans. It would at least require a major rewrite in how we print things I believe, but maybe I'm overlooking a simpler solution?

The trick that tracing_subscriber::fmt took was "treat printing span events as an event": https://github.com/tokio-rs/tracing/blob/c6e8a8f679f40d2528008918e2d636e0b48b8107/tracing-subscriber/src/fmt/fmt_subscriber.rs#L825.

ten3roberts commented 1 year ago

I don't know how we could do .with_lazy_spans. It would at least require a major rewrite in how we print things I believe, but maybe I'm overlooking a simpler solution?

The trick that tracing_subscriber::fmt took was "treat printing span events as an event": https://github.com/tokio-rs/tracing/blob/c6e8a8f679f40d2528008918e2d636e0b48b8107/tracing-subscriber/src/fmt/fmt_subscriber.rs#L825.

That's only for the verbose entry setting right?

Regarding this, I do understand as an opem source contributor and maintainer that large rewrites are absolutely daunting and way to complex to handle in a PR in conjunction with other features.

I'll just see if I find a way to do it mostly additively or contain few changes for existing code when disabled and do it if that turns out to be the case.

davidbarsky commented 1 year ago

I don't know how we could do .with_lazy_spans. It would at least require a major rewrite in how we print things I believe, but maybe I'm overlooking a simpler solution?

The trick that tracing_subscriber::fmt took was "treat printing span events as an event": https://github.com/tokio-rs/tracing/blob/c6e8a8f679f40d2528008918e2d636e0b48b8107/tracing-subscriber/src/fmt/fmt_subscriber.rs#L825.

That's only for the verbose entry setting right?

Regarding this, I do understand as an opem source contributor and maintainer that large rewrites are absolutely daunting and way to complex to handle in a PR in conjunction with other features.

I'll just see if I find a way to do it mostly additively or contain few changes for existing code when disabled and do it if that turns out to be the case.

I think it is also applicable to the lazy, tracing_subscriber::fmt-style behavior as well. It's been a minute since I looked closely at tracing-tree, so I might be misremembering a few things, but I suspect this sort of change might be desirable and not too hard (famous last words!) to make. Feel free to give it a spin, if you have the time and/or bandwidth.

safinaskar commented 10 months ago

Interleaved futures still don't work properly. If you want, I can write proper bug report

davidbarsky commented 10 months ago

Interleaved futures still don't work properly. If you want, I can write proper bug report

Please do!

RalfJung commented 6 months ago

Interleaved futures still don't work properly. If you want, I can write proper bug report

Has this happened? We are also having trouble with interleaved scopes for concurrent logging, though they are not future-related.

ten3roberts commented 6 months ago

Interleaved futures still don't work properly. If you want, I can write proper bug report

Has this happened? We are also having trouble with interleaved scopes for concurrent logging, though they are not future-related.

Hi Ralf,

I've fixed this in https://github.com/davidbarsky/tracing-tree/pull/78, and it is awaiting maintainer review.

May I ask which features/config you are using for tracing-tree, as this PR specifically target verbose_entry and non-deferred but retraced logging.

Feel free to have a look and comment the PR, extra eyes are greatly appreciated :grin:

RalfJung commented 6 months ago

We're using tracing-tree to log things in an interpreter that interprets multiple logical threads on a single host thread and switches between them. We use tracing scopes for each function call but tracing doesn't know about the threads so the scope information it shows, and the verbose entry/exit logging, are complete nonsense. (Cc https://github.com/rust-lang/miri/issues/2266)