csnemes / tracer

Tracing rewriter using Fody
Apache License 2.0
93 stars 26 forks source link

AsyncLocalContext is not matching expectations on TraceEnter/TraceLeave with some asynchronous code #81

Closed ndrwrbgs closed 4 years ago

ndrwrbgs commented 5 years ago

https://docs.microsoft.com/en-us/dotnet/api/system.runtime.compilerservices.compilergeneratedattribute?view=netframework-4.7.2

These are methods like MoveNext for IEnumerables with yield, lambdas that are compiled into methods, or async helper methods. Since these are a compiler implementation detail today rather than what the code says, it would probably make sense not to instrument them

ndrwrbgs commented 5 years ago

Minimum repro of problem (more for me than you, sorry :), relies on the Tracer.OpenTracing.Fody library

class Program
{
    [NoTrace]
    static void Main(string[] args)
    {
        GlobalTracer.Register(
            new TracerDecoratorBuilder(new MockTracer())
                .OnSpanActivated((span, name) => Console.WriteLine("Activate " + name + span.Context.SpanId))
                .OnSpanFinished((span, name) => Console.WriteLine("Finish " + name + span.Context.SpanId))
                //.OnSpanStarted((span, name) => Console.WriteLine("Start " + name))
                .Build());

        new Program().DoSomethingAsync().Wait();
    }

    private async Task DoSomethingAsync()
    {
        await Task.Run(async () =>
        {
            this.DoSomethingImpl().Wait();
        });
    }
    private async Task DoSomethingImpl()
    {
    }
}

There's something odd in that CompilerGenerated method when it captures this but not any locals (the case when it keeps the same method name too instead of the compiler generated one) that doesn't have AsyncCallContext being where the developer expects (it's probably a compiler generated method to handle the call context, actually...) such that we get, for the above code:

Activate Program.DoSomethingAsync()2
Activate Program.DoSomethingAsync()3
Activate Program.DoSomethingImpl()4
Finish Program.DoSomethingImpl()4
Finish Program.DoSomethingAsync()2 // <----- this should be 3, logically speaking

Unhandled Exception: System.AggregateException: One or more errors occurred. ---> System.InvalidOperationException: Tried to finish already finished span

I'm doubting it's trivial to fix, but avoiding instrumenting the CompilerGenerated method should side-step the problem.

ndrwrbgs commented 5 years ago

Update: This is definitely not going to work how I had hoped.

Thus far the most promising thing I've hacked is something like:

            if (new StackTrace().GetFrame(1).GetMethod().DeclaringType.FullName.Contains("b__")
            || new StackTrace().GetFrame(1).GetMethod().Name.Contains("b__"))
// Do not trace it

Obviously, without further investigation, that isn't going to be the right solution :)

csnemes commented 5 years ago

Actually it's necessary to instrument the compiler generated code in async case as the finish lies within the generated code and not in the original method. An async method returns at the first await call and the rest of the execution is called via the generated state machine class. So to keep the logical paring of entries and exits one should instrument that state machine.

ndrwrbgs commented 5 years ago

Yeah CompilerGenerated and AsyncStateMachine were not going to help with this. But there's still something about the context with b__ class/methods that isn't operating as a user would expect.

ndrwrbgs commented 5 years ago

@csnemes before I look into how the compiler/tracer interact here further, I was wondering if you had any insight on this particular item. I imagine for serilog and NLog it's not as large of a problem, since logging the enter twice wouldn't be detrimental in logging (as opposed to tracing) based setups, but perhaps you have encountered it and have an idea of how to work around it or solve it?

csnemes commented 5 years ago

I didn't have similar problems, so I assume it is some corner case? I don't know, I'll reproduce the problem based on your code and see why it happens.

csnemes commented 5 years ago

The reason you have two entries and exists is that there's a generated private async method for the lambda. So there are two methods and two state machines. So the first entry corresponds to entering to the method, the next one is entering into the lambda. I think the solution would be to skip the compiler generated method for the lambda,

ndrwrbgs commented 5 years ago

It also only happens when there is an implicit closure capture, from what I could tell in testing. Do you know of any good way we can detect the lambda? That was why I suggested the b__ or CompilerGenerated solutions previously, because I saw those patterns for the lambda, but obviously since both patterns match things we would want to instrument we can't use them.

csnemes commented 5 years ago

I added some tweeks to 3.1, so some of the cases will be handled properly.

ndrwrbgs commented 5 years ago

Thank you so much!! This fixes the test case above so I will mark this closed. I'll open a new issue if I encounter any further edge cases.

ndrwrbgs commented 5 years ago

Seems I didn't get a good enough repro case.

Updating the earlier example to the following exhibits the same issue (even in 3.1)


            new Program().Method_1().Wait();
        }

        internal async Task Method_1()
        {
            await Method_1_1();
            await Method_1_2();
        }

        internal async Task Method_1_2()
        {
            await Task.Delay(10);
        }

        internal async Task Method_1_1()
        {
            await Method_1_1_1();
        }

        internal async Task Method_1_1_1()
        {
            await Task.Delay(10);
        }
ndrwrbgs commented 4 years ago

Fixed (AFAICT) with the other issue, thanks again so much!