Resolving stack traces is very slow in large applications (DebuggerMethodInfoTable is a poor performing hashtable)

noahfalk commented 6 years ago

DebuggerMethodInfoTable is based on CHashTable - a chaining hashtable that never rehashes as it grows, thus it has O(N) lookup performance rather than O(1) you would expect from a hashtable. I don’t know if there is some historical rationale for why this poorly scaling hashtable was used, but I do see comments in the code suggesting it is deprecated and that nothing else in the runtime should use it. The code declares 101 buckets, so average chain length would be num_methoddefs_loaded/101. It seems quite plausible that a big app would have 1M+ methods so an average chain size of 10,000. Not a fast lookup at all.

When doing stack trace resolution we lookup each frame in the stack in this table, all of the lookups are done synchronized under a debugger lock, and the entire operation is cooperative mode. This can cause large apps that frequently throw exceptions and log their stacks to have abnormally long GC pause times.

There is a related desktop bug (Microsoft access only): https://devdiv.visualstudio.com/DevDiv/Default/_workitems/edit/679712

UniverseHan commented 6 years ago

@mikem8361 I wanna try this. Could I ?

mikem8361 commented 6 years ago

I have no problem with it.

nonbob commented 2 years ago

I don't know how all the pieces fit together well enough to know if this counts as a real world example, but I'll present it briefly and you guys can decide if it's relevant.

We captured a trace of a largish applicaton with vsdiagnostics.exe. When examining the resulting .diagsession file in VS, the next-to-top hit for Self CPU was DebuggerMethodInfoTable::Cmp (top was memset, 3rd was JIT_ByRefWriteBarrier, then it got into application code). Which is how I got here. Take if it fits, ignore it if it don't.

(Perhaps I should note that AppDynamics and Stackify were also present on this system. Not sure if that matters or not)

DeepakRajendrakumaran commented 10 months ago

I've seen this same thing when looking at VTune profiles for some real world apps.

My main question is - is this a profiler only method? i.e., will this be called when profiler is not attached?

hoyosjs commented 10 months ago

No, there are some cases in which you will see this at runtime. Particularly symbolication of an exception might hit this. Other than that - diagnostic scenarios are the main perf impact

DeepakRajendrakumaran commented 10 months ago

No, there are some cases in which you will see this at runtime. Particularly symbolication of an exception might hit this. Other than that - diagnostic scenarios are the main perf impact

Thanks for the response. I was trying to figure out if this code will be hit during 'normal running of an application'- i.e., will optimizing it have any effect if application is running as expected(gauging importance of optimizing it). 'symbolization of an exception' I assume is something that needs to happen if there is a need to get call stack as a result of encountering an exception which I assume is a rare occurrence. Please feel free to correct me if I am wrong.

DeepakRajendrakumaran commented 9 months ago

No, there are some cases in which you will see this at runtime. Particularly symbolication of an exception might hit this. Other than that - diagnostic scenarios are the main perf impact

Thanks for the response. I was trying to figure out if this code will be hit during 'normal running of an application'- i.e., will optimizing it have any effect if application is running as expected(gauging importance of optimizing it). 'symbolization of an exception' I assume is something that needs to happen if there is a need to get call stack as a result of encountering an exception which I assume is a rare occurrence. Please feel free to correct me if I am wrong.

@hoyosjs @cshung @mikem8361 @tannergooding @BruceForstall

Sorry to bother you guys again and my apologies if I missed anything from the original reply but it would be nice to get a confirmation re above questions. I'm not really familiar with diagnostics part of runtime and would appreciate some feedback here

Will suboptimal implementation in 'DebuggerMethodInfoTable' have any effect on performance during regular execution - i.e., it's not a factor unless there is a profiler attached, exception causing runtime to figure out call stack or any other unusual events
is CHashTable used else where or is it specific to 'DebuggerMethodInfoTable'. This is the other one that pops in the profile I have. The issue here is I see some comments there which indicates this can be further optimized but I'm trying to figure out the impact on real-world and it's hard to make that determination from profile unless I know which functions in there are showing up specifically due to the profiler being attached Edit : Adding what seem like a relevant issue that's planned to be worked on : https://github.com/dotnet/runtime/issues/11466

hoyosjs commented 9 months ago

It's used elsewhere, and it's brittle to change CHashTable. A better thing is to move to SHash. And it's more than a profiler scenario. A common pattern is something like

try
{
  throw new Exception();
}
catch (Exception ex)
{
  logger.Warn("Operation failed with exception: {exception}", ex);
}

DeepakRajendrakumaran commented 9 months ago

It's used elsewhere, and it's brittle to change CHashTable. A better thing is to move to SHash. And it's more than a profiler scenario. A common pattern is something like
try
{
  throw new Exception();
}
catch (Exception ex)
{
  logger.Warn("Operation failed with exception: {exception}", ex);
}

Thank you. I see a separate issue for this that's tagged for 9.0 (https://github.com/dotnet/runtime/issues/11466). Would I be correct in assuming this is planned to be worked on for the next release?

davidwrighton commented 8 months ago

Yes, we plan to work on this next release. We may or may not fix it by fixing the hash tables in the debugger.

This is a small test app which shows the problem.

using System.Diagnostics;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.CompilerServices;

Console.WriteLine("EHToStringPerfTest.exe <numberOfDifferentMethodsToThrowThroughInWarmup> <numIterations> <stackDepth> <ILComplexity> <numberOfThreads> <runWarmupMethodsBeforeIterationsWarmup>");

int numberOfDifferentMethodsToThrowThroughInWarmup = args.Length > 0 ? Int32.Parse(args[0]) : 10000;
int numIterations = args.Length > 1 ? Int32.Parse(args[1]) : 1000;
int stackDepth = args.Length > 2 ? Int32.Parse(args[2]) : 100;
int ILComplexity = args.Length > 3 ? Int32.Parse(args[3]) : 500;
int numberOfThreads = args.Length > 4 ? Int32.Parse(args[4]) : 4;
bool runWarmupMethodsBeforeIterationsWarmup = args.Length > 5 ? bool.Parse(args[5]) : false;

Type overflow = typeof(OverflowException);
ConstructorInfo exCtorInfo = typeof(Exception).GetConstructor(Array.Empty<Type>());
MethodInfo exToString = typeof(object).GetMethod("ToString");
MethodInfo dummyMethod = typeof(TempMethodHolder).GetMethod("DummyMethod");
MethodInfo throwingMethod = typeof(TempMethodHolder).GetMethod("ThrowingMethod");

AssemblyBuilder assemblyBuilder = AssemblyBuilder.DefineDynamicAssembly(new System.Reflection.AssemblyName("Temp"), AssemblyBuilderAccess.Run);
var moduleBuilder = assemblyBuilder.DefineDynamicModule("Temp");
var typeBuilder = moduleBuilder.DefineType("TheType", System.Reflection.TypeAttributes.Public, typeof(Object));

for (int i = 0; i < numberOfDifferentMethodsToThrowThroughInWarmup; i++)
    AddMethodToType(i);
void AddMethodToType(int index)
{
    var methodBuilder = typeBuilder.DefineMethod($"Method{index}", System.Reflection.MethodAttributes.Public | System.Reflection.MethodAttributes.Static, typeof(void), Array.Empty<Type>());
    var ilg = methodBuilder.GetILGenerator();
    ilg.Emit(OpCodes.Nop);
    ilg.BeginExceptionBlock();
    ilg.Emit(OpCodes.Newobj, exCtorInfo);
    ilg.Emit(OpCodes.Throw);
    ilg.BeginCatchBlock(typeof(Exception));
    ilg.Emit(OpCodes.Callvirt, exToString);
    ilg.Emit(OpCodes.Pop);
    ilg.EndExceptionBlock();
    ilg.Emit(OpCodes.Ret);
}

MethodInfo lastMethod = throwingMethod;
for (int i = 0; i < stackDepth; i++)
{
    lastMethod = AddNestingMethodToType(lastMethod, i);
}

MethodInfo AddNestingMethodToType(MethodInfo methodToCall, int index)
{
    var methodBuilder = typeBuilder.DefineMethod($"NestingMethod{index}", System.Reflection.MethodAttributes.Public | System.Reflection.MethodAttributes.Static, typeof(void), Array.Empty<Type>());
    methodBuilder.SetImplementationFlags(MethodImplAttributes.NoInlining);
    var ilg = methodBuilder.GetILGenerator();
    ilg.Emit(OpCodes.Nop);
    for (int ilComplexityIndex = 0; ilComplexityIndex < ILComplexity; ilComplexityIndex++)
        ilg.Emit(OpCodes.Call, dummyMethod);
    ilg.Emit(OpCodes.Call, methodToCall);
    ilg.Emit(OpCodes.Ret);
    return methodBuilder;
}

var createdType = typeBuilder.CreateType();
Action CoreIteration = createdType.GetMethod(lastMethod.Name).CreateDelegate<Action>();

int numIterationsInThreadOfTesting = 2;
Warmup();
numIterationsInThreadOfTesting = numIterations;
var milliseconds = TestRoutine(numberOfThreads);

Console.WriteLine($"Milliseconds for test: {milliseconds}");
Console.WriteLine($"Exceptions/second (overall): {(double)(numIterations * numberOfThreads * 1000) / (double)milliseconds}");
Console.WriteLine($"Exceptions/second (per thread): {(double)(numIterations * 1000) / (double)milliseconds}");

void ThreadOfTesting()
{
    for (int i = 0; i < numIterationsInThreadOfTesting; i++)
        Iteration();
}

void Warmup()
{
    if (runWarmupMethodsBeforeIterationsWarmup)
        InvokeWarmupMethods();
    Iteration();
    if (!runWarmupMethodsBeforeIterationsWarmup)
        InvokeWarmupMethods();

    TestRoutine(1);
}

void InvokeWarmupMethods()
{
    foreach (var method in createdType.GetMethods())
    {
        if (method.Name.StartsWith("Method"))
            method.Invoke(null, Array.Empty<object>());
    }
}

long TestRoutine(int numThreads)
{
    Stopwatch sw = Stopwatch.StartNew();
    List<Task> tasks = new List<Task>();

    for (int i = 0; i < numberOfThreads; i++)
    {
        tasks.Add(Task.Run(ThreadOfTesting));
    }

    foreach (Task t in tasks)
    {
        t.Wait();
    }
    return sw.ElapsedMilliseconds;
}

string exToStringResult;
void Iteration()
{
    try
    {
        CoreIteration();
    }
    catch (Exception ex) { exToStringResult = ex.ToString(); }
}

public class TempMethodHolder
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void DummyMethod()
    {
        return;
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void ThrowingMethod()
    {
        throw new Exception();
    }
}

DeepakRajendrakumaran commented 8 months ago

Thanks for the test code. I'll check it out locally.

I'm not going to attempt to fix it myself seeing that you guys already are planning to do this.

dotnet / runtime

Resolving stack traces is very slow in large applications (DebuggerMethodInfoTable is a poor performing hashtable) #11056