DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM
https://docs.datadoghq.com/tracing/
Apache License 2.0
443 stars 138 forks source link

SqlCommand.ExecuteNonQueryAsync causes APPCRASH in 32-bit exes #662

Closed rlaveycal closed 4 years ago

rlaveycal commented 4 years ago

Describe the bug A .Net framework console application with the "Prefer 32-bit" build option set (which is the default) will crash if automatic tracing is enabled and SqlCommand.ExecuteNonQueryAsync is called. Running as a 64-bit process works.

To Reproduce

  1. Create a console application and add this code
            using (var connection = new SqlConnection("Data Source=localhost; Initial Catalog=master; Integrated Security=True;"))
            {
                connection.Open();
                using (var command = new SqlCommand("select @@version", connection)
                {
                    CommandType = CommandType.Text
                })
                {
                    Console.WriteLine("before");
                    var r = command.ExecuteNonQueryAsync().Result;
                    Console.WriteLine("after");
                }
            }
  2. Open a command prompt and enable tracing
    set COR_PROFILER={846F5F1C-F9AE-4B07-969E-05C26BC060D8}
    set COR_ENABLE_PROFILING=1
  3. Run your program

Expected behavior Should see "before" and "after" in the console

Actual behavior

Problem signature: P1: 32Test.exe P2: 1.0.0.0 P3: a08db0d7 P4: clr.dll P5: 4.8.4121.0 P6: 5deace54 P7: c0000005 P8: 00017ff4 P9: P10:


**Runtime environment (please complete the following information):**
- Instrumentation mode: MSI
- Tracer version: 1.13.2
- OS: Windows 10 1903; Windows 2016
- CLR: .NET Framework 4.8

**Additional context**

This statement is used by the NServiceBus SQL Server transport which we use extensively.

I enabled trace debuging which yielded the following:

[32Test.exe] 6352: [debug] JITCompilationStarted skipping function call: argument counts don't match. function_id=41960924 token=100663298 target_name=System.Data.Common.DbCommand.ExecuteNonQueryAsync() expected_number_args=1 target_arg_count=0 [32Test.exe] 6352: [info] *** JITCompilationStarted() replaced calls from System.Data.Common.DbCommand.ExecuteNonQueryAsync() to System.Data.Common.DbCommand.ExecuteNonQueryAsync() 100675060 with calls to Datadog.Trace.ClrProfiler.Integrations.AdoNet.DbCommandIntegration.ExecuteNonQueryAsync() 167775165

darrenkopp commented 4 years ago

I believe that this is the same underlying issue that I was about to open... so I'll just comment here for now. I know that the issue that's affecting me was introduced in v1.8 as I can run v1.7 without issue. I believe that the issue was introduced in #516 though it might be just async methods that are affected.

I have the exact same setup (.NET Framework 4.8, running as a 32-bit process), but have't tried running as 64-bit. However, we have a lot of ADO.NET code running and the processes that crash consistently crash. But there's plenty of processes that don't crash at all.

In my crash dump I was able to get down to this stack trace in windbg:

3142ce90 1a37d831 (MethodDesc 2c11fb00 +0x11 Datadog.Trace.ClrProfiler.Integrations.AdoNet.DbCommandIntegration.ExecuteScalarAsync(System.Object, System.Object, Int32, Int32, Int64)), calling clr!JIT_IsInstanceOfClass
3142cea0 1a4efa2a (MethodDesc 1a7d1984 +0x22a Dapper.SqlMapper+<ExecuteScalarImplAsync>d__69`1[[System.Int32, mscorlib]].MoveNext()), calling (MethodDesc 2c11fb00 +0 Datadog.Trace.ClrProfiler.Integrations.AdoNet.DbCommandIntegration.ExecuteScalarAsync(System.Object, System.Object, Int32, Int32, Int64))
3142cf0c 1a4ef7b9 (MethodDesc 1a7d1d38 +0x39 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Int32, mscorlib]].Start[[Dapper.SqlMapper+<ExecuteScalarImplAsync>d__69`1[[System.Int32, mscorlib]], Dapper]](<ExecuteScalarImplAsync>d__69`1<Int32> ByRef)), calling (MethodDesc 1a7d1984 +0 Dapper.SqlMapper+<ExecuteScalarImplAsync>d__69`1[[System.Int32, mscorlib]].MoveNext())
3142cf60 1a4ef75e (MethodDesc 1a7d15f4 +0x5e Dapper.SqlMapper.ExecuteScalarImplAsync[[System.Int32, mscorlib]](System.Data.IDbConnection, Dapper.CommandDefinition)), calling (MethodDesc 1a7d1d38 +0 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1[[System.Int32, mscorlib]].Start[[Dapper.SqlMapper+<ExecuteScalarImplAsync>d__69`1[[System.Int32, mscorlib]], Dapper]](<ExecuteScalarImplAsync>d__69`1<Int32> ByRef))
3142cfcc 1a4ef6e6 (MethodDesc 1a7d1148 +0x26 Dapper.SqlMapper.ExecuteScalarAsync[[System.Int32, mscorlib]](System.Data.IDbConnection, Dapper.CommandDefinition)), calling (MethodDesc 1a7d15f4 +0 Dapper.SqlMapper.ExecuteScalarImplAsync[[System.Int32, mscorlib]](System.Data.IDbConnection, Dapper.CommandDefinition))

Exception Analysis

BUGCHECK_STR:  APPLICATION_FAULT_NULL_CLASS_PTR_READ_NULL_CLASS_PTR_DEREFERENCE_INVALID_POINTER_READ

PRIMARY_PROBLEM_CLASS:  APPLICATION_FAULT

PROBLEM_CLASSES: 

    ID:     [0n313]
    Type:   [@ACCESS_VIOLATION]
    Class:  Addendum
    Scope:  BUCKET_ID
    Name:   Omit
    Data:   Omit
    PID:    [Unspecified]
    TID:    [0x188c]
    Frame:  [0] : clr!JIT_IsInstanceOfClass

    ID:     [0n285]
    Type:   [INVALID_POINTER_READ]
    Class:  Primary
    Scope:  BUCKET_ID
    Name:   Add
    Data:   Omit
    PID:    [Unspecified]
    TID:    [0x188c]
    Frame:  [0] : clr!JIT_IsInstanceOfClass

    ID:     [0n309]
    Type:   [NULL_CLASS_PTR_DEREFERENCE]
    Class:  Primary
    Scope:  BUCKET_ID
    Name:   Add
    Data:   Omit
    PID:    [Unspecified]
    TID:    [0x188c]
    Frame:  [0] : clr!JIT_IsInstanceOfClass

    ID:     [0n310]
    Type:   [NULL_CLASS_PTR_READ]
    Class:  Primary
    Scope:  DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
            BUCKET_ID
    Name:   Add
    Data:   Omit
    PID:    [0x222c]
    TID:    [0x188c]
    Frame:  [0] : clr!JIT_IsInstanceOfClass

I would assume that the DbCommandIntegration would need to do the same as the SqlCommandIntegration and do the following... but I feel like if that were the case this would fail constantly all the time.

                var targetType = command.GetInstrumentedType(SqlCommandTypeName);

                instrumentedMethod =
                    MethodBuilder<Func<DbCommand, CancellationToken, Task<object>>>
                       .Start(moduleVersionPtr, mdToken, opCode, AdoNetConstants.MethodNames.ExecuteScalarAsync)
                       .WithConcreteType(targetType)
                       .WithParameters(cancellationToken)
                       .WithNamespaceAndNameFilters(ClrNames.GenericTask, ClrNames.CancellationToken)
                       .Build();
lucaspimentel commented 4 years ago

Thanks both of you for reporting this! This is a high priority issue that we will look into in the next few days. Expect an update soon.

darrenkopp commented 4 years ago

Thanks @lucaspimentel. Let me know if you need anything else.

JarrodJ83 commented 4 years ago

@lucaspimentel is there any update on this? We are in the middle of trying to rollout DataDog and just installed the tracer in our production environment and it took down several applications (didn't see any issue in our lower environments for some reason). We had <add key="DD_TRACE_ENABLED" value="false" /> set in all apps but one hoping that the tracer would only affect the one app we had it turned on for. Unfortunately the apps that ended up crashing were ones that had tracing turned off. This is going to be a blocker to implmenting DataDog for us.

zacharycmontoya commented 4 years ago

Hi all, I wanted to provide an update on this issue. I've started debugging this and I've discovered that the issue with the clr!JIT_IsInstanceOfClass occurs on the CancellationToken argument. When the integration method is reached, that argument is corrupted and even VS shows that it cannot access the memory for that variable. Strangely, this issue with the CancellationToken only seems to affect this particular integration and not others like the HttpMessageHandler integration. I don't have a concrete fix yet, but I'm looking into the CLR profiler that does the redirection. I'll let you know when I have more updates.

zacharycmontoya commented 4 years ago

Hi everyone, I prepared and merged a fix into master that makes sure that the System.Threading.CancellationToken is passed correctly to our integration methods, and I've added regression tests for this particular callsite into our CI runs. This fix will be included in the next release, which we expect to happen later this week. Thank you again for your helpful reproductions that helped isolate the issue!

NickAdolf commented 4 years ago

Any update on the release timeline?

zacharycmontoya commented 4 years ago

Hi @NickAdolf, we just released .NET Tracer 1.16.1 that contains the fix for this issue. I'm going to close this issue now