DataDog / dd-trace-dotnet

.NET Client Library for Datadog APM
https://docs.datadoghq.com/tracing/
Apache License 2.0

Fix dlsym issue (#6048 => hotfix) #6049

Closed: andrewlock closed this 2 months ago

andrewlock commented 2 months ago

Summary of changes

This PR addresses the issue https://github.com/DataDog/dd-trace-dotnet/issues/6045

Reason for change

When we use the dlsym function, the linker adds dlsym to the binary's table of imported (undefined) symbols. Before the wrapper became a universal binary (the same binary used for both glibc-based and musl-libc-based Linux), the linker also added a DT_NEEDED entry for libdl.so, the library that contains dlsym. When the wrapper is loaded, the dynamic loader looks through all the DT_NEEDED entries to find a library that provides dlsym. Now that the wrapper is a universal binary, the DT_NEEDED entries are removed (this is part of making it universal), so we have to resolve the symbols we need (dlsym, pthread_once, ...) by hand. If we use dlsym (or any other such symbol) directly, we hit this issue.
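To see the two pieces of ELF metadata involved here on an actual build, one can look at the `(NEEDED)` lines printed by `readelf -d` and the undefined (`U`) symbols printed by `nm -D`. The Python sketch below parses text already captured from those tools. It assumes GNU binutils output formats, and the function names are hypothetical helpers, not part of dd-trace-dotnet. It flags the situation described above: undefined imports such as dlsym in a binary that lists no DT_NEEDED library.

```python
import re

# A DT_NEEDED entry as printed by GNU `readelf -d`, for example:
#  0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output: str) -> list[str]:
    """Library names from DT_NEEDED entries, in the order readelf lists them."""
    return NEEDED_RE.findall(readelf_output)


def undefined_imports(nm_output: str) -> list[str]:
    """Sorted undefined ('U') symbols from `nm -D` output, @VERSION suffix removed."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address, so the type letter is the first field.
        if len(fields) >= 2 and fields[0] == "U":
            names.append(fields[1].split("@", 1)[0])
    return sorted(names)


def imports_without_needed_library(readelf_output: str, nm_output: str) -> list[str]:
    """Heuristic: if there are no DT_NEEDED entries (as in the universal build),
    return every undefined import, because no listed library can supply it.
    Returns an empty list when at least one DT_NEEDED entry is present."""
    if needed_libraries(readelf_output):
        return []
    return undefined_imports(nm_output)
```

For example, pass it the captured output of `readelf -d Datadog.Linux.ApiWrapper.x64.so` and `nm -D Datadog.Linux.ApiWrapper.x64.so`. The check is deliberately coarse: a non-empty DT_NEEDED list does not prove that every import is actually provided by those libraries.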

Implementation details

Test coverage

Added a snapshot test using nm that verifies that the undefined symbols in the universal binary haven't changed. It's equivalent to running

```shell
nm -D Datadog.Linux.ApiWrapper.x64.so | grep ' U ' | awk '{print $2}' | sed 's/@.*//' | sort
```

but done using Nuke instead. It would probably make more sense for this to be a "normal" test in the native tests, but since it depends on nm, which is definitely available in the universal build Dockerfile, it was quicker and easier to get it up and running directly. When it fails, it prints the diff and throws an exception, e.g.

```
System.Exception: Found differences in undefined symbols (dlsym) in the Native Wrapper library. Verify that these changes are expected, and will not cause problems. Removing symbols is generally a safe operation, but adding them could cause crashes. If the new symbols are safe to add, update the snapshot file at C:\repos\dd-trace-dotnet\tracer\test\snapshots\native-wrapper-symbols-x64.verified.txt with the new values
```
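For readers who want to reproduce the check outside Nuke, here is a minimal Python approximation of the snapshot comparison described above. It is not the PR's actual Nuke code: `undefined_symbols` and `check_symbol_snapshot` are hypothetical names, the input is text captured from `nm -D`, and the verified file is assumed to hold one symbol name per line, as the `nm` pipeline above produces.

```python
import difflib
from pathlib import Path


def undefined_symbols(nm_output: str) -> list[str]:
    """Python equivalent of: grep ' U ' | awk '{print $2}' | sed 's/@.*//' | sort"""
    symbols = []
    for line in nm_output.splitlines():
        if " U " not in line:  # grep ' U '
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        symbols.append(fields[1].split("@", 1)[0])  # awk $2, then sed 's/@.*//'
    # sorted() uses code-point order, matching `sort` under LC_ALL=C.
    return sorted(symbols)


def check_symbol_snapshot(nm_output: str, snapshot_path: Path) -> None:
    """Print a diff and raise if the undefined symbols differ from the snapshot."""
    expected = snapshot_path.read_text().splitlines()
    actual = undefined_symbols(nm_output)
    diff = list(difflib.unified_diff(expected, actual, "verified", "current", lineterm=""))
    if diff:
        print("\n".join(diff))
        raise Exception(
            "Found differences in undefined symbols in the native wrapper library. "
            "Removing symbols is generally safe, but adding them could cause crashes. "
            f"If the new symbols are safe to add, update {snapshot_path}."
        )
```

A unified diff keeps the failure readable in CI logs. Raising a plain `Exception` mirrors the `System.Exception` in the message above, although a dedicated exception type would be more idiomatic in Python.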

Other details

This is a hotfix for

datadog-ddstaging[bot] commented 2 months ago

Datadog Report

Branch report: andrew/dlsym-hotfix-v3
Commit report: 4fbd8a5
Test service: dd-trace-dotnet

✅ 0 Failed, 368261 Passed, 2368 Skipped, 16h 33m 50.74s Total Time | ⏳ 1 Performance Regression

⏳ Performance Regressions vs Default Branch (1)

andrewlock commented 2 months ago

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the total time it takes to execute a program and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

The table below shows the mean execution time of each run together with its p99 interval, computed from the mean and StdDev of the test run.

Execution time (ms):

| Sample | Runtime | Configuration | This PR (6049) mean | This PR p99 interval | master mean | master p99 interval |
|---|---|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.6.2 | Baseline | 70 | 66-74 | 70 | 66-74 |
| FakeDbCommand | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,110 | 1081-1139 | 1,119 | 1092-1145 |
| FakeDbCommand | .NET Core 3.1 | Baseline | 109 | 105-112 | 108 | 105-112 |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 798 | 776-821 | 809 | 792-827 |
| FakeDbCommand | .NET 6 | Baseline | 92 | 89-96 | 91 | 89-94 |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 752 | 725-779 | 761 | 739-783 |
| HttpMessageHandler | .NET Framework 4.6.2 | Baseline | 192 | 188-195 | 192 | 186-198 |
| HttpMessageHandler | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,199 | 1174-1224 | 1,201 | 1175-1228 |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 277 | 274-281 | 278 | 273-282 |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 966 | 942-990 | 968 | 947-988 |
| HttpMessageHandler | .NET 6 | Baseline | 266 | 262-270 | 266 | 261-271 |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 946 | 922-970 | 945 | 917-973 |
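The p99 intervals above are derived from each run's mean and StdDev, but the report does not say exactly how. The sketch below assumes a normal approximation, i.e. a two-sided 99% interval of mean ± z·StdDev with z ≈ 2.576; that formula is an assumption, and `p99_interval` and `implied_stddev` are hypothetical helpers, not part of the benchmark tooling.

```python
from statistics import NormalDist


def p99_interval(mean: float, stddev: float) -> tuple[float, float]:
    """Two-sided 99% interval (mean - z*sd, mean + z*sd), assuming normal run times."""
    z = NormalDist().inv_cdf(0.995)  # about 2.576
    return mean - z * stddev, mean + z * stddev


def implied_stddev(low: float, high: float) -> float:
    """Invert p99_interval: the StdDev that would produce the interval [low, high]."""
    z = NormalDist().inv_cdf(0.995)
    return (high - low) / (2 * z)
```

Under that assumption, `implied_stddev(66, 74)` for the FakeDbCommand (.NET Framework 4.6.2) baseline gives roughly 1.55 ms. Because the reported bounds are rounded to whole milliseconds, treat such back-calculations as approximate.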