DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/
Other
304 stars 373 forks source link

fix: ensure the correct ids are serialized and propagated #3709

Closed mabdinur closed 3 months ago

mabdinur commented 3 months ago

What does this PR do?

Motivation:

Additional Notes:

How to test the change?

Unsure? Have a question? Request a review!

mabdinur commented 3 months ago

Profiling tests are failing due to 64bit unsigned span_ids being converted to signed integers.

@marcotc Is this a bug in the profiler?

marcotc commented 3 months ago

Profiling tests are failing due to 64bit unsigned span_ids being converted to signed integers.

I don't see any changes in your PR that should affect it, can you figure out what line exactly in your PR is triggering the issue?

mabdinur commented 3 months ago

Profiling tests are failing due to 64bit unsigned span_ids being converted to signed integers.

I don't see any changes in your PR that should affect it, can you figure out what line exactly in your PR is triggering the issue?

In the failing test the span_id of 13756630617282998475 is encoded as -4690113456426553141. Both numbers have the same 2s complement representation 1011111011101001010111101010100100100001011010100111000011001011.

It seems like the profiler is using signed 64bit integers to correlate spans to threads. However I can't find where this occurs. Looking at the thread collector's implementation it seems like the span_id should be an unsigned integer: https://github.com/DataDog/dd-trace-rb/blob/master/ext/datadog_profiling_native_extension/collectors_thread_context.c#L158. The ruby otel tracer generates span_ids up to 2^64 while the datadog ruby tracer generates span ids up to 2^63 so this wasn't an issue in the past.

cc: @ivoanjo

Update

I think I found the issue. LabelValue.num stores integers as i64. This is where the unsigned 64bit span_ids are miscast to signed integers: https://github.com/DataDog/dd-trace-rb/blob/5a95611148ceb8744c9692c6bc3f57e66094a970/ext/datadog_profiling_native_extension/collectors_thread_context.c#L741-#L742.

ivoanjo commented 3 months ago

Taking a look, thanks for the ping!

ivoanjo commented 3 months ago

I've pushed a fix in a99f0e51af09d45e57a851640c7407fdb2aa7f80 . Sorry for breaking your PR! I remember thinking "I should correct for the unsigned thing in our tests..." when working on #3466 but clearly ended up not doing it.

As I stated in the commit message -- both the profiler code and the backend know to treat these values as unsigned, and it was really only the test code that also decodes profiles to match on them that wasn't doing the same adjustment.

LMK if I can provide any more help :)

mabdinur commented 3 months ago

I've pushed a fix in a99f0e5 . Sorry for breaking your PR! I remember thinking "I should correct for the unsigned thing in our tests..." when working on #3466 but clearly ended up not doing it.

As I stated in the commit message -- both the profiler code and the backend know to treat these values as unsigned, and it was really only the test code that also decodes profiles to match on them that wasn't doing the same adjustment.

LMK if I can provide any more help :)

It's good to know span_ids are represented as unsigned integers in the backend. This fix looks good to me. Thanks for looking into this.

codecov-commenter commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 98.11%. Comparing base (e086475) to head (a99f0e5). Report is 21 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #3709 +/- ## ======================================= Coverage 98.11% 98.11% ======================================= Files 1225 1225 Lines 72795 72815 +20 Branches 3482 3486 +4 ======================================= + Hits 71420 71441 +21 + Misses 1375 1374 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mabdinur commented 3 months ago

@marcotc This PR should be good to go