aws-observability / aws-otel-js

AWS Distro for OpenTelemetry JavaScript SDK
https://aws-otel.github.io/
Apache License 2.0
36 stars 22 forks source link

Trace state not set in AWSXRayPropagator #51

Closed jacogreyling closed 3 years ago

jacogreyling commented 3 years ago

Hi,

I found an issue where the parentId of a distributed span is not set when using the AWSXRayPropagator propagator. When a span is initialised via the AWS API Gateway, the header is properly passed to my Node.JS application however when I look in X-Ray the spans are not 'linked'. I can see both the API Gateway span as well as the Node.JS span when I search for the two trace id's but they're not linked.

I've enabled ConsoleSpanExporter() to see what's going on. When I use the 'out-of-the-box' propagator I can see this for example:

  traceId: 'e5625bcfc8a54afc445a5783c9ce827a',
  parentId: '420e23fec3fdd1a3',
  name: 'HTTP POST',
  id: '75f4d6fae0e11cc2',
  kind: 1,
  timestamp: 1614690886046893,
  duration: 39607,
  attributes: {
    'http.url': 'http://localhost:3000/check',
    'http.host': 'localhost:3000',
    'net.host.name': 'localhost',
    'http.method': 'POST',
    'http.route': '/check',
    'http.target': '/check',
    'http.user_agent': 'PostmanRuntime/7.6.0',
    'http.request_content_length_uncompressed': 94,
    'http.flavor': '1.1',
    'net.transport': 'IP.TCP',
    'net.host.ip': '127.0.0.1',
    'net.host.port': 3000,
    'net.peer.ip': '127.0.0.1',
    'net.peer.port': 55476,
    'http.status_code': 200,
    'http.status_text': 'OK'
  },
  status: { code: 1 },
  events: []
}

However when I use AWSXRayPropagator, I get:

  traceId: '603e3ac6cb28214c0ff4ce9a9b7c6ebf',
  parentId: undefined,
  name: 'HTTP POST',
  id: 'e4c30d5d89963972',
  kind: 1,
  timestamp: 1614691014217177,
  duration: 39954,
  attributes: {
    'http.url': 'http://localhost:3000/check',
    'http.host': 'localhost:3000',
    'net.host.name': 'localhost',
    'http.method': 'POST',
    'http.route': '/check',
    'http.target': '/check',
    'http.user_agent': 'PostmanRuntime/7.6.0',
    'http.request_content_length_uncompressed': 94,
    'http.flavor': '1.1',
    'net.transport': 'IP.TCP',
    'net.host.ip': '127.0.0.1',
    'net.host.port': 3000,
    'net.peer.ip': '127.0.0.1',
    'net.peer.port': 55602,
    'http.status_code': 200,
    'http.status_text': 'OK'
  },
  status: { code: 1 },
  events: []
}

After tracing the code I realised the following, the AWSXRayPropagator does not set the traceState in the context.

Comparing the following call in instrumentation-http: api_1.propagation.extract(api_1.ROOT_CONTEXT, headers)

I get this object with the out-of-the-box propagator:

Symbol(OpenTelemetry Context Key SPAN)
{
    "traceId": "e5625bcfc8a54afc445a5783c9ce827a",
    "spanId": "420e23fec3fdd1a3",
    "traceFlags": 1,
    "isRemote": "true",
    "traceState": "TraceState"
}

where as with AWSXRayPropagator call I get:

Symbol(OpenTelemetry Context Key ACTIVE_SPAN)
{
    "traceId": "60389b7527eb080c7eb3ca5c3e69ce27",
    "spanId": "06a96d45a27c25e7",
    "traceFlags": 1,
    "isRemote": "true"
}

Now I'm not sure if it's the root cause of my problem but it's definitely causing the console span exporter to report no parent and I suspect this is also why X-Ray is not stitching them back together again. I can see all my spans going to my aws-otel-collector.

Any thoughts? Happy to provide more traces if required?

anuraaga commented 3 years ago

Hi @jacogreyling - currently AwsXrayPropagator doesn't support tracestate, but that shouldn't be affecting the parenting, tracestate is a different feature.

In your last snippet, it looks like the AWSXrayPropagator is actually extracting the trace ID / span ID from API gateway - I guess that span ID corresponds to the API gateway span you see in X-Ray? If so it does seem like a bug would be preventing this from actually being set to the parent of the span. This repo is currently being updated to the latest vesion of otel-js (finally) and I think we'll need to verify this again on the latest version. @willarmiros heads up.

jacogreyling commented 3 years ago

Hi @anuraaga thank you for the response. I'm happy to test the latest version to see if it's still the same? Should I grab the latest src from the master branch or are you guys working on a different branch?

Indeed the AWSXrayPropagator is correctly extracting the trace id out of the header as you can see from this screenshot:

image

This context is passed in the request header:

x-amzn-trace-id: Root=1-60389b75-27eb080c7eb3ca5c3e69ce27;Parent=06a96d45a27c25e7;Sampled=1

Yeah the only two differences I can see is the omission of the tracestate and the opentelemetry context key is set to ACTIVE_SPAN vs. SPAN. I can drill deeper to see exactly the line where the parent id is skipped but if you're working on a newer version then probably best to work off that base.

anuraaga commented 3 years ago

Thanks @jacogreyling - I think we'd need to rewrite the propagator itself to reflect the newer API upstream and @willarmiros should have something in progress for that. Let's check this behavior after that gets in

lupengamzn commented 3 years ago

@anuraaga Agreed

@jacogreyling Can you test it using the latest version?

jacogreyling commented 3 years ago

Awesome it worked! I can confirm that commit solved this issue for me.

{
  traceId: '60389b7527eb080c7eb3ca5c3e69ce27',
  parentId: '06a96d45a27c25e7',
  name: 'HTTP POST',
  id: 'c39083ba2a5a27d9',
  kind: 1,
  timestamp: 1615103936906815,
  duration: 24758,
  attributes: {
    'http.url': 'http://localhost:3000/check',
    'http.host': 'localhost:3000',
    'net.host.name': 'localhost',
    'http.method': 'POST',
    'http.route': '/check',
    'http.target': '/check',
    'http.user_agent': 'PostmanRuntime/7.6.0',
    'http.request_content_length_uncompressed': 94,
    'http.flavor': '1.1',
    'net.transport': 'IP.TCP',
    'net.host.ip': '127.0.0.1',
    'net.host.port': 3000,
    'net.peer.ip': '127.0.0.1',
    'net.peer.port': 64763,
    'http.status_code': 200,
    'http.status_text': 'OK'
  },
  status: { code: 1 },
  events: []
}

From X-Ray:

Screenshot 2021-03-07 at 10 52 06

Question: when will this update be released to the npm registry? At the moment I'm packaging this update directly in my code which is obviously not ideal.