aws / aws-xray-sdk-go

AWS X-Ray SDK for the Go programming language.
Apache License 2.0

Consider not recommending AWS Distro for OpenTelemetry (ADOT)? #372

Closed: a-h closed this issue 1 year ago

a-h commented 2 years ago

I tried migrating to ADOT, and the Lambda layer appears to add a significant amount to cold start time. That probably only matters to people who use Lambda, but that's a lot of people.

I don't see any docs on the potential downsides of migrating. Maybe it's a bit too early to be recommending it?

There was no documentation on migrating, but here's the commit where I migrated from the AWS X-Ray SDK: https://github.com/a-h/awsapigatewayv2handler/commit/c45b98eb1b9ec8ad6618b7b38b0347ef4bec963c

Here's the issue in AWS Otel Lambda, with my comments about the Go behaviour: https://github.com/aws-observability/aws-otel-lambda/issues/228#issuecomment-1193215390

Here's the test data, for reference.

[Image: Screenshot 2022-07-24 at 01 19 18]

willarmiros commented 2 years ago

Hi @a-h,

Thanks for raising this. We are tracking the cold-start issues and are considering modifying the wording in our documentation. Stay tuned!

Are all of the durations in your screenshot cold-start durations (both before and after OTel)? Do you have any warm invocations for comparison between before & after OTel?

a-h commented 2 years ago

Hi @willarmiros, yes, the screenshot above is just showing cold starts.

I've included warm invocations below.

I filtered out invocations longer than 10 seconds because they were just tests of running the Otel collector without the Lambda layer. The docs don't make it clear whether the Lambda layer is required, but I found that it is: without the Lambda layer, the function hangs.

fields @timestamp, @billedDuration
| filter @message like /Billed Duration/
| filter @message not like /Init Duration/
| filter @billedDuration < 10000
| stats avg(@billedDuration) by bin(1d) as time
| sort by time
[Image: chart of average billed duration per day]

The drop in billed duration on the 17th corresponds with adding -tags lambda.norpc to the Go build flags.

The average billed duration increases from around 2 ms to 13 ms from the 23rd of July, which is when I started replacing X-Ray with Otel.
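For anyone who wants to reproduce the warm-invocation filtering outside of CloudWatch Logs Insights, here is a minimal Go sketch of the same logic: keep log lines that report a billed duration, skip cold starts (lines containing "Init Duration"), skip anything at or above 10 seconds, and average the rest. The function name `averageWarmBilledMs` and the sample REPORT lines are made up for illustration, and unlike the query this sketch does not bin results by day.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// billedRe extracts the whole-millisecond value from the
// "Billed Duration: N ms" field of a Lambda REPORT log line.
var billedRe = regexp.MustCompile(`Billed Duration: (\d+) ms`)

// averageWarmBilledMs is a hypothetical helper that mirrors the query's filters:
// lines without a billed duration are ignored, cold starts (lines that
// contain "Init Duration") are skipped, and invocations billed at or
// above maxMs are dropped. It returns the average and the sample count.
func averageWarmBilledMs(lines []string, maxMs int) (avg float64, count int) {
	total := 0
	for _, line := range lines {
		m := billedRe.FindStringSubmatch(line)
		if m == nil {
			continue // not a REPORT line
		}
		if strings.Contains(line, "Init Duration") {
			continue // cold start
		}
		ms, err := strconv.Atoi(m[1])
		if err != nil || ms >= maxMs {
			continue // unparseable, or a long-running test invocation
		}
		total += ms
		count++
	}
	if count == 0 {
		return 0, 0
	}
	return float64(total) / float64(count), count
}

// sampleLines are invented log lines, not data from this issue.
var sampleLines = []string{
	"START RequestId: a Version: $LATEST",
	"REPORT RequestId: a Duration: 1.52 ms Billed Duration: 2 ms Memory Size: 128 MB Max Memory Used: 30 MB",
	"REPORT RequestId: b Duration: 12.10 ms Billed Duration: 13 ms Memory Size: 128 MB Max Memory Used: 31 MB",
	"REPORT RequestId: c Duration: 350.00 ms Billed Duration: 351 ms Memory Size: 128 MB Max Memory Used: 40 MB Init Duration: 210.00 ms",
	"REPORT RequestId: d Duration: 15000.00 ms Billed Duration: 15000 ms Memory Size: 128 MB Max Memory Used: 40 MB",
}

func main() {
	avg, n := averageWarmBilledMs(sampleLines, 10000)
	// prints "warm invocations: 2, average billed: 7.5 ms"
	fmt.Printf("warm invocations: %d, average billed: %.1f ms\n", n, avg)
}
```

Of the five sample lines, only the two warm REPORT lines (2 ms and 13 ms) survive the filters; the cold start and the 15-second invocation are excluded, as in the query.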

[Image]

willarmiros commented 1 year ago

Hi @a-h - we have updated the README to have a more neutral tone & be aligned with our other documentation, and have surfaced this latency issue in our developer guide as well. We are actively investigating the latency issue, and will post updates as we progress on it here: https://github.com/aws-observability/aws-otel-lambda/issues/228