aws-observability / aws-otel-community

Welcome to the AWS Distro for OpenTelemetry project. If you're using monitoring and observability tools for AWS products and services, this is a great place to ask questions, request features and network with other community members.
https://aws-otel.github.io/
Apache License 2.0

Document AWS Lambda compatibility #4

Closed joebowbeer closed 2 years ago

joebowbeer commented 4 years ago

Is it possible to use AWS Distro for OpenTelemetry to instrument AWS Lambda functions?

From aws-otel.github.io :

Use AWS Distro for OpenTelemetry to instrument your applications running on Amazon Elastic Compute Cloud (EC2), Amazon Elastic Container Service (ECS), and Amazon Elastic Kubernetes Service (EKS) on EC2, and AWS Fargate, as well as on-premises.

If so, I recommend updating the documentation.

awssandra commented 4 years ago

Hi joebowbeer,

It's on our roadmap to support AWS Lambda, but we do not support it today. Stay tuned for updates soon!

Meanwhile, see our public roadmap here: https://github.com/orgs/aws-observability/projects/4

mjpowersjr commented 3 years ago

Hello, I spent some time working through integrating Lambda <> OpenTelemetry in a NodeJS environment. Below are some of the areas that I think could see improvement:

Timing of OTEL auto-instrumentation.

We followed a similar approach to what is described in the guide Tracing with the AWS Distro for OpenTelemetry JavaScript SDK, although my setup file is named telemetry.js instead of tracing.js.

When working with simple examples, it's easy to add a line similar to the following at the top of your code. This allows OpenTelemetry to intercept calls to require in order to automatically instrument supported modules.

require('./telemetry.js');

Unfortunately, this approach can fall apart pretty quickly if you add Babel/webpack into the mix to minimize the size of your Lambda functions. A recommended alternative is to launch your application with a node argument so that telemetry.js loads before your main code executes. The command typically looks something like this:

node --require ./telemetry.js  src/index.js

This turned out to be mildly challenging in a Lambda environment. I was eventually able to launch our NodeJS Lambda functions using this approach, but it involved a custom wrapper script:

telemetry-wrapper

#!/bin/bash

# the path to the interpreter and all of the originally intended arguments
args=("$@")

# the extra options to pass to the interpreter
extra_args=("--require" "/opt/telemetry.js")

# insert the extra options
args=("${args[@]:0:$#-1}" "${extra_args[@]}" "${args[@]: -1}")

# start the runtime with the extra options
exec "${args[@]}"
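For reference, Lambda can be pointed at a wrapper script like this via the AWS_LAMBDA_EXEC_WRAPPER environment variable, which hands the script the interpreter path and its original arguments. A rough sketch, assuming the script ships in a Lambda layer at /opt/telemetry-wrapper and the function name my-function is a placeholder:

```shell
# Sketch: wire the wrapper script into the Node.js runtime.
# Assumes a layer places the script at /opt/telemetry-wrapper
# (marked executable) and telemetry.js at /opt/telemetry.js.
aws lambda update-function-configuration \
  --function-name my-function \
  --environment "Variables={AWS_LAMBDA_EXEC_WRAPPER=/opt/telemetry-wrapper}"
```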

OTEL Exporters need to flush before shutting down

It's not clear to me whether this is a responsibility for AWS or OTEL, but the OTEL exporters don't seem to have the opportunity to flush any pending spans before a Lambda shuts down. I was able to solve this with a call to await api.trace.getTracerProvider().getDelegate().shutdown();
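To make that flush step harder to forget, the call can live in a small wrapper that runs on both the success and failure paths of a handler. A minimal sketch, where flushTelemetry is a placeholder for whatever flush/shutdown call your provider exposes (e.g. the getDelegate().shutdown() call above):

```javascript
// Sketch: ensure pending telemetry is flushed before the handler
// returns, whether it succeeds or throws. `flushTelemetry` stands in
// for the tracer provider's flush/shutdown call.
function withFlush(handler, flushTelemetry) {
    return async (event, context) => {
        try {
            return await handler(event, context);
        } finally {
            // runs after success *and* after an exception, before the
            // Lambda execution environment can freeze
            await flushTelemetry();
        }
    };
}
```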

While researching a working solution to the timing issue, I found Sentry's approach to Lambdas pretty clean; hopefully this use case is something that OpenTelemetry gives some attention to in the future.

Connecting OTEL to the external X-Ray trace.

If you turn on X-Ray support for a Lambda, you automatically get high-level instrumentation out of the box. If you set up the aws-otel-collector and use OTEL libraries to export traces to X-Ray, you can get detailed instrumentation. If you have both X-Ray support enabled at the Lambda level and your own OTEL instrumentation, every invocation of the Lambda function generates two separate traces. I managed to work around this by abusing the AWSXRayPropagator class. :-)
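For context on what the propagator is working with: Lambda exposes the upstream X-Ray trace context in the _X_AMZN_TRACE_ID environment variable, and the propagator turns that header into a parent span context. A rough sketch of the parsing, not the actual AWSXRayPropagator implementation:

```javascript
// Sketch: parse the X-Ray trace header that Lambda exposes via
// process.env._X_AMZN_TRACE_ID. Example value:
//   Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1
function parseXRayTraceHeader(header) {
    const fields = {};
    for (const part of header.split(';')) {
        const [key, value] = part.split('=');
        fields[key] = value;
    }
    return {
        root: fields.Root,               // 1-<epoch hex>-<random hex>
        parent: fields.Parent,           // parent segment/span id
        sampled: fields.Sampled === '1'  // upstream sampling decision
    };
}
```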

Recording exceptions does not seem to work

For some reason, recording exceptions does not seem to work as expected. I haven't had time to dig deeper into what's going on, but I don't see any indication of the error or stack trace in X-Ray.

Other thoughts:

I found this discussion interesting, but I'm not sure if a custom tracer would have helped with the issues I ran into:

It would be nice if this opentelemetry plugin supported a Lambda environment:

Below is an example that demonstrates some of the topics discussed above:

import "core-js/stable";
import "regenerator-runtime/runtime";

const path = require('path');

const { NoRecordingSpan } = require('@opentelemetry/core');
const api = require('@opentelemetry/api');

const deployment = require('../package.json');
const axios = require('axios');

function instrumentHandler(handler) {
    return async (event, context, callback) => {
        const tracer = api.trace.getTracer(deployment.name, deployment.version);

        // Currently AWSXRayPropagator expected to be passed HTTP headers,
        // not a Lambda environment map.
        const mockHttpRequestHeaders = {
            'X-Amzn-Trace-Id': process.env._X_AMZN_TRACE_ID
        };

        // propagate remote AWS X-Ray span to current execution context
        await api.context.with(api.propagation.extract(mockHttpRequestHeaders), async () => {

            const remoteSpan = new NoRecordingSpan(api.context.active());

            const handlerName = path.basename(process.env._HANDLER)

            const handlerSpan = tracer.startSpan(handlerName, {
                parent: remoteSpan,
                kind: api.SpanKind.CONSUMER
            });

            let handlerReturn = null;
            let handlerError = null;
            try {
                handlerReturn = await tracer.withSpan(handlerSpan, async () => {
                    return handler(event, context, callback);
                });
            } catch (error) {
                handlerError = error;
                // FIXME: Recorded exceptions do not make it to X-Ray attached
                // to the respective span. By allowing the exception to bubble up,
                // Lambda's X-Ray integration will ultimately record the exception
                // at a higher-level span.
                handlerSpan.recordException(error);
            }

            handlerSpan.end();

            // ensure exporter(s) have a chance to flush spans before the
            // lambda fn shuts down or freezes
            await api.trace.getTracerProvider().getDelegate().shutdown();

            if (handlerError) {
                throw handlerError
            }

            return handlerReturn;

        });
    };
}

// eslint-disable-next-line no-unused-vars
export const handler = instrumentHandler(async (event, context, callback) => {
    console.log(process.env);

    const response = await axios.get('https://ifconfig.co/json')
    console.log(response.data);
});
mjpowersjr commented 3 years ago

After further investigation, it seems that the approach highlighted above likely does not handle the scenario in which Lambdas are frozen, and potentially never unfrozen. See the following open issue for details: https://github.com/open-telemetry/opentelemetry-js/issues/1739

joebowbeer commented 3 years ago

Is an external-type Lambda extension needed?

mjpowersjr commented 3 years ago

@joebowbeer - Possibly. It looks like some effort has been put toward using Lambda extensions for OTEL support. TBH I'm not familiar enough with Lambda's lifecycle or how the OTEL Collector works to know what additional challenges might be involved in the extension approach.

Maybe someone from one of the following projects can provide recommendations.

https://github.com/open-telemetry/opentelemetry-lambda-extension

https://github.com/aws-observability/aws-otel-lambda

alolita commented 2 years ago

Done.