Feature request: correlation ID propagation

saragerion commented 3 years ago

Description of the feature request

Problem statement

The use of correlation IDs can be extremely helpful for debugging and can help developers understand the lifecycle of user transactions as they are handled by different microservices within a platform. It would be good to help developers understand how they can use correlation IDs effectively and following the best practices, potentially allowing them to bring their own custom correlation IDs, and most importantly propagate correlation IDs through the different utilities to external dependencies of a service. This can be achieved through the implementation of a new dedicated utility, new features within all utilities, added examples, and/or documentation.

For the scope of this issue, we can identify two types of correlation IDs:

1) Unique transaction IDs that are set by AWS (for example X-Ray Trace ID, AWS Request ID)
2) User-defined correlation IDs that can be stored and passed along between microservices:

SNS - Message attributes https://github.com/awslabs/aws-lambda-powertools-typescript/blob/main/tests/resources/events/aws/sns-notification.json#L18
CloudFront - Request headers https://github.com/awslabs/aws-lambda-powertools-typescript/blob/main/tests/resources/events/aws/cloudfront-modify-querystring.json#L13
API Gateway - Headers https://github.com/awslabs/aws-lambda-powertools-typescript/blob/main/tests/resources/events/aws/apigateway-aws-proxy.json#L21
Others...

Summary of the feature

All utilities would be able to fetch, out of the box, the correlation IDs coming from each AWS service (Lambda trigger) and propagate them to logs, metrics, traces, and so on. This functionality should not necessarily be enabled by default, but developers should be able to turn it on if they need it. It should also be possible to define your own custom correlation IDs and propagate and use them in the different utilities accordingly.

Implement this logic in all utilities. Research is needed to understand the best implementation strategy and how to avoid code repetition.

Code examples

TBC. Happy to receive suggestions on this one.

Benefits for you and the wider AWS community

As discussed in a past meeting with @gsingh1 @loujaybee and @simontabor, the functionality of fetching and propagating correlation IDs, including custom user-defined ones, can be useful and especially relevant for developers who operate at scale and within big organisations, where a high number of teams are responsible for microservices communicating with each other.

Describe alternatives you've considered

None that come to mind apart from writing the code yourself.

Additional context

See here a brief definition of a correlation ID.

Related issues, RFCs

Not at the moment.

dreamorosi commented 3 years ago

In the case of tracing, should this ID be in the name of the segment or an annotation on the segment?

saragerion commented 3 years ago

Thanks for asking @dreamorosi, I envision it/them as annotation(s) on the segment.

bahrmichael commented 3 years ago

My thoughts on the problem statement:

how they can use correlation IDs effectively and following the best practices

I don't think I'm familiar with the best practices yet. Will collect more info on the way. If you know any good links, please link them in this issue :)

potentially allowing them to bring their own custom correlation IDs

This sounds like we can start with an MVP where we generate our own correlation ID, and then expand to allow a custom ID.

achieved through the implementation of a new dedicated utility

This sounds like we would have a package like powertools/correlation. When reading this issue I first thought about some relation to the logger utility, as that's where I'd expect the correlation ID to be printed so that customers can use it. But tracing also makes sense when you want to follow the path of a request. Metrics as well, as you want to know if a particular request caused e.g. latency spikes. Looks like I'm back to agreeing that this requires a cross-utility approach.

bahrmichael commented 3 years ago

Updated the comment above, as it accidentally showed my comment as part of the quote.

bahrmichael commented 3 years ago

For metrics I think outputting the correlation ID into metadata makes sense.

For logs I'd follow the correlation example in additional-keys.ts.

saragerion commented 3 years ago

See here:
https://github.com/getndazn/dazn-lambda-powertools#did-you-consider-monkey-patching-the-clients-instead
https://github.com/getndazn/chaos-squirrel/blob/master/packages/attack-http-requests/src/index.ts#L25

bahrmichael commented 3 years ago

In our sync Lou raised the idea of monkey patching:

bahrmichael commented 3 years ago

Thought some more about a good approach, and here's my understanding of what a well-rounded approach looks like. This might repeat some of the initial post from @saragerion.

Opt In

Correlation IDs should require opt in. As a customer I don't want the utility to just forward headers to other places. Instead I want to explicitly name correlation IDs, or allow a default set of correlation ID names.

Examples:

With a middleware setup, I can enable the default AWS correlation IDs: .use(enableCorrelationIds({awsDefaults: true}))
With a middleware setup, I can also pick my own: .use(enableCorrelationIds({customIds: ['X_CORRELATE_ID'] }))

As a result we would need a configurable function/constructor, which accepts awsDefaults: boolean and customIds: string[].
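A minimal sketch of how those two options could drive the opt-in behaviour. Everything here is hypothetical: `CorrelationIdOptions`, `AwsIds`, and `pickCorrelationIds` are illustrative names, not part of any existing utility, and the header lookup assumes the custom IDs arrive as HTTP headers.

```typescript
// Hypothetical option shape, taken from the two examples above.
interface CorrelationIdOptions {
  awsDefaults?: boolean; // opt in to the IDs that AWS sets
  customIds?: string[]; // opt in to user-defined ID names
}

// IDs set by AWS, passed in explicitly so the sketch stays self-contained.
interface AwsIds {
  awsRequestId?: string;
  xRayTraceId?: string;
}

// Returns only the IDs that were explicitly opted into; nothing else is forwarded.
function pickCorrelationIds(
  options: CorrelationIdOptions,
  headers: Record<string, string | undefined>,
  aws: AwsIds = {},
): Record<string, string> {
  const ids: Record<string, string> = {};

  if (options.awsDefaults) {
    if (aws.awsRequestId) ids['awsRequestId'] = aws.awsRequestId;
    if (aws.xRayTraceId) ids['xRayTraceId'] = aws.xRayTraceId;
  }

  // HTTP header names are case-insensitive, so compare in lower case.
  const lowered: Record<string, string | undefined> = {};
  for (const key of Object.keys(headers)) {
    lowered[key.toLowerCase()] = headers[key];
  }
  for (const name of options.customIds ?? []) {
    const value = lowered[name.toLowerCase()];
    if (value !== undefined) ids[name] = value;
  }

  return ids;
}
```

With `{ customIds: ['X_CORRELATE_ID'] }`, only that header is picked up, and the AWS IDs are ignored unless `awsDefaults` is also set.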

ID Population

The function code should be able to add correlation IDs at any time during the function's request handling. Correlation IDs change from request to request, and I think they are not available during function initialization.

Examples:

Another service calls mine, with a X_CORRELATE_ID header which my function should forward.
My service initiates a request chain (e.g. after being called from a cron), and generates the first X_CORRELATE_ID which it should then pass to any other services.

To achieve this I think we need some in-memory storage that lives outside of the function calls. From Java I know the Mapped Diagnostic Context (MDC), and I'm not sure if something similar exists in Node.

The NPM package correlation-id uses AsyncLocalStorage from node core utilities.

This class is used to create asynchronous state within callbacks and promise chains. It allows storing data throughout the lifetime of a web request or any other asynchronous duration. It is similar to thread-local storage in other languages.

This seems to be exactly what I'm looking for.

To let anyone populate correlation IDs, the utility should expose methods to manage them. That way the middleware can add incoming correlation headers, logging can print correlation IDs, and customers can clear correlation IDs if they wish. I will look into the approaches of the Logging and Metrics utilities, and try to follow their existing management approaches. Maybe there will also be some synergies.
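A minimal sketch of such a store built on `AsyncLocalStorage` from Node's `node:async_hooks` module. The function names (`runWithCorrelationIds`, `setCorrelationId`, `getCorrelationId`, `clearCorrelationIds`) are assumptions made for illustration, not an agreed API.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Each call to run() gets its own Map, so IDs never leak between requests.
const storage = new AsyncLocalStorage<Map<string, string>>();

// Hypothetical entry point: a middleware or decorator would wrap the handler with this.
function runWithCorrelationIds<T>(initial: Record<string, string>, fn: () => T): T {
  return storage.run(new Map(Object.entries(initial)), fn);
}

// Add or overwrite an ID at any time while the request is being handled.
function setCorrelationId(name: string, value: string): void {
  const store = storage.getStore();
  if (!store) {
    throw new Error('setCorrelationId called outside of runWithCorrelationIds');
  }
  store.set(name, value);
}

// Read a single ID; returns undefined when there is no active context.
function getCorrelationId(name: string): string | undefined {
  return storage.getStore()?.get(name);
}

// Let customers drop all IDs for the current request.
function clearCorrelationIds(): void {
  storage.getStore()?.clear();
}
```

Code called inside the callback, including code after an `await`, sees the same Map without it being passed as an argument, which is the behaviour the quoted Node documentation describes.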

Why outside the function calls?

Correlation IDs are not relevant to a function invocation, but are just passed along on the side as helpful diagnostics information. They usually don't influence functions.

Injecting the Correlation ID

At any point in a request we should be able to use the correlation IDs, e.g. for printing logs, or forwarding them to other services.

Therefore the correlation ID utility should provide a way to retrieve all available correlation IDs, based on the initial config from the middleware or annotation-based setup.

There could be a method with the following signature, which allows retrieving all correlation IDs, or a subset:

function listCorrelationIds(names?: string[]): { [key: string]: string }[]

We can then use this function in logging, metrics, and monkey-patching to add more information.
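A minimal sketch of `listCorrelationIds` with the signature proposed above. The module-level `currentIds` Map is only a stand-in for whatever store the utility ends up using, the sample values are made up, and `toOutboundHeaders` is a hypothetical consumer. Returning one single-entry object per ID is one reading of the proposed return type.

```typescript
// Stand-in store with made-up values (assumption, for illustration only).
const currentIds = new Map<string, string>([
  ['X_CORRELATE_ID', 'abc-123'],
  ['awsRequestId', 'req-1'],
]);

// Same signature as proposed: all IDs, or only the named subset.
function listCorrelationIds(names?: string[]): { [key: string]: string }[] {
  const result: { [key: string]: string }[] = [];
  currentIds.forEach((value, name) => {
    if (names === undefined || names.includes(name)) {
      result.push({ [name]: value });
    }
  });
  return result;
}

// Hypothetical consumer: flatten the list into headers for an outbound request.
function toOutboundHeaders(names?: string[]): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const entry of listCorrelationIds(names)) {
    Object.assign(headers, entry);
  }
  return headers;
}
```

A logger or a patched HTTP client could call `toOutboundHeaders(['X_CORRELATE_ID'])` to forward only that ID. A flat `Record<string, string>` return type might be simpler for callers than an array of objects.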

michaelbrewer commented 2 years ago

@saragerion can we implement something like this for TypeScript too? This is similar to how Python allows logging the event with the correlation ID based on a JSON path.

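import { Logger, CorrelationPaths } from '@aws-lambda-powertools/logger';
import { LambdaInterface } from '@aws-lambda-powertools/commons';

const logger = new Logger();

class Lambda implements LambdaInterface {
    // Proposed API: `correlationIdPath` and `CorrelationPaths` mirror the Python
    // Logger and are what this comment asks for. Options are passed as an object
    // because TypeScript has no keyword arguments.
    @logger.injectLambdaContext({ correlationIdPath: CorrelationPaths.API_GATEWAY_REST, logEvent: true })
    public async handler(_event: any, _context: any): Promise<void> {
        logger.info("This is an INFO log with some context");
    }
}

export const func = new Lambda();
// Bind so that `this` still refers to the instance when Lambda invokes the handler.
export const handler = func.handler.bind(func);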

github-actions[bot] commented 1 year ago

This issue has not received a response in 2 weeks. If you still think there is a problem, please leave a comment to avoid the issue from automatically closing.

dreamorosi commented 1 year ago

Hi everyone, if you arrived at this issue because you are interested in this feature or because you are looking to migrate from dazn-lambda-powertools, we would like to hear from you!

We are still considering this feature but we need more examples and requirements to build an RFC.

If you are interested, please leave a comment describing briefly your use case and how the correlation ID feature would work in terms of experience. It's fine also to point towards existing resources.

If uncomfortable with sharing your use case publicly, you can also reach out to us privately at aws-lambda-powertools-feedback@amazon.com.

udomsak commented 1 year ago

@dreamorosi my use case is functions that run under AWS EventBridge Scheduler. It would be great if I could set a custom, user-defined correlation ID and pass it along via DynamoDB.

However, if you have any ideas for tracking transactions across functions that are triggered by EventBridge, I'd welcome them. Thanks :)

alfaproject commented 1 year ago

Have you considered using the Baggage standard (or something close to it) for this implementation? https://www.w3.org/TR/baggage/
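A minimal sketch of what carrying correlation IDs in a `baggage` header could look like. It covers only the basic `key=value` list form and makes some assumptions: keys are already valid header tokens, values are percent-encoded with `encodeURIComponent`, and optional member properties (the part after `;`) are dropped when parsing. The function names are hypothetical.

```typescript
// Serialize IDs as comma-separated key=value members.
function toBaggageHeader(ids: Record<string, string>): string {
  return Object.entries(ids)
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join(',');
}

// Parse a baggage header back into a flat record of IDs.
function fromBaggageHeader(header: string): Record<string, string> {
  const ids: Record<string, string> = {};
  for (const member of header.split(',')) {
    // Keep only the key=value part; ignore optional ";property" suffixes.
    const pair = member.split(';', 1)[0] ?? '';
    const separator = pair.indexOf('=');
    if (separator === -1) continue;
    const key = pair.slice(0, separator).trim();
    if (key) {
      ids[key] = decodeURIComponent(pair.slice(separator + 1).trim());
    }
  }
  return ids;
}
```

One appeal of this approach is that several user-defined IDs travel in a single, standardized header instead of one custom header per ID.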

dreamorosi commented 2 months ago

Just wanted to give an update on this.

This issue is currently the oldest issue that we still have open.

While we acknowledge that over time it has garnered some demand, because of its scope being fairly broad it's a bit challenging for us to get a sense of which features within the many proposed above we should focus on.

While this is not an area that we are actively prioritizing for the remainder of this year, we would like to better understand which parts of the DAZN correlation ID feature people are particularly missing in Powertools for AWS. This will help us create our backlog for next year.

In the meantime, one feature that we can consider already today is allowing customers to set correlation IDs in Logger. This is a feature that Powertools for AWS Lambda (Python) already supports, so from a feature parity standpoint it still makes sense for us to add it sooner rather than later - you can track the progress around that effort in #2863.

I'll revisit this in a few months and see if anything has changed in terms of engagement with the issue.
