aws-powertools / powertools-lambda-python

A developer toolkit to implement Serverless best practices and increase developer velocity.
https://docs.powertools.aws.dev/lambda/python/latest/
MIT No Attribution
2.83k stars 391 forks source link

Feature request: Add support for default dimensions on ColdStart metric captured with log_metrics decorator #5237

Open angelcabo opened 4 days ago

angelcabo commented 4 days ago

Use case

In my environment, I have two serverless applications: one built using Python and the other using TypeScript/Node.js. Both utilize AWS Powertools. However, I have noticed a discrepancy between the two in how they handle CloudWatch cold start metrics by default.

In the Node.js Powertools, the middleware automatically includes default dimensions such as service name, function name, and additional custom dimensions.

    .use(logMetrics(metrics, {
        throwOnEmptyMetrics: false,
        captureColdStartMetric: true,
        defaultDimensions: {'Environment': process.env.ENV!}
    }))

On the other hand, in the Python Powertools, the cold start metric only includes the function and service name by default, with no apparent way to extend or override this behavior using custom dimensions without completely managing the cold start metric manually.

@metrics.log_metrics(capture_cold_start_metric=True, default_dimensions={"Environment": ENVIRONMENT})
def lambda_handler(event: dict, context: LambdaContext):
...

This creates an inconsistency between the two libraries and forces us to write additional boilerplate code in Python if we want additional dimensions to be recorded along with the ColdStart metric, while the Node.js implementation provides a more seamless experience.

FWIW:

The docs are accurate in both cases: Python's page says:

If it's a cold start invocation, this feature will:

Create a separate EMF blob solely containing a metric named ColdStart
Add function_name and service dimensions

and TypeScript's page says:

If it's a cold start invocation, this feature will:

Create a separate EMF blob solely containing a metric named ColdStart
Add the function_name, service and default dimensions

Solution/User Experience

Ideally, the default_dimensions provided to the log_metrics decorator would be used when recording the cold start metric in Python.

Alternative solutions

As mentioned, an alternative is to continue to have users manually handle the cold start metrics themselves to support adding custom dimensions to the ColdStart metric in Python.

Acknowledgment

boring-cyborg[bot] commented 4 days ago

Thanks for opening your first issue here! We'll come back to you as soon as we can. In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

dreamorosi commented 3 days ago

Hi, thank you for opening this issue.

Just for added context, I have gone ahead and created a sample project to help with contextualizing the request and better see the differences between the two versions.

Given these two functions:

TypeScript:

```ts import { Metrics } from '@aws-lambda-powertools/metrics'; import { logMetrics } from '@aws-lambda-powertools/metrics/middleware'; import middy from '@middy/core'; const metrics = new Metrics({ namespace: 'MyApp', }); export const handler = middy(async () => { return { statusCode: 200, body: JSON.stringify('Hello, World!'), }; }).use( logMetrics(metrics, { throwOnEmptyMetrics: false, captureColdStartMetric: true, defaultDimensions: { Environment: 'prod' }, }) ); ```

Python:

```py from aws_lambda_powertools import Metrics from aws_lambda_powertools.metrics import MetricUnit from aws_lambda_powertools.utilities.typing import LambdaContext metrics = Metrics(namespace="MyApp") @metrics.log_metrics(raise_on_empty_metrics=False, capture_cold_start_metric=True, default_dimensions={"Environment": "prod"}) def lambda_handler(event: dict, context: LambdaContext): return {"statusCode": 200, "body": "Hello World"} ```

The EMF blog emitted by the two is similar but different in three aspects:

Below the full metric logs:

TypeScript:

{
  "_aws": {
    "Timestamp": 1727339583244,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApp",
        "Dimensions": [
          [
            "service",
            "Environment",
            "function_name"
          ]
        ],
        "Metrics": [
          {
            "Name": "ColdStart",
            "Unit": "Count"
          }
        ]
      }
    ]
  },
  "service": "service_undefined",
  "Environment": "prod",
  "function_name": "DimensionsFn",
  "ColdStart": 1
}

Python:

{
  "_aws": {
    "Timestamp": 1727339552523,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApp",
        "Dimensions": [
          [
            "function_name"
          ]
        ],
        "Metrics": [
          {
            "Name": "ColdStart",
            "Unit": "Count"
          }
        ]
      }
    ]
  },
  "function_name": "DimensionsFnPython",
  "ColdStart": [
    1.0
  ]
}

For the purpose of this issue, I think we are focusing on the first bullet point, the default dimensions being included in the cold start metric.

I see value in aligning the behavior between the two versions, although I don't have a strong opinion on which one should be the preferred one.

What I can say for sure is that making this change in either of the two libraries would be a breaking change, so we'll need to address this in a backwards compatible way if we decide to do so.

leandrodamascena commented 3 days ago

Hi, thank you for opening this issue.

Just for added context, I have gone ahead and created a sample project to help with contextualizing the request and better see the differences between the two versions. You can find the repo, which you can deploy using CDK, here:

Hi @dreamorosi, thank you very much for this investigation using both runtimes! This will be very useful in helping us make a decision. I have a few considerations below.

The EMF blog emitted by the two is similar but different in three aspects:

  • The EMF metric emitted by TypeScript includes default dimensions (in this case Environment), and Python doesn't

Yes, we don't include the default dimensions in Python, which I think is wrong behavior because we should consider the default dimension in all metrics, including ColdStart. In this case, TypeScript is doing the right thing and Python is missing this information.

  • The EMF metric emitted by TypeScript includes the service dimension by default, and Python doesn't

We also include the service dimension in Python too. In this case, you're not seeing it because you didn't define the service using the parameter constructor or the env variable.

  • The unit used for ColdStart in TypeScript is an integer (1) while in Python is a float (1.0) which is wrapped in an array/list

In this case, there is no difference in using an int/float or a list with int/float. We do this to try to optimize as much as possible the blob that we emit to EMF, so in cases where you have two metrics, we define both in the Metrics key and pass the values ​​in that array. The result/data in CloudWatch will be the same.

I see value in aligning the behavior between the two versions, although I don't have a strong opinion on which one should be the preferred one.

I also see value in aligning behavior, and I think including default dimensions in the ColdStart metric makes more sense because as a customer I would expect all my metrics —no matter which ones— to inherit default dimensions if I set them intentionally.

What I can say for sure is that making this change in either of the two libraries would be a breaking change, so we'll need to address this in a backwards compatible way if we decide to do so.

This is the most problematic point in this discussion. If we decide to include default dimensions in the ColdStart metric, we cannot make this the default behavior. This is a potential breaking change for customers who have dashboards looking at ColdStart without dimensions. CloudWatch costs can also be an issue for customers because each combination of metric_name x namespace x dimension added to CloudWatch is billed as a separate metric, and customers may experience increased billing if we make this the default behavior. And many other situations that I can't predict all of them.

I was thinking about creating a new parameter in @metrics.log_metrics() to allow customer to choose if they want to include the default dimensions. And in Powertools V4 we make this the default behavior. The experience could be something like this:

@metrics.log_metrics(capture_cold_start_metric=True, include_default_dimensions_cold_start=True)
def lambda_handler(event: dict, context: LambdaContext):

I don't like the name of this parameter, but it's just for illustration.

Please let me know what do you think @angelcabo and @dreamorosi.

dreamorosi commented 3 days ago

Thanks for clarifying the int/float difference, for the service name I think the Python behavior is better for cost savings, but I can see why it was implemented this way in TS - anyway for now onwards I'll focus the discussion only on the default dimensions being included in the cold start metric.

Seeing how we all agree that default dimensions should be included in that metric, I think we have two options forward. One is the one you suggested, making the behavior opt-in in the current major version and eventually default in the next one.

The other option would be to consider this a bug and fix it right away. I acknowledge that the line here is quite blurry, however while the log_metrics decorator doesn't explicitly mention the default dimensions being included, the description of the default_dimensions in the API reference reads this: image

With this in mind, I think there's room for interpret them being excluded as an unintended behavior and thus treat this as a bug fix. There are also precedents for this type of change being treated as bugs and having been fixed within minor releases.

Either way, I think there's a potential for data loss. Customers might be trying to set up dashboards that query the cold start metric with certain dimensions expecting to find values because they set default_dimensions and they don't find them.


If we go to the route you suggested, I'd like us to consider using an environment variable instead of adding another parameter. I think in the long term this would make the v3 to v4 switch easier, and the API less verbose in the code.

For example, having an env variable like POWERTOOLS_METRICS_DEFAULT_DIMENSIONS_IN_COLDSTART_METRIC that defaults to False when not present.