alexcasalboni / aws-lambda-power-tuning

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Functions - and it supports three optimization strategies: cost, speed, and balanced.

Log analysis uses Billed Duration rather than Duration for calculating Lambda run time #228

Closed ryancormack closed 4 months ago

ryancormack commented 5 months ago

With Lambda's billing model rounding up to the nearest millisecond, using Billed Duration ignores fractional milliseconds when calculating the average run time of the Lambda. This is unlikely to be an issue for the vast majority of use cases (if you are sensitive to fractional-millisecond latency drift, Lambda is unlikely to be the best tool), but it does mean that cold starts are not calculated properly. When using Managed Runtimes you are not billed for the init duration of a function. In those circumstances, depending on the number of invocations and whether they run in parallel or not, this could make a difference to what the end user actually experiences from your Lambda function. When using a Custom Runtime, or a Container Image package type, you are billed for the init duration, so the Power Tuner does show a more representative summary of the end-user experience.

However, the Power Tuner also displays cost/price compared to performance. For those metrics, Billed Duration (rather than Duration) is the correct value to use, and it highlights the cost benefit of using a Managed Runtime.

I couldn't find any historical issues related to this, so I suspect it hasn't been a problem so far. However, given there are two open PRs (here and here) and a long-running issue about power-tuning cold starts only, I think this will become problematic. I could take two identical functions, run one on a Node 20 custom runtime and one on the Node 20 Managed Runtime; the end user experience would be very similar (lots of slow cold starts), but the Power Tuner would report that the managed runtime returns in double-digit milliseconds while the custom one returns in 600+ ms.

I see there are already parameter options to discard outliers. Should this be considered an existing bug, or is this behaviour expected?

Should the extraction logic just be updated to know whether the function uses a custom or managed runtime and pick the appropriate parts of the REPORT log line, likely using different values for the billed (cost) and time metrics?

Examples:

Managed Runtime Cold Start:
Duration: 25.30 ms  Billed Duration: 26 ms  Memory Size: 1024 MB  Max Memory Used: 79 MB  Init Duration: 400.62 ms

Custom Runtime Cold Start:
Duration: 21.50 ms  Billed Duration: 399 ms  Memory Size: 1024 MB  Max Memory Used: 86 MB  Init Duration: 376.86 ms

In this example the Power Tuner would report something like this. I think the result is very misleading when considering only cold starts, and close to 100% accurate when considering only warm starts. The cost part is correct, but I think the time part is wrong - though that's subjective, depending on what the real goal of the time metric is.
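To make the gap concrete, here's a rough TypeScript sketch (not the tool's actual extraction code; the parsing and names are purely illustrative) that pulls the relevant fields out of these REPORT lines and compares what the caller actually waits for with what is billed:

```typescript
// Rough illustration only - not the Power Tuner's real log-extraction logic.
interface ReportMetrics {
  durationMs: number;        // handler execution time
  billedDurationMs: number;  // what you pay for (rounded up to the nearest ms)
  initDurationMs: number;    // 0 on warm starts (no "Init Duration" field)
}

// Assumes the standard field order of a Lambda REPORT line, where the plain
// "Duration:" field appears before "Billed Duration:" and "Init Duration:".
function parseReport(reportLine: string): ReportMetrics {
  const field = (label: string): number => {
    const match = reportLine.match(new RegExp(`${label}: ([0-9.]+) ms`));
    return match ? parseFloat(match[1]) : 0;
  };
  return {
    durationMs: field('Duration'),
    billedDurationMs: field('Billed Duration'),
    initDurationMs: field('Init Duration'),
  };
}

const managedColdStart = parseReport(
  'REPORT Duration: 25.30 ms Billed Duration: 26 ms Memory Size: 1024 MB Max Memory Used: 79 MB Init Duration: 400.62 ms'
);
const customColdStart = parseReport(
  'REPORT Duration: 21.50 ms Billed Duration: 399 ms Memory Size: 1024 MB Max Memory Used: 86 MB Init Duration: 376.86 ms'
);

// What the caller actually waits for on a cold start:
const perceivedMs = (m: ReportMetrics) => m.durationMs + m.initDurationMs;

console.log(perceivedMs(managedColdStart), managedColdStart.billedDurationMs); // ~425.92 ms perceived vs 26 ms billed
console.log(perceivedMs(customColdStart), customColdStart.billedDurationMs);   // ~398.36 ms perceived vs 399 ms billed
```

Seen this way, the billed figures are the right input for the cost chart, but as a time metric they make the managed runtime look over 15x faster on a cold start than it actually is for the caller.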

alexcasalboni commented 4 months ago

Hi @ryancormack thanks for sharing 🙏 and apologies for the delayed response :)

You can find a related discussion from last year here: https://github.com/alexcasalboni/aws-lambda-power-tuning/issues/197

Until about a year ago, the tool used Duration to compute everything. There are a few cases where it really makes sense to use Billed Duration instead of Duration (especially if parallelInvocation=true, discardTopBottom=0, and your init time is considerable and not free). So I don't think we'll go back to using Duration. But we could take Init Duration into consideration, which we're currently ignoring.
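To put a number on it using your custom-runtime example above: Duration was 21.50 ms while Billed Duration was 399 ms, because the ~377 ms of init time is billed there. Averaging Duration alone would miss almost all of that billed time, which is exactly what Billed Duration captures.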

When power-tuning one function at a time, the effect is very minimal since (generally speaking) the % of cold starts is small and it doesn't change across power values, so the results you see are useful to pick the best power configuration.

That said, I agree that when using the tool to compare similar functions (but with a different runtime or cold start) the results could be misleading. The two PRs you mentioned are still WIP and I believe will need to take Init Duration into consideration to provide accurate and relevant results.

Does that make sense? :)

ryancormack commented 4 months ago

Hi @alexcasalboni, all makes sense and I totally agree. I had actually modified the code to add the Duration and Init Duration together to get a more reliable e2e latency for custom and managed runtime comparison (but that's not going to consider the cost implications). 👍🏻
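In case it's useful to anyone finding this later, the change was conceptually just this (a minimal sketch of the idea, with a hypothetical helper name, not the actual modification to the tool's extraction code):

```typescript
// Sketch only: use Duration + Init Duration as the end-to-end latency metric,
// while the cost calculations keep using Billed Duration.
function endToEndLatencyMs(durationMs: number, initDurationMs: number = 0): number {
  return durationMs + initDurationMs;
}

// e.g. the managed-runtime cold start above: endToEndLatencyMs(25.30, 400.62) ≈ 425.92 ms
```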