This is somewhat related to the (closed) issue #123.
@Parro what do you think of this approach?
Twitter thread with Paul Johnston for reference: https://twitter.com/alex_casalboni/status/1556585120332759040
I was thinking of a different approach: in the Initializer step we could create a set of different versions of the lambda with the same code, so that every invocation of a distinct version spawns a new process with its own cold start. This way we could still use parallelization, and even add a new step in the state machine so that the report includes statistics for both `Duration` and `Init Duration`.
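Purely as an illustration of the idea (not code from the tool): a minimal sketch assuming the AWS SDK for JavaScript v3, with made-up function and alias names, showing each alias being invoked exactly once and both values parsed from the tail of the execution log.

```ts
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Invoke one alias exactly once; since each alias points to a distinct version,
// that single invocation runs in a fresh execution environment (a cold start).
async function invokeCold(functionName: string, alias: string) {
  const res = await lambda.send(
    new InvokeCommand({
      FunctionName: functionName,
      Qualifier: alias,
      Payload: Buffer.from("{}"),
      LogType: "Tail", // include the last 4 KB of the execution log (base64) in the response
    })
  );
  const log = Buffer.from(res.LogResult ?? "", "base64").toString("utf-8");
  // The REPORT line contains both values, e.g. "Duration: 12.3 ms ... Init Duration: 456.7 ms"
  const duration = /(?<!Billed |Init )Duration: ([\d.]+) ms/.exec(log)?.[1];
  const initDuration = /Init Duration: ([\d.]+) ms/.exec(log)?.[1];
  return {
    alias,
    durationMs: duration ? Number(duration) : null,
    initDurationMs: initDuration ? Number(initDuration) : null,
  };
}

async function main() {
  // Aliases created by the initializer (names are made up); these invocations can run
  // in parallel because each one targets a different version.
  const aliases = ["RAM128-cold-0", "RAM128-cold-1", "RAM256-cold-0"];
  console.log(await Promise.all(aliases.map((a) => invokeCold("my-function", a))));
}

main().catch(console.error);
```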
What do you think about it?
@Parro yes, that's what Paul proposed too.
Let's double-check the Lambda quotas :)
Is there any limit regarding the # of aliases per function? Or any API rate-limiting when creating new versions and aliases? I never encountered any limitations since we only create one version/alias per power value.
Let's assume there are no such limitations.
With `x` power values and `num` invocations, we'd need to create `x * num` versions and aliases so that we can invoke them in parallel. I often run the tool with 5 power values and `num=50` (or even 100), so that means 250+ versions and aliases created during initialization.
I'd agree this mechanism is better for the overall execution time, even if the initialization phase will take much longer. As far as I can remember, the creation of new versions/aliases cannot be parallelized. Initializing 4-5 versions currently takes 7-8 seconds, so with `num=50` it will take more than 6 minutes.
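(Back-of-the-envelope, assuming creation time stays roughly constant: 7-8 seconds for 4-5 versions is about 1.5-2 seconds per version/alias pair, so 5 power values * 50 invocations ≈ 250 sequential creations ≈ 400 seconds.)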
> Is there any limit regarding the # of aliases per function?
The only limit I am aware of is the code storage quota of 75 GB. In an account with few lambdas it should not be a problem; in an account with dozens of them we could hit the limit... and of course it depends on the size of the lambda under test. We could state clearly in the documentation that the test will use about `lambdaSize * powerValues * num` of code storage.
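(As a purely illustrative example with made-up numbers: a 50 MB deployment package tested with 5 power values and `num=50` would temporarily use about 50 MB * 5 * 50 = 12.5 GB of code storage, well within the 75 GB quota, while a 250 MB package with `num=100` would need about 125 GB and exceed it.)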
> As far as I can remember, the creation of new versions/aliases cannot be parallelized.
Is that a Lambda limitation? Would it fail even if we used a Map step in the state machine?
Anyway, even if the initialization time is long, we could state this in the docs to warn the user.
> Is that a Lambda limitation? Would it fail even if we used a Map step in the state machine?
Yes, because you're always working on $LATEST when creating new versions and aliases.
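To illustrate the serialization, here's a rough sketch assuming the AWS SDK for JavaScript v3; the env-var trick used to force each new version and all names are assumptions, not necessarily what the actual initializer does.

```ts
import {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
  GetFunctionConfigurationCommand,
  PublishVersionCommand,
  CreateAliasCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});
const FUNCTION_NAME = "my-function"; // placeholder

// $LATEST can only apply one configuration update at a time, so poll until it settles.
async function waitForLatestToSettle(): Promise<void> {
  for (;;) {
    const cfg = await lambda.send(
      new GetFunctionConfigurationCommand({ FunctionName: FUNCTION_NAME })
    );
    if (cfg.LastUpdateStatus === "Successful") return;
    if (cfg.LastUpdateStatus === "Failed") throw new Error("configuration update failed");
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
}

// Create one version + alias per (power value, invocation index) pair.
// Every step mutates $LATEST, so none of this can run in parallel.
async function createColdStartAliases(powerValues: number[], num: number): Promise<string[]> {
  const aliases: string[] = [];
  for (const power of powerValues) {
    for (let i = 0; i < num; i++) {
      // Change the configuration so PublishVersion actually creates a *new* version
      // (Lambda skips publishing if code and configuration are unchanged).
      // A real implementation would merge the function's existing env vars.
      await lambda.send(
        new UpdateFunctionConfigurationCommand({
          FunctionName: FUNCTION_NAME,
          MemorySize: power,
          Environment: { Variables: { POWER_TUNING_MARKER: `${power}-${i}` } },
        })
      );
      await waitForLatestToSettle();
      const version = await lambda.send(
        new PublishVersionCommand({ FunctionName: FUNCTION_NAME })
      );
      const aliasName = `RAM${power}-cold-${i}`;
      await lambda.send(
        new CreateAliasCommand({
          FunctionName: FUNCTION_NAME,
          Name: aliasName,
          FunctionVersion: version.Version!,
        })
      );
      aliases.push(aliasName);
    }
  }
  return aliases;
}
```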
I've just implemented a first iteration of this (both initializer and cleaner logic). I'm going to run some tests and share the WIP code in a new PR later today.
@Parro it works :) Check out the PR #177
Hey @alexcasalboni @Parro I was wondering if there's any movement on this? I noticed a few open PRs that seem to be working, but not much recent activity. Is there anything that I could help with if there are some rough edges that need a hand?
If there's 1 PR that might be the direction this moves in (if indeed it is planned to move forward with this feature), then I could just clone that version and deploy that short term.
Thanks
@ryancormack thanks for checking :) yes, we're definitely moving forward to find the ideal solution for this!
Currently, there are two open PRs using different approaches. The first creates `num` new versions/aliases for each memory configuration, which does work, but it's a bit extreme in the amount of overhead and the number of API calls it generates - you easily end up creating and destroying hundreds of versions/aliases, and it also implies an upper bound of approximately 500 versions/aliases that can be created in 15 minutes (reducing the number of configurations * invocations you can test).

That said, I'm quite sure the second PR is closer to the direction we'll choose, and I'd recommend you clone that version for the time being. I'm currently working with a few colleagues at AWS to speed up the maintenance of this tool, so I'd expect we'll settle on a final solution in the next 30-60 days.
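(That upper bound follows from the same back-of-the-envelope math as above: at roughly 1.5-2 seconds per sequential version/alias creation, a 15-minute window allows about 900 / 1.8 ≈ 500 creations.)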
Thanks Alex, it's working mostly well. I've created that issue above - I know it was briefly mentioned earlier in this issue, but I'm not sure whether it's more nuanced, and neither PR currently accounts for it.
The tool was super helpful in spawning a huge number of cold starts for me, and I could use CloudWatch Logs queries to get the 'end user latency' times I was hoping to get.
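For anyone who wants to reproduce that, here's a sketch of such a query run through `@aws-sdk/client-cloudwatch-logs`; the log group name, time range, and limit are placeholders, and the query only surfaces the standard REPORT fields.

```ts
import {
  CloudWatchLogsClient,
  StartQueryCommand,
  GetQueryResultsCommand,
} from "@aws-sdk/client-cloudwatch-logs";

const logs = new CloudWatchLogsClient({});

// Pull Duration and Init Duration for recent invocations of the function under test.
async function fetchColdStartReport(logGroupName: string) {
  const now = Math.floor(Date.now() / 1000);
  const { queryId } = await logs.send(
    new StartQueryCommand({
      logGroupName,            // e.g. "/aws/lambda/my-function"
      startTime: now - 3600,   // last hour (epoch seconds)
      endTime: now,
      queryString: `
        filter @type = "REPORT"
        | fields @timestamp, @memorySize, @duration, @initDuration
        | sort @timestamp desc
        | limit 200
      `,
    })
  );
  // Poll until the query completes, then return the result rows.
  for (;;) {
    const res = await logs.send(new GetQueryResultsCommand({ queryId }));
    if (res.status === "Complete") return res.results ?? [];
    await new Promise((r) => setTimeout(r, 1000));
  }
}
```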
Quick update on this: we're continuing our work on #206 - it turns out that approach is also useful to solve a SnapStart-related problem.
Apologies for the delay, we should be able to finalize the current implementation in a matter of weeks.
Any progress on this feature? I have some use cases for profiling cold starts only.
hi all, apologies for the very long wait :)
It's not on SAR yet, but you can deploy the latest via CLI.
Closing this issue, but let us know if you encounter any problems.
The tool could provide an option to power-tune a given function considering only cold start invocations.
The current logic is based on aliases in order to maximize parallelism and optimize overall speed of the power-tuning process. Unfortunately, it also makes it hard to achieve cold-start-only tuning.
We could implement an alternative, fully sequential logic in which every invocation forces a fresh cold start.
This should work fine for all values of `num` and any power value. The only drawback is that nothing can be parallelized, which isn't a big issue as long as each invocation is short enough. For example, if the average cold invocation takes 5 seconds, with `num=20` and 5 power values the overall power-tuning process will take about 8 minutes (5s * 20 * 5 = 500 seconds). It's very easy to reach 40+ minutes with 10-second invocations and `num>50`.
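A minimal sketch of what such a fully sequential, cold-start-only loop could look like (illustrative only; it assumes the AWS SDK for JavaScript v3 and forces each cold start by touching the function configuration, which is just one possible mechanism):

```ts
import {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
  GetFunctionConfigurationCommand,
  InvokeCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

// Sequentially: for each power value, run `num` invocations, forcing a cold start
// every time by changing the function configuration right before invoking
// (any configuration update makes Lambda recycle its warm execution environments).
async function coldStartTune(functionName: string, powerValues: number[], num: number) {
  const results: { power: number; initDurationMs: number | null }[] = [];
  for (const power of powerValues) {
    for (let i = 0; i < num; i++) {
      await lambda.send(
        new UpdateFunctionConfigurationCommand({
          FunctionName: functionName,
          MemorySize: power,
          Environment: { Variables: { COLD_MARKER: `${power}-${i}` } }, // invalidates warm environments
        })
      );
      // Poll until the configuration update is fully applied (compact waiter).
      let status = "InProgress";
      while (status === "InProgress") {
        await new Promise((r) => setTimeout(r, 500));
        const cfg = await lambda.send(
          new GetFunctionConfigurationCommand({ FunctionName: functionName })
        );
        status = cfg.LastUpdateStatus ?? "Successful";
      }
      const res = await lambda.send(
        new InvokeCommand({
          FunctionName: functionName,
          Payload: Buffer.from("{}"),
          LogType: "Tail", // tail log contains the REPORT line with Init Duration
        })
      );
      const log = Buffer.from(res.LogResult ?? "", "base64").toString("utf-8");
      const init = /Init Duration: ([\d.]+) ms/.exec(log)?.[1];
      results.push({ power, initDurationMs: init ? Number(init) : null });
    }
  }
  return results;
}
```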