Closed jandockx closed 3 years ago
+1 Python container takes too long to start for simple debugging...
+1. This currently makes local automated testing painful at best.
Thanks for the continued work on this project!
Have there been any eyes on this? The benefit would be so huge.
+1
+1
+1
+1
+1, even a simple hello world java8 lambda takes 3-4 seconds per request!
My sketch proposal to make warm containers work and maintain all the existing nice hot reload/memory usage etc functionality around them:
Currently, the container is simply run with the handler as its argument and the event passed in via an environment variable. The container's logs are then piped to the console stdout/stderr, and the tool just records how much memory was used.
Instead, we could start the container with `bash` as the entrypoint and `-c "sleep infinity"` as the argument, so it runs effectively nothing and keeps the container alive. We record the container ID in an (expiring) dict so we can reuse it again. When we want to run the Lambda, we run `docker exec` with the previously used Lambda entrypoint and the correct environment. Since we run one Lambda per container, we can still record memory usage. If we key the running containers by the version of the Lambda code we're running, we can ensure hot reload still works. As always with caches, the invalidation would be the interesting part: you probably want to kill out-of-date containers, and kill all containers when the tool exits.
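A rough sketch of the bookkeeping this would need, assuming a docker-py style API (`containers.run`, `Container.exec_run`). The class name, TTL, and keying scheme are my own choices, and the client is injected so the sketch can be exercised without a Docker daemon:

```python
import time

class WarmContainers:
    """Sketch: one warm container per (function, code version).

    `client` is anything exposing the docker-py surface used here
    (client.containers.run(...) and Container.exec_run(...)).
    """

    def __init__(self, client, ttl_seconds=300):
        self.client = client
        self.ttl = ttl_seconds
        # (function_name, code_version) -> (container, created_at)
        self._pool = {}

    def invoke(self, function_name, code_version, image, cmd):
        key = (function_name, code_version)
        entry = self._pool.get(key)
        if entry is not None and time.time() - entry[1] > self.ttl:
            entry[0].remove(force=True)  # expired: kill the stale container
            entry = None
        if entry is None:
            # An entrypoint that does nothing keeps the container alive;
            # the actual handler runs via `docker exec` below.
            container = self.client.containers.run(
                image, "sleep infinity", entrypoint="/bin/sh -c", detach=True)
            entry = (container, time.time())
            self._pool[key] = entry
        exit_code, output = entry[0].exec_run(cmd)
        return output

    def shutdown(self):
        # Tool exit: kill every warm container, as suggested above.
        for container, _ in self._pool.values():
            container.remove(force=True)
        self._pool.clear()
```

Keying by code version means an edit naturally lands in a fresh container, which is how hot reload could be preserved.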
+1
+1 Very interested in this feature
+1 Yes please!
+1, throwing my hat in the ring on this too
As a note: please use the reaction feature on the top comment. We do look at issues sorted by thumbs-up (as well as other reactions). Commenting "+1" does no good there and adds noise to the issue.
@jfuss I agree (and had done this). Any feedback from your team would be helpful here, though. The closest thing we had to knowing if this is on your radar (before your comment) was duplicate issue consolidation and labeling.
+1, this would be very beneficial for people using java + spring boot.
+1, around 1s for golang case
I did an experiment with container reuse, with a Lambda in Python, developing on Ubuntu 16.04. In summary, spinning up the Docker container only adds about an extra second, so container reuse alone may not be worth building as a feature. Link to my code: https://github.com/kevanpng/aws-sam-local.
For a fixed query, both my colleague and I see 4s invocation time on sam local. His is a Windows machine. With the profile flag and container reuse, it goes down to 2.5s on my Ubuntu machine.
Another colleague is running on Mac, and when he tried the same query with Lambda reuse and the profile flag, he still saw 11-14 seconds per run.
Maybe Docker is just slow on Mac?
1 second is a world's difference when building an API and you expect to serve more than 1 request.
I think it's well worth the feature.
@kevanpng Hey, I was looking through your code to understand what exactly you did. So basically, you create the container once with a fixed name, run the function, and on the next invocation look up the container by that name and simply call `container.exec_run` instead of creating it from scratch again. Is my summary correct?
I am super surprised Docker container creation makes this big of a difference. We can certainly look deeper into this if it is becoming usability blocker.
@sanathkr. Thanks for looking at this. FWIW, it's a huge usability blocker for me:
```
~/src/faculty/buildshot$ time curl -s http://127.0.0.1:3000/ >/dev/null  # SAM container via Docker

real	0m6.891s
user	0m0.012s
sys	0m0.021s

~/src/faculty/buildshot$ time curl -s http://127.0.0.1:5000/ >/dev/null  # regular python app via flask dev/debug server (slow)

real	0m0.039s
user	0m0.012s
sys	0m0.019s
```
And the `Instancing..` step is quick. It's Docker (and the way Docker is used here) that's slow. The (slow) werkzeug-based dev server is ~175x faster than waiting around for Docker. And this is for every request, not just startup. (And yes, this is on my Mac.)
@scoates Thanks for the comparison. It's not apples-to-apples to compare vanilla Flask to a Docker-based app, but the 6-second duration with SAM CLI is definitely not what I would expect.
Did you try the `--skip-pull-image` flag? It prevents the CLI from asking Docker for the latest image version on every invoke. Do share your numbers again with this flag set.
Thinking ahead: I think we need to add more instrumentation to the SAM CLI codebase to understand which parts contribute to the high latency. It could be cool if we ran the instrumented code in a Travis build with every PR so we could assess the performance impact of new code changes. We also need to run this on a variety of platforms to understand the real difference between Mac/Ubuntu.
I did some more profiling by crudely commenting out parts of the codebase. These are single runs, so the numbers are ballpark estimates. I ran `sam init` and then `sam local start-api` on the simple HelloWorld Lambda function created by the init template.
Platform: MacOSX. Docker version: 18.06.0
WARNING: Very crude measurements.
- Total execution time (`sam local start-api`): 2.67 seconds
- Skip pull images (`sam local start-api --skip-pull-image`): 1.45 seconds
- Create container, run it, and return immediately without waiting for the function to terminate: 1.05 seconds
- Create container, don't run it: 0.2 seconds
- SAM CLI code overhead (don't create container at all): 0.045 seconds
Based on the above numbers, I arrived at a rough estimate for each step of the invoke path by assuming:
Total execution = SAM CLI overhead + Docker Image pull + Create container + Run Container + Run function
Then, here is how much each steps took:
- SAM CLI overhead: 0.045 seconds
- Docker image pull check: 1.3 seconds
- Create container: 0.15 seconds
- Run container: 0.85 seconds
- Run function: 0.45 seconds
The most interesting part is the Create vs Run container durations: Run takes roughly 5x as long as Create, so we are better off optimizing the Run duration.
If we were to do a warm start, we would be saving some fraction of the 0.85 seconds it takes to run the container. We should keep the runtime process up and running inside the container and re-run just the function in place; otherwise we aren't going to save much.
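The per-step estimates above can be reconstructed as differences between the crude measurements (my reconstruction; the small mismatches with the quoted numbers are within the stated ballpark):

```python
# Crude end-to-end timings from the measurements above (seconds).
total       = 2.67   # sam local start-api, full invoke
skip_pull   = 1.45   # with --skip-pull-image
no_wait     = 1.05   # create + run container, don't wait for the function
create_only = 0.20   # create container, don't run it
overhead    = 0.045  # SAM CLI code only, no container at all

# Each step is the difference between two measurements.
pull_check   = total - skip_pull        # ~1.22s (quoted as ~1.3s)
create       = create_only - overhead   # ~0.16s (quoted as ~0.15s)
run          = no_wait - create_only    # ~0.85s
run_function = skip_pull - no_wait      # ~0.40s (quoted as ~0.45s)
```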
Hi. Sorry for the late reply. I was traveling last week and forgot to get to this when I returned.
I agree absolutely that apigw and flask aren't apples-to-apples, and crude measurements are definitely where we're at right now.
With `--skip-pull-image`, I still get request starts in the 5+ second range. It's entirely possible there's slow stuff in my code (though it's small, so I'm not sure where that would come from; it really does seem like Docker). Here are the relevant bits of a request (on a warm start; this is several requests into `sam local start-api --skip-pull-image`):
```
[ 0.00] 2018-10-16 20:18:44 Starting new HTTP connection (1): 169.254.169.254
[ 1.01] 2018-10-16 20:18:45 Requested to skip pulling images ...
[ 0.00]
[ 0.00] 2018-10-16 20:18:45 Mounting /Users/sean/src/faculty/buildshot/buildshot/build as /var/task:ro inside runtime container
[!5.32] START RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Version: $LATEST
[ 0.00] Instancing..
[ 0.00] [DEBUG] 2018-10-17T00:18:50.714Z 13e564e9-1160-4c0e-b1e2-b31bbadd899a Zappa Event: {'body': None, 'httpMethod': 'GET', 'resource': '/', 'queryStringParameters': None, 'requestContext': {'httpMethod': 'GET', 'requestId': 'c6af9ac6-7b61-11e6-9a41-93e8deadbeef', 'path': '/', 'extendedRequestId': None, 'resourceId': '123456', 'apiId': '1234567890', 'stage': 'prod', 'resourcePath': '/', 'identity': {'accountId': None, 'apiKey': None, 'userArn': None, 'cognitoAuthenticationProvider': None, 'cognitoIdentityPoolId': None, 'userAgent': 'Custom User Agent String', 'caller': None, 'cognitoAuthenticationType': None, 'sourceIp': '127.0.0.1', 'user': None}, 'accountId': '123456789012'}, 'headers': {'X-Forwarded-Port': '3000', 'Host': 'localhost:3000', 'X-Forwarded-Proto': 'http', 'Accept': '*/*', 'User-Agent': 'curl/7.54.0'}, 'stageVariables': None, 'path': '/', 'pathParameters': None, 'isBase64Encoded': True}
[ 0.00]
[ 0.00] [INFO] 2018-10-17T00:18:50.731Z 13e564e9-1160-4c0e-b1e2-b31bbadd899a 127.0.0.1 - - [17/Oct/2018:00:18:50 +0000] "GET / HTTP/1.1" 200 15 "" "curl/7.54.0" 0/16.916
[ 0.00]
[ 0.00] END RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a
[ 0.00] REPORT RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Duration: 4684 ms Billed Duration: 4700 ms Memory Size: 128 MB Max Memory Used: 42 MB
[ 0.58] 2018-10-16 20:18:51 127.0.0.1 - - [16/Oct/2018 20:18:51] "GET / HTTP/1.1" 200 -
```
The `[ 0.xx]` prefix is added by a util I have that shows elapsed time between stdout lines. Here's the important part, I think:
```
[!5.32] START RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Version: $LATEST
[ 0.00] Instancing..
```
I acknowledge that `Instancing..` might just not be printed until it completes, so by itself it isn't a valid measurement point. I just wanted to pass on that I'm seeing 5s of lag in my requests.
I'm not sure how to measure much deeper than that.
More info:
```
$ docker --version
Docker version 18.06.1-ce, build e68fc7
$ uname -a
Darwin sarcosm.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,4 Darwin
$ sam --version
SAM CLI, version 0.5.0
```
I also agree that if I can get this down to sub-1s request times, it's probably usable. 5s+ is still painful, though.
(Edit: adding in case anyone looking for Zappa info stumbles on this. I'm using an experimental fork of the Zappa handler runtime. This doesn't really apply to Zappa-actual. At least not right now.)
+1
@scoates Thank you for the detailed information. Definitely worth instrumenting the code further to dive deep. I will reach out when I have an instrumented version of SAM CLI to try out.
In the case of java8: the online version of Lambda keeps the JVM running for a long time. It would be nice to replicate this in `sam local` mode, so I can study any memory usage issues/OutOfMemoryExceptions I see.
+1 We're running test suites for fairly simple microservices implemented as node.js Lambda functions, where individual tests can take 10+ seconds to run. The lion's share of this time is spent waiting for the function under test to start running, judging from the test output.
I've written a tool that can simulate `sam local start-lambda` and keep the Lambda warm, for Lambdas written in Go: https://github.com/vrealzhou/lambda-local. Not perfect, but it helps in our own project.
> Maybe it could be that Docker is slow on Mac?

I've done some reading around on the internet and think this is just the case. For example, when requiring a bunch of gems, things slow down locally a LOT. And this slowness appears to come from mounted volumes and Docker's `osxfs`. No clue how to make this better, though.
I've been reading this article (https://docs.docker.com/docker-for-mac/osxfs-caching/) about Docker volume performance on Mac. I've also been reading a little of SAM's Python (not my primary language); I've landed in `local/docker/manager.py` and wonder if we should be using `:delegated` or `:cached`, along with the `--docker-volume-basedir` / `-v` / `SAM_DOCKER_VOLUME_BASEDIR` options, to help speed this up?
OK, not a Python person, but I was able to hack up a few things in my local `container.py` source and change the mount mode from `ro` to `delegated`:
```python
LOG.info("Mounting %s as %s:delegated inside runtime container", self._host_dir, self._working_dir)
kwargs = {
    "command": self._cmd,
    "working_dir": self._working_dir,
    "volumes": {
        self._host_dir: {
            # Mount the host directory inside the container at working_dir,
            # using "delegated" consistency instead of "ro"
            # https://docs.docker.com/storage/bind-mounts
            "bind": self._working_dir,
            "mode": "delegated"
        }
    },
    # We are not running an interactive shell here.
    "tty": False
}
```
With that change, this line in my `app.rb` went from ~45s to ~13s. Still not ideal, but better!

```ruby
puts(Benchmark.realtime { require_relative 'config/application' })
```
I created this pull request, which adds `delegated` after `ro` on the volume: https://github.com/awslabs/aws-sam-cli/pull/1046
+1. My issue is not performance per se, but the fact that we can't run unit/acceptance tests against a warm local Lambda, so those bugs are only found after deployment.
@sanathkr, any progress here? It certainly makes local development a bit painful with how long the response times are.
I played around with some benchmarking myself and figured I'd share. I'm running Docker in a Linux VM. All timings include using `delegated` for the volume mount.
Just running a simple 'hello world' container that automatically removes itself once it's done takes ~300-400ms, which is on par with what this source says as well. I used `docker events` to get accurate timestamps. Unfortunately that's not super fine-grained, going from `network connect` to `container start`. I also tried with the network set to `none` and saw no noticeable difference in the time between those two events, so I'm sure more is happening between the first and the second; it's just not given a name.
I tried taking the volume mount out of the picture and saw a significant speed-up in execution duration. I simply made a Docker image based on `lambci/lambda:nodejs8.10`, added the directory to `/var/tmp`, changed `container.py` to skip the volume mount, and hard-coded it to that image.
I saw a 3x speedup in execution duration using this method (1500ms -> 500ms, 2200ms -> 700ms, etc). Note that this is SAM's reported duration; real-world time adds the Docker overhead of ~400ms. So the volume mount is definitely a significant source of slowdown in the container. A warm container would perform better if it copied the data and used the copy instead of reading directly from the mount. Of course, this poses some problems for hot reload, as it would require polling to keep the copy up to date.
If it's preferred to keep each request in a single container that executes then exits, perhaps a pool of 'warm' containers could be kept. Say a pool of 5 containers (configurable by a flag) is created, and each is only 'ready' once it is past the Docker overhead of creating the container and copying the data from the volume mount to `/var/task`. As containers are used, new ones are brought up, keeping the pool at the correct size. The ideal pool size is large enough to cover concurrent requests. The only really tricky part is that containers would need to be killed off and recreated as file changes happen, otherwise hot reload would break.
I think that would lead to the best performance since container creation overhead, and volume mount overhead seem significant. Volume mount overhead alone is probably enough to keep response times over 1 second most of the time.
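The pool bookkeeping described above might look something like this (a sketch only; `create_ready` stands in for the expensive create-start-copy step, and the default size of 5 matches the suggestion above):

```python
import queue

class PrewarmPool:
    """Keep N containers past the expensive startup steps, ready to serve."""

    def __init__(self, create_ready, size=5):
        # `create_ready` is a factory returning a container that has already
        # been created, started, and had the code copied into /var/task.
        self.create_ready = create_ready
        self.size = size
        self.q = queue.Queue()
        for _ in range(size):
            self.q.put(create_ready())

    def take(self):
        container = self.q.get()         # warm and ready: no startup cost here
        self.q.put(self.create_ready())  # replenish (a real impl would do this async)
        return container

    def invalidate(self):
        # On a source-file change: drop stale containers and pre-warm fresh
        # ones, so hot reload keeps working.
        while not self.q.empty():
            self.q.get_nowait()
        for _ in range(self.size):
            self.q.put(self.create_ready())
```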
Does anyone have any workaround to speed up the execution time?
+1. When Lambda functions invoke each other locally, each call goes through the entire runtime-creation life-cycle, which takes a long time.
+1
Hey all,
I just opened a PR that should really ease the pain of 'slow'/'warm' invokes. I would love for people to chime in on the PR (I have deeper details of caveats and the approach in the description of #1305). For anyone willing to give it a try and install the PR from source, I would love your feedback. We feel it is important for this change to be tested with real workflows to make sure we are hitting the needs of you all. Thanks! 🎉
I've also made an attempt at this (#1319). It differs in that it keeps the container running constantly and invokes using `docker exec` instead of starting/stopping the container each time. It lowers API Gateway Lambda latency from ~3s to ~0.2s.
+1
The speed/time discussions here are all right on point: Micronaut has largely solved the startup-time issues that traditionally made JVM Lambdas somewhat impractical, but the fresh container per invocation still has a large impact.
But there's another consideration that I've recently run into, and that is the case where your Lambda caches data for use between warm starts.
For example, my app makes AWS SDK calls to Secrets Manager and other things that I want to avoid when possible. So on a cold start I make those calls, but on a warm start I don't. The current situation makes testing such things locally impossible.
+100
+1
+1
FWIW, I'm going to be pushing out some changes to https://github.com/lambci/docker-lambda in the next week that should support this out of the box.
Basically, each container can be configured (via an env var) to expose an HTTP interface, specifically the Lambda API – which means (after the first invoke), each invoke can hit a warm Lambda. Current testing on my Mac gives around 433 req/sec (2.3ms/req).
Assuming you have an `index.js` with a `handler()` in your current directory, you can try it out with:

```shell
docker run --rm -v $PWD:/var/task -e STAY_OPEN=1 -p 3000:3000 lambci/lambda:nodejs8.10-beta index.handler
```
(it won't print anything, but if it stays running... it's working)
Then you can invoke it (multiple times) using:

```shell
aws lambda invoke --endpoint http://localhost:3000 --no-sign-request --function-name doesnotmatter --payload '{}' output.json
```
Or simply just:

```shell
curl -d '{}' http://localhost:3000/2015-03-31/functions/myfunction/invocations
```
Very likely that the env var name will change, and probably the default port too – so don't get attached to those.
Will update you when I've pushed everything up 👍
@mhart Hi! any update on this? Thanks!
@dgergely yup, plenty – 1.2k lines of Go, C#, Java, Python and JS to be precise: https://github.com/lambci/docker-lambda/pull/218
Alrighty, 1.8k lines later, all runtimes are supported. I still want to clean up some code and make sure stdout/stderr are supported correctly in the new model.
I've pushed all images to `-beta` tags, e.g. `lambci/lambda:ruby2.5-beta`, `lambci/lambda:dotnetcore2.1-beta`, etc.
The env var to use is `DOCKER_LAMBDA_STAY_OPEN` and the default port is `9001` – going to make this configurable soon.
So basically:

```shell
docker run --rm \
  -e DOCKER_LAMBDA_STAY_OPEN=1 \
  -p 9001:9001 \
  -v $PWD:/var/task \
  lambci/lambda:ruby2.5-beta \
  lambda_function.lambda_handler
```
You should then see:

```
Lambda API listening on port 9001...
```
And then you can invoke using:

```shell
curl -d '{}' http://localhost:9001/2015-03-31/functions/myfunction/invocations
```

or:

```shell
aws lambda invoke --endpoint http://localhost:9001 --no-sign-request --function-name myfunction --payload '{}' output.json
```
If you don't supply `DOCKER_LAMBDA_STAY_OPEN`, everything should function as it currently does (some very minor changes for some runtimes, but none of them should be breaking).
Due to the way I've implemented it (sharing the API server among all the runtimes), it's slower than the custom implementation in my earlier test: I now get around 70 req/s. That's still much faster than cold-starting each time. It would be possible to optimize this, but potentially at the cost of maintainability.
I understand from other issues that a new Docker container is started for each request. This makes some experiments and automated tests impractical: SAM Local is much too slow in any context where more than one request must be handled.
I suspect that hot reloading depends on this behavior. While this project evolves further, I think it would be a good idea to make it possible to forego hot reloading and keep the Docker container warm instead.
Something like
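a flag to opt into warm containers (the flag name here is purely hypothetical, just to illustrate the trade-off):

```
sam local start-api --keep-containers-warm   # hypothetical: reuse containers, forego hot reload
```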
This would broaden the applicability of sam local enormously.
Thank you for considering this suggestion. This looks like an awesome project.