Feature request: make it possible to keep docker container warm

jandockx commented 6 years ago

I understand from other issues that a new docker container is started for each request. This makes some experiments or automated tests undoable in practice. SAM Local is much too slow in the context where more then 1 request is to be handled.

I suspect that hot reloading depends on this feature.

I think it would be a good idea to make it possible to choose, while this project evolves further, to forego hot reloading, but to keep the docker container warm.

Something like

sam local start-api -p <PORT> --profile <AWS PROFILE> --keep-it-warm

This would broaden the applicability of sam local enormously.

Thank you for considering this suggestion. This looks like an awesome project.

aldegoeij commented 6 years ago

+1 Python container takes too long to start for simple debugging...

zippadd commented 6 years ago

+1. This currently makes local automated testing painful at best.

Thanks for the continued work on this project!

dannymcpherson commented 6 years ago

Have there been any eyes on this? The benefit would be so huge.

cagoi commented 6 years ago

+1

hobotroid commented 6 years ago

+1

daveykane commented 6 years ago

+1

adrians5j commented 6 years ago

+1

CRogers commented 6 years ago

+1, even a simple hello world java8 lambda takes 3/4 seconds for each request!

CRogers commented 6 years ago

My sketch proposal to make warm containers work and maintain all the existing nice hot reload/memory usage etc functionality around them:

Currently, the container is simply run with handler argument and the event passed in via an environment variable. The containers logs are then piped to the console stdout/stderr and it just records how much memory is used.

Instead, we can start the container with bash as the entrypoint and -c "sleep infinity" as the argument, so it runs effectively nothing and keeps container alive. We record the container id in an (expiring) dict so we can reuse it again. When we want to run the lambda we run docker exec that runs the previously used lambda entrypoint and the correct environment. Since we run one lambda per container we can still record memory usage. If we key the running containers by the version of the lambda code we're running we can ensure hot reload still works. As always with caches the invalidation would be the interesting part - you probably want to kill out of date containers and kill containers when the tool exits.

monofonik commented 6 years ago

+1

luisvsm commented 6 years ago

+1 Very interested in this feature

luketn commented 6 years ago

+1 Yes please!

nodeit commented 6 years ago

+1, throwing my hat in the ring on this too

jfuss commented 6 years ago

As a note: Please use the reaction feature on the top comment. We do look at issues sorted by thumbs up (as well as other reactions). Commenting +1 does not good for that and adds noise to the issue.

scoates commented 6 years ago

@jfuss I agree (and had done this). Any feedback from your team would be helpful here, though. The closest thing we had to knowing if this is on your radar (before your comment) was duplicate issue consolidation and labeling.

ejoncas commented 5 years ago

+1, this would be very beneficial for people using java + spring boot.

thoratou commented 5 years ago

+1, around 1s for golang case

kevanpng commented 5 years ago

I did an experiment with container reuse. This is just with a lambda in python, I'm developing on ubuntu 16.04. In summary, docker container spinning up only takes an extra second. So it is not worth making the feature for container reuse. Link to my code https://github.com/kevanpng/aws-sam-local .

For a fixed query, both my colleague and I have 4s invocation time on sam local. His is a windows machine. With giving the profile flag and the container reuse, it goes down to 2.5s in my ubuntu.

My colleague is running on mac and when he tried the same query with lambda reuse and profile flag, he still had 11-14 seconds to run.

Maybe it could be that docker is slow on mac?

ghost commented 5 years ago

1 second is a world's difference when building an API and you expect to serve more than 1 request.

I think it's well worth the feature.

sanathkr commented 5 years ago

@kevanpng Hey I was looking through your code to understand what exactly you did.. So basically, you create the container once with a fixed name, run the function, and on next invocation look for container with same name and simply container.exec_run instead of creating it from scratch again. Is my summary correct?

I am super surprised Docker container creation makes this big of a difference. We can certainly look deeper into this if it is becoming usability blocker.

scoates commented 5 years ago

@sanathkr. Thanks for looking at this. FWIW, it's a huge usability blocker for me:

~/src/faculty/buildshot$ time curl -s http://127.0.0.1:3000/ >/dev/null # SAM container via Docker

real    0m6.891s
user    0m0.012s
sys 0m0.021s
~/src/faculty/buildshot$ time curl -s http://127.0.0.1:5000/ >/dev/null # regular python app via flask dev/debug server (slow)

real    0m0.039s
user    0m0.012s
sys 0m0.019s

And the Instancing.. is quick. It's Docker (and the way Docker is used here) that's slow. The (slow) werkzeug-based dev server is ~175x faster than waiting around for Docker. And this is for every request, not just startup. (And yes, this is from my Mac.)

sanathkr commented 5 years ago

@scoates Thanks for the comparison. Its not apples-to-apples to compare vanilla Flask to Docker-based app. But the 6 second duration with SAM CLI is definitely not what I would expect..

Did you have the Docker container already downloaded?
Also, can you start SAM CLI with --skip-pull-image flag? This will prevent the CLI to ask Docker for latest image version on every invoke. Do share your numbers again with this flag set.

Thinking ahead: I think we need to add more instrumentation to SAM CLI codebase in order to understand the parts that contribute to the high latency. It could be cool if we can run the instrumented code in a Travis build with every PR so we can assess the performance impact of new code changes. We also need to run this on variety of platforms to understand the real difference between Mac/Ubuntu.

sanathkr commented 5 years ago

I did some more profiling by crudely commenting out parts of the codebase. Also this is not run multiple times. So the numbers are ballpark estimates. I ran sam init and ran sam local-start-api on a simple HelloWorld Lambda function created by the init template.

Platform: MacOSX Docker version: 18.06.0

WARNING: Very crude measurements.

Total execution time (sam local start-api): 2.67 seconds Skip pull images (sam local start-api --skip-pull-image): 1.45 seconds Create container, run it, and return immediately without waiting for function terminate: 1.05 seconds Create container, don't run it: 0.2 seconds SAM CLI code overhead (don't create container at all): 0.045 seconds

Based on the above numbers, I arrived at a rough estimate for each step of the invoke path by assuming:

Total execution = SAM CLI overhead + Docker Image pull + Create container + Run Container + Run function

Then, here is how much each steps took:

SAM CLI Overhead: 0.045 seconds Docker Image Pull Check: 1.3 seconds Create Container: 0.15 seconds Run container: 0.85 seconds Run function: 0.45 seconds

The most interesting part is Create vs Run container durations. Run is 5x of Create. So it is better if we optimized for the Run duration.

If we were to do a warm start, then we would be saving some fraction of the 0.85 seconds it took to run the container. We should be keeping the runtime process up and running inside the container and re-run just the function in-place. Otherwise we aren't going to save much.

scoates commented 5 years ago

Hi. Sorry for the late reply. I was traveling last week and forgot to get to this when I returned.

I agree absolutely that apigw and flask aren't apples-to-apples, and crude measurements are definitely where we're at right now.

With --skip-pull-image, I still get request starts in the 5+ second range. Entirely possible there's slow stuff in my code (though it's small, so I'm not sure where that would come from; it really does seem like docker). Here are the relevant bits of a request (on a warm start; this is several requests into sam local start-api --skip-pull-image):

[ 0.00] 2018-10-16 20:18:44 Starting new HTTP connection (1): 169.254.169.254
[ 1.01] 2018-10-16 20:18:45 Requested to skip pulling images ...
[ 0.00]
[ 0.00] 2018-10-16 20:18:45 Mounting /Users/sean/src/faculty/buildshot/buildshot/build as /var/task:ro inside runtime container
[!5.32] START RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Version: $LATEST
[ 0.00] Instancing..
[ 0.00] [DEBUG] 2018-10-17T00:18:50.714Z    13e564e9-1160-4c0e-b1e2-b31bbadd899a    Zappa Event: {'body': None, 'httpMethod': 'GET', 'resource': '/', 'queryStringParameters': None, 'requestContext': {'httpMethod': 'GET', 'requestId': 'c6af9ac6-7b61-11e6-9a41-93e8deadbeef', 'path': '/', 'extendedRequestId': None, 'resourceId': '123456', 'apiId': '1234567890', 'stage': 'prod', 'resourcePath': '/', 'identity': {'accountId': None, 'apiKey': None, 'userArn': None, 'cognitoAuthenticationProvider': None, 'cognitoIdentityPoolId': None, 'userAgent': 'Custom User Agent String', 'caller': None, 'cognitoAuthenticationType': None, 'sourceIp': '127.0.0.1', 'user': None}, 'accountId': '123456789012'}, 'headers': {'X-Forwarded-Port': '3000', 'Host': 'localhost:3000', 'X-Forwarded-Proto': 'http', 'Accept': '*/*', 'User-Agent': 'curl/7.54.0'}, 'stageVariables': None, 'path': '/', 'pathParameters': None, 'isBase64Encoded': True}
[ 0.00]
[ 0.00] [INFO]  2018-10-17T00:18:50.731Z    13e564e9-1160-4c0e-b1e2-b31bbadd899a    127.0.0.1 - - [17/Oct/2018:00:18:50 +0000] "GET / HTTP/1.1" 200 15 "" "curl/7.54.0" 0/16.916
[ 0.00]
[ 0.00] END RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a
[ 0.00] REPORT RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Duration: 4684 ms Billed Duration: 4700 ms Memory Size: 128 MB Max Memory Used: 42 MB
[ 0.58] 2018-10-16 20:18:51 127.0.0.1 - - [16/Oct/2018 20:18:51] "GET / HTTP/1.1" 200 -

The [ 0.xx] prefix is returned by a util I have that shows elapsed time between stdout lines. Here's the important part, I think:

[!5.32] START RequestId: 13e564e9-1160-4c0e-b1e2-b31bbadd899a Version: $LATEST
[ 0.00] Instancing..

I acknowledge that Instancing.. might just not be output until it's complete, so that by itself isn't a valid measurement point. Just wanted to pass on that I'm seeing 5s of lag in my requests.

I'm not sure how to measure much deeper than that.

More info:

$ docker --version
Docker version 18.06.1-ce, build e68fc7
$ uname -a
Darwin sarcosm.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64 i386 MacBookPro11,4 Darwin
$ sam --version
SAM CLI, version 0.5.0

I also agree that if I can get this down to sub-1s request times, it's probably usable. 5s+ is painful, still, though.

(Edit: adding in case anyone looking for Zappa info stumbles on this. I'm using an experimental fork of the Zappa handler runtime. This doesn't really apply to Zappa-actual. At least not right now.)

vrealzhou commented 5 years ago

+1

sanathkr commented 5 years ago

@scoates Thank you for the detailed information. Definitely worth instrumenting the code further to dive deep. I will reach out when I have an instrumented version of SAM CLI to try out.

gkfirst8 commented 5 years ago

In case of java8: the on-line version of Lambda keeps the JVM running for a long time. It would be nice to replicate this in 'sam local' mode as then I can study any memory usage issues/OutOfMemoryExceptions I see.

clicktravel-steffan commented 5 years ago

+1 We're running test suites for fairly simple microservices implemented as node.js Lambda functions, where individual tests can take 10+ seconds to run. The lion's share of this time is spent waiting for the function under test to start running, judging from the test output.

vrealzhou commented 5 years ago

I've wrote a tool can simulate sam local start-lambda and keep warm for Lambda written in Golang. https://github.com/vrealzhou/lambda-local. Not perfect but helps in our own project.

metaskills commented 5 years ago

Maybe it could be that docker is slow on Mac?

I've done some reading around on the internet and think this is just the case. For example, when requiring a bunch of gems, things slow down locally ALOT. And it appears this slowness comes from mounted volumes and Docker's osxfs. No clue how to make this better tho.

metaskills commented 5 years ago

I've been reading this article (https://docs.docker.com/docker-for-mac/osxfs-caching/) with regard to Docker volume performance on Mac. I've been reading a little of SAM's Python (not my primary language) and I've landed in local/docker/manager.py and wonder if we should be using :delegated or :cached along with --docker-volume-basedir, -v, SAM_DOCKER_VOLUME_BASEDIR options to help speed this up?

metaskills commented 5 years ago

OK, not a Python person, but I was able to hack up a few things in my local container.py source and change this from ro to delegated.

        LOG.info("Mounting %s as %s:delegated inside runtime container", self._host_dir, self._working_dir)

        kwargs = {
            "command": self._cmd,
            "working_dir": self._working_dir,
            "volumes": {
                self._host_dir: {
                    # Mount the host directory as "read only" directory inside container at working_dir
                    # https://docs.docker.com/storage/bind-mounts
                    # Mount the host directory as "read only" inside container
                    "bind": self._working_dir,
                    "mode": "delegated"
                }
            },
            # We are not running an interactive shell here.
            "tty": False
        }

And this line in my app.rb went from ~45s to ~13s. Still not ideal, but better!

puts(Benchmark.realtime { require_relative 'config/application' })

metaskills commented 5 years ago

I created this pull request that adds delegated after ro to the volume. https://github.com/awslabs/aws-sam-cli/pull/1046

emorneau commented 5 years ago

+1, my issue is not performance but the fact that we can't perform unit/acceptance tests on warm local lambda and bugs can be found after deployment.

dsanders11 commented 5 years ago

@sanathkr, any progress here? It certainly makes local development a bit painful with how long the response times are.

I played around with some benchmarking myself and figured I'd share. I'm running Docker in a Linux VM. All timings include using delegated for the volume mount.

Just running a simple 'hello world' container which automatically removes once it's done takes ~300-400ms, which is on par with what this source says as well. I used docker events to get accurate timestamps. Unfortunately that's not super fine-grained, going from network connect to container start. Considering I also tried with network set to none and there was no noticeable difference in the time between those two events, I'm sure more is happening between the first and the second, it's just not given a name.

I tried taking the volume mount of the picture and saw significant speed-up in execution duration from doing so. I simply made a Docker image based on lambci/lambda:nodejs8.10 and added the directory to /var/tmp, then changed container.py to not do the volume mount, and hard-coded it to that image.

I saw a 3x speedup in execution duration when using this method (1500ms -> 500ms, 2200ms -> 700ms, etc). Note that this is from SAM's outputted duration, the real-world time includes the Docker overhead of ~400ms. So, the volume mount is definitely a source of significant slowdown in the container. A warm container would perform better if it copied the data and used the copied version instead of using directly from the mount. Of course, this would pose some problems for the hot-reload functionality, as it would require polling to ensure the copied version is up-to-date.

If it's preferred to keep each request in a single container that executes then exits, perhaps a pool of 'warm' containers could be kept. Say a pool of 5 (configurable by a flag) containers are created and each is only 'ready' once it is past the Docker overhead of creating the container and copying the data from the volume mount to /var/task. As the containers are used, new ones are brought up, keeping the pool at the correct size. Ideal pool size would be large enough to cover concurrent requests. The only real tricky thing here is containers would need to be killed off and recreated as file changes happen, otherwise hot-reload would be broken.

I think that would lead to the best performance since container creation overhead, and volume mount overhead seem significant. Volume mount overhead alone is probably enough to keep response times over 1 second most of the time.

hoang-innomize commented 5 years ago

Does anyone have any workaround to speed up the execution time?

humank commented 5 years ago

+1, considering when you run lambda function call, and invoke each other in local, there will be several entire lambda runtime creation life-cycle there, and it's takes long time.

vtirta commented 5 years ago

+1

jfuss commented 5 years ago

Hey all,

I just opened a PR that should really help the pain of 'slow'/'warm' invokes. I would love for people to chime in on the PR (I have deeper details of caveats and approach in the description of #1305). For any willing parties that want to give it a try and install the pr from source, I would love your feedback. We feel it is important for this change to be tested with real workflows to make sure we are hitting the need of you all. Thanks! 🎉

frankh commented 5 years ago

I've also made an attempt at this (#1319) - differs in that it keeps the container running constantly and invokes using docker exec instead of starting/stopping the container constantly.

lowers api gateway lambda latency from ~3s to ~0.2s

thecaddy commented 5 years ago

+1

kastork commented 5 years ago

The speed/time discussions here are all right on point - micronaut has largely solved the startup time issues that traditionally have made lambdas with JVM functions somewhat impractical, but the fresh-container-per-invocation is still a large impact.

But there's another consideration that I've recently run into, and that is the case where your lambda caches data for use between warm-starts.

For example, my app makes AWS SDK calls to secrets manager and other things that I want to avoid when possible. So on a cold start I make those calls, but on a warm start I don't. The current situation makes testing such things impossible in local.

hoang-innomize commented 5 years ago

+100

okiroth commented 5 years ago

+1

fondberg commented 5 years ago

+1

mhart commented 4 years ago

FWIW, I'm going to be pushing out some changes to https://github.com/lambci/docker-lambda in the next week that should support this out of the box.

Basically, each container can be configured (via an env var) to expose an HTTP interface, specifically the Lambda API – which means (after the first invoke), each invoke can hit a warm Lambda. Current testing on my Mac gives around 433 req/sec (2.3ms/req).

Assuming you have an index.js with a handler() in your current directory, you can try it out with:

docker run --rm -v $PWD:/var/task -e STAY_OPEN=1 -p 3000:3000 lambci/lambda:nodejs8.10-beta index.handler

(it won't print anything, but if it stays running... it's working)

Then you can invoke it (multiple times) using:

aws lambda invoke --endpoint http://localhost:3000 --no-sign-request --function-name doesnotmatter --payload '{}' output.json

Or simply just:

curl -d '{}' http://localhost:3000/2015-03-31/functions/myfunction/invocations

Very likely that the env var name will change, and probably the default port too – so don't get attached to those.

Will update you when I've pushed everything up 👍

dgergely commented 4 years ago

@mhart Hi! any update on this? Thanks!

mhart commented 4 years ago

@dgergely yup, plenty – 1.2k lines of Go, C#, Java, Python and JS to be precise: https://github.com/lambci/docker-lambda/pull/218

mhart commented 4 years ago

Alrighty, 1.8k lines later, all runtimes are supported. I still want to clean up some code and make sure stdout/stderr are supported correctly in the new model.

I've pushed all images to -beta tags. Eg, lambci/lambda:ruby2.5-beta, lambci/lambda:dotnetcore2.1-beta, etc

The env var to use is DOCKER_LAMBDA_STAY_OPEN and the port by default is 9001 – going to make this configurable soon.

So basically:

docker run --rm \
  -e DOCKER_LAMBDA_STAY_OPEN=1 \
  -p 9001:9001 \
  -v $PWD:/var/task \
  lambci/lambda:ruby2.5-beta \
  lambda_function.lambda_handler

You should then see:

Lambda API listening on port 9001...

And then you can invoke using:

curl -d '{}' http://localhost:9001/2015-03-31/functions/myfunction/invocations

OR

aws lambda invoke --endpoint http://localhost:9001 --no-sign-request --function-name myfunction --payload '{}' output.json

If you don't supply DOCKER_LAMBDA_STAY_OPEN then everything should function as it currently does (some very minor changes for some runtimes, but none of them should be breaking)

mhart commented 4 years ago

Due to the way I've implemented it, sharing the api server among all the runtimes, it's slower than the custom implementation in my earlier test. I now get around 70 req/s. But this is still much faster than cold starting each time. It would be possible to optimize this, but potentially at the cost of maintainability.

aws / aws-sam-cli

Feature request: make it possible to keep docker container warm #239