aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(lambda): Get Function code + layer exceeds maximum size error when it does not #8446

Open rehanvdm opened 4 years ago

rehanvdm commented 4 years ago

The Lambda layer is being applied before code changes when updating, so the maximum size limit of the Lambda is hit during deployment. This is either a CDK-specific issue or an AWS CloudFormation one. I could not find any evidence of this on the internet and consider it to be a fringe case.

Reproduction Steps

I have a deployed Lambda function that is 230MB. After making changes to the function, it is reduced to ~50MB. A Lambda layer of ~130MB is then applied to this function. The combined size of the function + layer is ~180MB, which is less than the 250MB limit. When I try to deploy this I get the following CloudFormation error:

...
 20/101 | 1:27:01 PM | UPDATE_FAILED        | AWS::Lambda::Function                | <LambdaFunctionName> (XXX) Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 361955628 bytes. (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: XXX)
...

This looks like the layer is applied before the code update (the new code + layer is less than 250MB), and the update then fails the size constraint. The actual size it reports is the old code + layer: 230MB + 130MB = 360MB, which led me to this fringe-case conclusion.
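The arithmetic behind this reading can be checked with a short sketch. The byte values are copied from the error message above and the MB figures are the approximate sizes quoted in this report; the helper name `toMiB` is only illustrative.

```typescript
// Values copied from the CloudFormation error message above.
const LIMIT_BYTES = 262_144_000;    // "maximum allowed size"
const REPORTED_BYTES = 361_955_628; // "actual size"

// Hypothetical helper: convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// Approximate sizes quoted in the report, in MB.
const oldCodeMb = 230;
const newCodeMb = 50;
const layerMb = 130;

console.log(toMiB(LIMIT_BYTES));               // 250
console.log(toMiB(REPORTED_BYTES).toFixed(1)); // 345.2
console.log(newCodeMb + layerMb);              // 180: what the deployment should end up with
console.log(oldCodeMb + layerMb);              // 360: the only combination near the reported size
```

The reported size is far above new code + layer and in the same range as old code + layer, which is consistent with the size check running against the code that is still deployed.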

It deploys when you completely destroy the stack and then deploy the layer + new code. That is equivalent to destroying the Lambda function, or to manually updating the Lambda with the new code first and then running the update through CloudFormation/CDK.

Environment


This is :bug: Bug Report

rehanvdm commented 4 years ago

FYI, this post explains more detail on why the packages are so big https://www.rehanvdm.com/serverless/an-unexpected-journey-with-lambda-oracledb/index.html

nija-at commented 4 years ago

This doesn't look like a CDK issue, perhaps an issue in how lambda handles updates to layers. The CDK doesn't hold any past state - it simply looks at the current code and configuration you have and formulates a new CloudFormation template.

@iph - any idea if this is a known edge case in lambda?

nija-at commented 4 years ago

Hypothesis - Perhaps the order in which CloudFormation applied these updates caused this. This could happen if CloudFormation updates the Lambda function before it updates the Lambda layer.

@rehanvdm - are you able to provide the full output of cdk deploy including the error, so we can see the full set of actions Cloudformation executed before hitting this error?

rehanvdm commented 4 years ago

@nija-at I came to the same hypothesis. Unfortunately I cannot provide it without permission from the client. It is an enterprise client with lots of red tape, so I am not even going to try. We did solve this by updating the Lambda and removing the big libraries, then doing another deploy that adds the layer. I just found it strange that no one has reported it yet (not on CDK explicitly, but nowhere on the internet, unless my googling skills are failing me)?

nija-at commented 4 years ago

I was able to reproduce this error (code and error pasted below) and can confirm that my hypothesis was wrong. The lambda layer does get updated before the lambda function so this is not an ordering problem.

This doesn't look like an issue coming from the CDK. We'll have to take it up with the lambda service.

Code:

#!/usr/bin/env node
import { App, Stack } from '@aws-cdk/core';
import { Code, Function, LayerVersion, Runtime } from '@aws-cdk/aws-lambda';

const app = new App();
const stack = new Stack(app, 'mystack');

const layer = new LayerVersion(stack, 'layerver', {
  code: Code.fromAsset('50'),
});

const fn = new Function(stack, 'fn', {
  code: Code.fromAsset('130'),
  runtime: Runtime.NODEJS_12_X,
  handler: 'index.handler',
  layers: [ layer ],
});

Deploy error:

mystack: creating CloudFormation changeset...
 0/3 | 17:38:39 | UPDATE_IN_PROGRESS   | AWS::Lambda::LayerVersion | layerver (layerverC2CBE0B8) Requested update requires the creation of a new physical resource; hence creating one.
 0/3 | 17:38:49 | UPDATE_IN_PROGRESS   | AWS::Lambda::LayerVersion | layerver (layerverC2CBE0B8) Resource creation Initiated
 1/3 | 17:38:49 | UPDATE_COMPLETE      | AWS::Lambda::LayerVersion | layerver (layerverC2CBE0B8)
 1/3 | 17:38:51 | UPDATE_IN_PROGRESS   | AWS::Lambda::Function     | fn (fn5FF616E3)
 2/3 | 17:38:51 | UPDATE_FAILED        | AWS::Lambda::Function     | fn (fn5FF616E3) Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 293601280 bytes. (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: 6dca5252-c3cc-4b66-8084-b27d217c8ba0)
    new Function (/Users/nija/workplace/cdk/hello-cdk/node_modules/@aws-cdk/aws-lambda/lib/function.ts:507:35)
    \_ Object.<anonymous> (/Users/nija/workplace/cdk/hello-cdk/bin/hello-cdk.ts:13:12)
    \_ Module._compile (internal/modules/cjs/loader.js:1133:30)
    \_ Module.m._compile (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/index.ts:858:23)
    \_ Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
    \_ Object.require.extensions.<computed> [as .ts] (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/index.ts:861:12)
    \_ Module.load (internal/modules/cjs/loader.js:977:32)
    \_ Function.Module._load (internal/modules/cjs/loader.js:877:14)
    \_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:74:12)
    \_ main (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/bin.ts:227:14)
    \_ Object.<anonymous> (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/bin.ts:513:3)
    \_ Module._compile (internal/modules/cjs/loader.js:1133:30)
    \_ Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
    \_ Module.load (internal/modules/cjs/loader.js:977:32)
    \_ Function.Module._load (internal/modules/cjs/loader.js:877:14)
    \_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:74:12)
    \_ /Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/npm/node_modules/libnpx/index.js:268:14
 2/3 | 17:38:52 | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack | mystack The following resource(s) failed to update: [fn5FF616E3].

 ❌  mystack failed: Error: The stack named mystack is in a failed state: UPDATE_ROLLBACK_COMPLETE
    at /Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/api/util/cloudformation.ts:256:13
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at waitFor (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/api/util/cloudformation.ts:166:20)
    at Object.deployStack (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:263:26)
    at CdkToolkit.deploy (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:181:24)
    at main (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/bin/cdk.ts:250:16)
    at initCommandLine (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/bin/cdk.ts:183:9)
The stack named mystack is in a failed state: UPDATE_ROLLBACK_COMPLETE

nija-at commented 4 years ago

Internal ref: t.corp/V217058190

rehanvdm commented 4 years ago

@nija-at Thanks for taking the initiative and creating an example, appreciate it. Will you leave this ticket open until you get a response from the internal teams?

nija-at commented 4 years ago

as best as possible, at least until the issue gets ack'ed.

lukeknxt commented 3 years ago

Just confirming that I'm still experiencing this issue in 2021. Thought I was going slightly crazy until I found this.

In my experience, I had a few lambda functions that were each very fat, each having a copy of some large shared libraries. Naturally I wanted to refactor to extract the shared libraries into a layer, however after doing so, I ran into this problem where even though my functions were now tiny and the layer was fat, my cdk deploy would tell me that:

Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 296335820 bytes.

Even though what I was seeing in the build was more like:

❯ du -sh asset.*
4.0K    asset.39c7c0b56d2b94f5320257b13eb8c25532e20918e7f37483d070959f752b3886
4.0K    asset.81c2c95c803b458187259bf4081da3e1fc7cb08551d22f75a12349273555fa49
 93M    asset.ab3b51a3705756fa3e9283340417420048ab6b4d06677dd05c319aa6d1567e95

And this was my whole deployment, so I couldn't understand how I was breaching the 250 MB limit.

To resolve it, I had to cdk destroy and cdk deploy again which is disappointing.
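The size check done with du above can also be scripted before a deploy. This is a minimal sketch, assuming Node.js and that the assets are plain files or directories on disk; the names `pathSizeBytes` and `fitsWithinLimit` are hypothetical, and the limit is the figure from the error message. It sums file sizes, so the result can differ slightly from du, which reports disk usage.

```typescript
import * as fs from "fs";
import * as path from "path";

// Limit quoted in the error message in this thread.
const LAMBDA_UNZIPPED_LIMIT_BYTES = 262_144_000;

// Hypothetical helper: total size in bytes of a file, or of all regular
// files below a directory. Symlinks and special files are skipped.
function pathSizeBytes(p: string): number {
  const stat = fs.lstatSync(p);
  if (stat.isFile()) return stat.size;
  if (!stat.isDirectory()) return 0;
  return fs
    .readdirSync(p)
    .reduce((sum, name) => sum + pathSizeBytes(path.join(p, name)), 0);
}

// Hypothetical helper: true when the given asset paths together stay
// within the limit.
function fitsWithinLimit(
  assetPaths: string[],
  limitBytes: number = LAMBDA_UNZIPPED_LIMIT_BYTES,
): boolean {
  const total = assetPaths.reduce((sum, p) => sum + pathSizeBytes(p), 0);
  return total <= limitBytes;
}
```

A call such as `fitsWithinLimit(["cdk.out/asset.<hash>"])` (the path is a placeholder) only confirms that the new code and layer fit together. As this thread shows, the update can still fail, because the size reported in the error matches the previously deployed code plus the new layer.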

Steps to reproduce from my experience are therefore something like:

  1. Produce a fat lambda function with a bunch of large libraries (almost breaching the limit)
  2. cdk deploy
  3. Move large libraries out into a layer (almost breaching the limit)
  4. Verify that combined unzipped size does not exceed the limit
  5. cdk deploy

chialunwu commented 2 years ago

Bumping this! I encountered the same issue. It's very confusing. I had to do 2 deployments to get around this issue

  1. Deploy the "thin" lambda WITHOUT the layers (this could lead to runtime errors because of missing layers!)
  2. Deploy the lambda WITH the layers.

peterwoodworth commented 2 years ago

The internal ticket raised by Niranjan still exists and hasn't been engaged with yet. I've reached out to the team about this to hopefully get some engagement

suryaavala commented 2 years ago

bumping this up! would be nice to be able to fix this without causing any downtime to the underlying lambda

jsheldon-qci commented 2 years ago

We are also still experiencing this issue, and it's causing quite a bit of pain with our deployments.

mixtah commented 2 years ago

Also experiencing this issue with our Lambda layers. We originally had things written in AWS SAM, which worked with our Lambda + layers setup. However, after converting the infrastructure to CDK, the issue came up, despite no real change in the code size.

madeline-k commented 1 year ago

New ref for internal ticket: P80248897. Unfortunately, we can't do much about this issue from the CDK perspective.

kaplundanny commented 1 year ago

any update on this ?

roryzhg commented 1 year ago

I'm facing the same issue. I was adding the layer to the Lambda function through the props of the TypeScript CDK Lambda construct. I talked to AWS Support, and in AWS CloudTrail we found that a Lambda update through CDK produces these events, in this order:

UpdateFunctionConfiguration (the layer is added to the Lambda function)
UpdateFunctionCode (7 seconds after the previous action; the actual Lambda code that uses the new layer is updated in this event)

UpdateFunctionConfiguration failed because at that point the function still has its previous code, and the previous code plus the new layer exceeds the size limit.
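The effect of that ordering can be illustrated with a small in-memory model. This is a sketch, not the Lambda implementation: it assumes each call validates the code plus layers that would be in effect right after that call, and the function names only mirror the CloudTrail event names. The sizes are the approximate figures from the original report.

```typescript
const LIMIT_BYTES = 262_144_000; // limit quoted in the error message
const MIB = 1024 * 1024;

interface FunctionState {
  codeBytes: number;
  layerBytes: number;
}

// Model assumption: every call checks code + layers as they stand after it.
function checkSize(state: FunctionState): void {
  const total = state.codeBytes + state.layerBytes;
  if (total > LIMIT_BYTES) {
    throw new Error(`combined size ${total} exceeds ${LIMIT_BYTES}`);
  }
}

function updateFunctionConfiguration(state: FunctionState, layerBytes: number): FunctionState {
  const next = { ...state, layerBytes };
  checkSize(next);
  return next;
}

function updateFunctionCode(state: FunctionState, codeBytes: number): FunctionState {
  const next = { ...state, codeBytes };
  checkSize(next);
  return next;
}

type Order = "config-first" | "code-first";

// Apply both updates in the given order and report the outcome.
function simulateUpdate(deployed: FunctionState, target: FunctionState, order: Order): string {
  try {
    let state = deployed;
    if (order === "config-first") {
      state = updateFunctionConfiguration(state, target.layerBytes);
      state = updateFunctionCode(state, target.codeBytes);
    } else {
      state = updateFunctionCode(state, target.codeBytes);
      state = updateFunctionConfiguration(state, target.layerBytes);
    }
    return "ok";
  } catch (err) {
    return `failed: ${(err as Error).message}`;
  }
}

const deployed: FunctionState = { codeBytes: 230 * MIB, layerBytes: 0 };
const target: FunctionState = { codeBytes: 50 * MIB, layerBytes: 130 * MIB };

console.log(simulateUpdate(deployed, target, "config-first")); // fails on the first call
console.log(simulateUpdate(deployed, target, "code-first"));   // ok
```

In this model the code-first order corresponds to the two-step workaround described earlier in the thread: deploy the smaller code without the layer, then add the layer.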

Ideally, CDK should create resources in the following order: 1) create the new function, 2) create the layer, 3) add the layer to the function. Alternatively, CDK should temporarily increase the Lambda size limit for the deployment.

My workaround:

I tried creating the layer in another CDK stack, adding a CDK dependency from the layer on the Lambda (so that it waits for the Lambda to finish deploying first), and using the lambda.addLayers function to add the layer. This failed during deployment with a circular dependency error: the Lambda still tries to fetch the layer at deployment time, while the layer is waiting for the Lambda to finish deploying.

Next I will try manually deploying the Lambda without the layer, and then immediately deploying another Lambda revision with the layer.

tomoima525 commented 1 year ago

I faced the same issue today and wanted to share my workaround.

I renamed the function that was causing the issue, e.g. to yourfunctionV2. This creates a new function and removes the old one. This way, you don't have to manually deploy again to add layers.

jeffski commented 5 months ago

Have run in to this a couple of times now and wanted to share the workaround that worked for us.

We are deploying using the Serverless Framework and essentially what we do is rename the Lambda in the config file. This creates a brand new Lambda, instead of trying to modify the existing Lambda. We are running the Lambda in a Step Function so we update that to use the new Lambda name.

This all seems to work, although with minor disruption while the changeover happens, I think due to the way things align in Step Functions. It is preferable to removing and re-adding layers or doing a remove/redeploy, as our deployments take several minutes and would result in considerable downtime.

Anyway, this might be an option for anyone in this situation and might work with other triggers.

jasonpraful commented 3 months ago

Bumping this issue. I've encountered this exact same issue. It has been open for over 4 years now, and the workarounds mentioned in the comments above don't help us maintain our SLA because of the downtime while updating the stack.

iph commented 3 months ago

I don't see how it causes any downtime? CFN always creates a new Lambda, links all the old Lambda attributes to it, then deletes the old Lambda. At no point should you experience downtime, unless there's something specific to your setup?

iph commented 3 months ago

Also, I was tagged here 4 years ago when I was on the Lambda Team (moved on to APIGW recently, so close by :) ) and have never checked this until I got a notification today :(

For clarity:

The fundamental problem is what @roryzhg pointed out: It's alllll in the APIs. The Lambda CloudFormation resource only calls Lambda APIs, so if one can't create a combination of API calls to sort out the limitation, then it's unsolvable in CloudFormation.

The API itself is flawed:

There are multiple ideal solutions:

I would suggest reaching out to Lambda support cases, and linking my message, so hopefully they take those suggestions to fix this egregious miscalculation in the future.

jasonpraful commented 3 months ago

@iph Thanks for getting back!

I don't think so

  1. CFN never attempted to create a new lambda when applying an update.
  2. CFN seemed to apply the new layers first - this caused my lambda to cross 250MB which instantly failed the CFN update even before the core function got a chance to update.
  3. In another lower environment, I removed the new layers, did an update to the lambda and then applied the layers back again, which fixed the issue. However, some of these layers contain logic, so I cannot replicate this in production without downtime, as the lambda might error for requests.

Edit

Have an active support ticket, will reference this issue in there

iph commented 3 months ago

CFN never attempted to create a new lambda when applying an update.

I thought the original suggestion for workaround was to change the Logical ID. Was that not done?

iph commented 3 months ago

Note: The reason why changing the logical ID should work, is that it's a cascade effect.

Updating Logical ID actually does:

  1. Create function
  2. Update usage everywhere
  3. Delete old function

The reason this works well is that, unlike the update path, CreateFunction has all the properties I suggested above: all Code+Layers are defined in one atomic operation, meaning it can calculate the end result in one go.
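The three steps can be sketched as an in-memory model. This is an illustration only, not CloudFormation's implementation: `StackModel`, the logical IDs and the `stateMachine` consumer are hypothetical names, and the sizes are the approximate figures from the original report. The point it shows is that the create path validates the new code and the new layer together, exactly once.

```typescript
const LIMIT_BYTES = 262_144_000; // limit quoted in the error messages in this thread
const MIB = 1024 * 1024;

interface Fn {
  codeBytes: number;
  layerBytes: number;
}

class StackModel {
  // logical ID -> function
  functions: Record<string, Fn> = {};
  // consumer name -> logical ID of the function it uses
  references: Record<string, string> = {};

  // Code and layers arrive in one call, so the final size is checked once.
  createFunction(id: string, fn: Fn): void {
    const total = fn.codeBytes + fn.layerBytes;
    if (total > LIMIT_BYTES) {
      throw new Error(`combined size ${total} exceeds ${LIMIT_BYTES}`);
    }
    this.functions[id] = fn;
  }

  replaceFunction(oldId: string, newId: string, fn: Fn): void {
    this.createFunction(newId, fn); // 1. Create function
    for (const consumer of Object.keys(this.references)) {
      if (this.references[consumer] === oldId) {
        this.references[consumer] = newId; // 2. Update usage everywhere
      }
    }
    delete this.functions[oldId]; // 3. Delete old function
  }
}

const stack = new StackModel();
stack.createFunction("fn", { codeBytes: 230 * MIB, layerBytes: 0 });
stack.references["stateMachine"] = "fn";

// The old 230 MiB code never meets the new 130 MiB layer.
stack.replaceFunction("fn", "fnV2", { codeBytes: 50 * MIB, layerBytes: 130 * MIB });

console.log(Object.keys(stack.functions));     // [ 'fnV2' ]
console.log(stack.references["stateMachine"]); // fnV2
```

Step 2 only reaches consumers the stack knows about. Anything outside the stack that holds the old function's ARN is not repointed in this model.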

jasonpraful commented 3 months ago

Whilst I would ideally agree with the above we unfortunately cannot do that. We have an edge case where the lambda ARN is used in multiple places outside the lambda stack. We have another internal ticket to streamline all the code & infrastructure surrounding this - but that's one to be tackled another day! :D

iph commented 3 months ago

Sadness. I think for the moment, the workaround doesn't work for you then and what you are fiddling with may be the best approach for now :(