Open rehanvdm opened 4 years ago
FYI, this post explains in more detail why the packages are so big: https://www.rehanvdm.com/serverless/an-unexpected-journey-with-lambda-oracledb/index.html
This doesn't look like a CDK issue, perhaps an issue in how lambda handles updates to layers. The CDK doesn't hold any past state - it simply looks at the current code and configuration you have and formulates a new CloudFormation template.
@iph - any idea if this is a known edge case in lambda?
Hypothesis - Perhaps the order in which Cloudformation applied these updates caused this. This could happen if the update happens to the lambda function before it updates the lambda layer.
@rehanvdm - are you able to provide the full output of cdk deploy, including the error, so we can see the full set of actions Cloudformation executed before hitting this error?
@nija-at I came to the same hypothesis. Unfortunately I cannot provide it without permission from the client; it is an enterprise client with lots of red tape, so I am not even going to try. We did solve this by updating the lambda and removing the big libraries, then doing another deploy that adds the layer. I just found it strange that no one has reported it yet (not on CDK explicitly, but nowhere on the internet, unless my googling skills are failing me).
I was able to reproduce this error (code and error pasted below) and can confirm that my hypothesis was wrong. The lambda layer does get updated before the lambda function so this is not an ordering problem.
This doesn't look like an issue coming from the CDK. We'll have to take it up with the lambda service.
Code:
#!/usr/bin/env node
import { App, Stack } from '@aws-cdk/core';
import { Code, Function, LayerVersion, Runtime } from '@aws-cdk/aws-lambda';

const app = new App();
const stack = new Stack(app, 'mystack');

const layer = new LayerVersion(stack, 'layerver', {
  code: Code.fromAsset('50'),
});

const fn = new Function(stack, 'fn', {
  code: Code.fromAsset('130'),
  runtime: Runtime.NODEJS_12_X,
  handler: 'index.handler',
  layers: [ layer ],
});
Deploy error:
mystack: creating CloudFormation changeset...
0/3 | 17:38:39 | UPDATE_IN_PROGRESS | AWS::Lambda::LayerVersion | layerver (layerverC2CBE0B8) Requested update requires the creation of a new physical resource; hence creating one.
0/3 | 17:38:49 | UPDATE_IN_PROGRESS | AWS::Lambda::LayerVersion | layerver (layerverC2CBE0B8) Resource creation Initiated
1/3 | 17:38:49 | UPDATE_COMPLETE | AWS::Lambda::LayerVersion | layerver (layerverC2CBE0B8)
1/3 | 17:38:51 | UPDATE_IN_PROGRESS | AWS::Lambda::Function | fn (fn5FF616E3)
2/3 | 17:38:51 | UPDATE_FAILED | AWS::Lambda::Function | fn (fn5FF616E3) Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 293601280 bytes. (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: 6dca5252-c3cc-4b66-8084-b27d217c8ba0)
new Function (/Users/nija/workplace/cdk/hello-cdk/node_modules/@aws-cdk/aws-lambda/lib/function.ts:507:35)
\_ Object.<anonymous> (/Users/nija/workplace/cdk/hello-cdk/bin/hello-cdk.ts:13:12)
\_ Module._compile (internal/modules/cjs/loader.js:1133:30)
\_ Module.m._compile (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/index.ts:858:23)
\_ Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
\_ Object.require.extensions.<computed> [as .ts] (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/index.ts:861:12)
\_ Module.load (internal/modules/cjs/loader.js:977:32)
\_ Function.Module._load (internal/modules/cjs/loader.js:877:14)
\_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:74:12)
\_ main (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/bin.ts:227:14)
\_ Object.<anonymous> (/Users/nija/workplace/cdk/hello-cdk/node_modules/ts-node/src/bin.ts:513:3)
\_ Module._compile (internal/modules/cjs/loader.js:1133:30)
\_ Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
\_ Module.load (internal/modules/cjs/loader.js:977:32)
\_ Function.Module._load (internal/modules/cjs/loader.js:877:14)
\_ Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:74:12)
\_ /Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/npm/node_modules/libnpx/index.js:268:14
2/3 | 17:38:52 | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack | mystack The following resource(s) failed to update: [fn5FF616E3].
❌ mystack failed: Error: The stack named mystack is in a failed state: UPDATE_ROLLBACK_COMPLETE
at /Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/api/util/cloudformation.ts:256:13
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at waitFor (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/api/util/cloudformation.ts:166:20)
at Object.deployStack (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:263:26)
at CdkToolkit.deploy (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:181:24)
at main (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/bin/cdk.ts:250:16)
at initCommandLine (/Users/nija/.nvm/versions/node/v12.16.3/lib/node_modules/aws-cdk/bin/cdk.ts:183:9)
The stack named mystack is in a failed state: UPDATE_ROLLBACK_COMPLETE
Internal ref: t.corp/V217058190
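To make the numbers in the error above easier to read, here is a minimal sketch that extracts the two byte counts from the message and converts the overage to MiB. The helper name parseSizeError and the SizeError shape are hypothetical, made up for illustration; only the message text comes from the log. For this log it shows a 250 MiB limit, a reported size of 280 MiB, and an overage of 30 MiB.

```typescript
// Hypothetical helper (not part of the CDK or the AWS SDK): parse Lambda's
// "combined size" error message and report how far over the limit it is.
const BYTES_PER_MIB = 1024 * 1024;

interface SizeError {
  limitBytes: number;
  actualBytes: number;
  overMiB: number;
}

function parseSizeError(message: string): SizeError | undefined {
  const match = message.match(
    /maximum allowed size of (\d+) bytes\. The actual size is (\d+) bytes/
  );
  if (!match) {
    return undefined; // not a size-limit error
  }
  const limitBytes = Number(match[1]);
  const actualBytes = Number(match[2]);
  return {
    limitBytes,
    actualBytes,
    overMiB: (actualBytes - limitBytes) / BYTES_PER_MIB,
  };
}

// Message text copied from the deploy log above.
const parsedExample = parseSizeError(
  "Function code combined with layers exceeds the maximum allowed size of " +
    "262144000 bytes. The actual size is 293601280 bytes."
);
console.log(parsedExample);
```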
@nija-at Thanks for taking the initiative and creating an example, appreciate it. Will you leave this ticket open until you get a response from internal teams?
as best as possible, at least until the issue gets ack'ed.
Just confirming that I'm still experiencing this issue in 2021. Thought I was going slightly crazy until I found this.
In my experience, I had a few lambda functions that were each very fat, each having a copy of some large shared libraries. Naturally I wanted to refactor to extract the shared libraries into a layer. However, after doing so, I ran into this problem where, even though my functions were now tiny and the layer was fat, my cdk deploy would tell me that:
Function code combined with layers exceeds the maximum allowed size of 262144000 bytes. The actual size is 296335820 bytes.
Even though what I was seeing in the build was more like:
❯ du -sh asset.*
4.0K asset.39c7c0b56d2b94f5320257b13eb8c25532e20918e7f37483d070959f752b3886
4.0K asset.81c2c95c803b458187259bf4081da3e1fc7cb08551d22f75a12349273555fa49
93M asset.ab3b51a3705756fa3e9283340417420048ab6b4d06677dd05c319aa6d1567e95
And this was my whole deployment, so I couldn't understand how I was breaching the 250 MB limit.
To resolve it, I had to cdk destroy and cdk deploy again, which is disappointing.
Steps to reproduce from my experience are therefore something like:
cdk deploy
cdk deploy
Bumping this! I encountered the same issue. It's very confusing. I had to do 2 deployments to get around this issue
The internal ticket raised by Niranjan still exists and hasn't been engaged with yet. I've reached out to the team about this to hopefully get some engagement
bumping this up! would be nice to be able to fix this without causing any downtime to the underlying lambda
We are also still experiencing this issue, and it's causing quite a bit of pain with our deployments.
Also experiencing this issue with our lambda layers. Originally had stuff written in AWS SAM, which worked with our lambda + layers setup, however once converting the infrastructure to CDK, the issue has come up, despite no real change in the code size.
New ref for internal ticket: P80248897. Unfortunately, we can't do much about this issue from the CDK perspective.
any update on this ?
Faced the same issue. I was adding the layer to the lambda function through the TypeScript CDK lambda construct props. I talked to AWS support, and in AWS CloudTrail we found that the events for a lambda update through CDK are:
UpdateFunctionConfiguration (the layer is added to the lambda function)
UpdateFunctionCode (7 seconds after the previous action; the actual lambda code that uses the new layer is updated in this event)
UpdateFunctionConfiguration failed because the function is still using the previous code with the new layer, and the total size exceeds the limit.
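The ordering described above can be sketched with a small model. This is a simplification under stated assumptions, not Lambda's actual implementation: it assumes the size check runs after every API call against whatever state the function is in at that moment. The names applyCalls, FunctionState and UpdateCall are hypothetical, and the sizes (230MB old code, 50MB new code, 130MB layer) are borrowed from the original report in this issue purely as an illustration.

```typescript
// Simplified model (an assumption): each update call is validated against
// the function's state right after that call, not the final desired state.
const LIMIT_MB = 250;

interface FunctionState {
  codeMb: number;
  layersMb: number;
}

type UpdateCall =
  | { api: "UpdateFunctionConfiguration"; layersMb: number }
  | { api: "UpdateFunctionCode"; codeMb: number };

interface Outcome {
  ok: boolean;
  failedAt?: string;
  state: FunctionState;
}

function applyCalls(start: FunctionState, calls: UpdateCall[]): Outcome {
  let state: FunctionState = { ...start };
  for (const call of calls) {
    const next: FunctionState =
      call.api === "UpdateFunctionCode"
        ? { ...state, codeMb: call.codeMb }
        : { ...state, layersMb: call.layersMb };
    if (next.codeMb + next.layersMb > LIMIT_MB) {
      // The call is rejected; the function keeps its previous state.
      return { ok: false, failedAt: call.api, state };
    }
    state = next;
  }
  return { ok: true, state };
}

// Illustrative sizes: 230MB of old code deployed, no layers yet.
const deployed: FunctionState = { codeMb: 230, layersMb: 0 };

// Order seen in CloudTrail: layers first, then code.
const observedOrder = applyCalls(deployed, [
  { api: "UpdateFunctionConfiguration", layersMb: 130 },
  { api: "UpdateFunctionCode", codeMb: 50 },
]);

// Reversed order: code first, then layers.
const codeFirst = applyCalls(deployed, [
  { api: "UpdateFunctionCode", codeMb: 50 },
  { api: "UpdateFunctionConfiguration", layersMb: 130 },
]);

console.log(observedOrder); // rejected at UpdateFunctionConfiguration (230 + 130 > 250)
console.log(codeFirst); // accepted, final state is 50 + 130 = 180
```

In this model the final state is identical in both cases; only the intermediate state differs, which is why the same deployment succeeds against a fresh function.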
Ideally, CDK should create resources in the following order: 1) create the new function, 2) create the layer, 3) add the layer to the function. Or, CDK should temporarily increase the lambda size limit for the deployment.
My workaround:
I tried to create the layer in another CDK stack, add a CDK dependency from the layer on the lambda (so it should wait for the lambda to finish deploying first), and use the lambda.addLayers function to add the layer. This fails during deployment due to a circular dependency: the lambda still tries to grab the layer during the deployment, while the layer is also waiting for the lambda to finish deploying.
Next I will try manually deploying the lambda without the layer added, and immediately deploying another lambda revision with the layer.
I faced the same issue today and wanted to share my workaround.
I renamed the function that is causing the issue, e.g. yourfunctionV2. This creates a new function and removes the old one. This way, you don't have to manually deploy again to add layers.
Have run into this a couple of times now and wanted to share the workaround that worked for us.
We are deploying using the Serverless Framework and essentially what we do is rename the Lambda in the config file. This creates a brand new Lambda, instead of trying to modify the existing Lambda. We are running the Lambda in a Step Function so we update that to use the new Lambda name.
This all seems to work, although with minor disruption while the changeover happens, I think due to the way things align in Step Functions. It is preferable to removing and re-adding layers or doing a remove/redeploy, as our deployments take several minutes and would result in considerable downtime.
Anyway, this might be an option for anyone in this situation and might work with other triggers.
Bumping this issue. I've encountered this exact same issue. This issue has been open for over 4 years now, and the workaround mentioned in the comments above doesn't help maintain our SLA because of the downtime whilst updating the stack.
I don't see how it causes any downtime? CFN always creates a new Lambda, links all the old Lambda attributes to it, then deletes the old Lambda. At no point should you experience downtime, unless there's something specific to your setup?
Also, I was tagged here 4 years ago when I was on the Lambda Team (moved on to APIGW recently, so close by :) ) and have never checked this until I got a notification today :(
For clarity:
The fundamental problem is what @roryzhg pointed out: It's alllll in the APIs. The Lambda CloudFormation resource only calls Lambda APIs, so if one can't create a combination of API calls to sort out the limitation, then it's unsolvable in CloudFormation.
The API itself is flawed:
There are multiple ideal solutions:
Move layers into UpdateFunctionCode and deprecate the usage in UpdateFunctionConfiguration. This one would be a bit cleaner, as then you have all the necessary resources in one call, so it's atomic counting, vs the split brain knowledge today.
Move the code related bits into UpdateFunctionConfiguration, therein deprecating the usage of UpdateFunctionCode, and making UpdateFunctionConfiguration a single atomic call with all data references.
I would suggest reaching out to Lambda support cases, and linking my message, so hopefully they take those suggestions to fix this egregious miscalculation in the future.
@iph Thanks for getting back!
I don't think so
Edit
Have an active support ticket, will reference this issue in there
CFN never attempted to create a new lambda when applying an update.
I thought the original suggestion for workaround was to change the Logical ID. Was that not done?
Note: The reason why changing the logical ID should work, is that it's a cascade effect.
Updating Logical ID actually does:
The reason this works well, unlike the update path, is that CreateFunction has all the suggestions I made above: all code and layers are defined in one atomic operation, meaning it can calculate the end result in one go.
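As a rough pre-deploy check based on the reasoning above, one could estimate whether an in-place update is likely to hit the limit and whether replacing the function (for example by changing its logical ID) would avoid it. This is a sketch under assumptions: the planDeployment helper and the Plan type are hypothetical, sizes are passed in by hand, any layers already attached to the old function are ignored, and the in-place check assumes layers are applied to the old code before the new code lands.

```typescript
// Hypothetical planning helper; not a CDK or CloudFormation API.
const LAMBDA_LIMIT_MB = 250;

type Plan = "update-in-place" | "replace-function" | "too-large";

function planDeployment(
  oldCodeMb: number,
  newCodeMb: number,
  newLayersMb: number
): Plan {
  // Create path: code and layers are validated together, once.
  if (newCodeMb + newLayersMb > LAMBDA_LIMIT_MB) {
    return "too-large";
  }
  // Update path (assumed): new layers are checked against the old code first.
  if (oldCodeMb + newLayersMb > LAMBDA_LIMIT_MB) {
    return "replace-function";
  }
  return "update-in-place";
}

// Sizes from the original report: 230MB old code, 50MB new code, 130MB layer.
console.log(planDeployment(230, 50, 130));
```

A result of "replace-function" only says the end state fits while the intermediate state does not; whether replacing the function is acceptable depends on where its ARN is referenced.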
Whilst I would ideally agree with the above we unfortunately cannot do that. We have an edge case where the lambda ARN is used in multiple places outside the lambda stack. We have another internal ticket to streamline all the code & infrastructure surrounding this - but that's one to be tackled another day! :D
Sadness. I think for the moment, the workaround doesn't work for you then and what you are fiddling with may be the best approach for now :(
The Lambda layer is being applied before code changes when updating. The max size limit of the lambda is then reached when deploying. This is either a CDK-specific issue or an AWS CloudFormation issue. I could not find any evidence of this on the internet and consider it to be a fringe case.
Reproduction Steps
I have a deployed lambda function that is 230MB. After making changes to the lambda function, it is reduced to ~50MB. A ~130MB lambda layer is applied to this function. The combined size of the lambda + layer is then ~180MB, which is less than the 250MB limit. When I try to deploy this I get the following cloudformation error:
It looks like the layer is applied before the code update (the new code + layer is less than 250MB) and then fails the size constraint. The actual size it reports is the old code + layer: 230MB + 130MB = 360MB, which led me to this fringe case conclusion.
It deploys when you completely destroy the stack and then use the layer + new code. Which is the same as destroying the lambda function or manually updating the lambda with the new code first and then running the update using cloud formation/CDK.
Environment
This is :bug: Bug Report