izidorome opened 6 years ago
Could you elaborate a little more on why it is an issue that a new zip is formed for every package
command? It may be difficult to avoid making a new zip every time, because the package
command does an MD5 of the zip it creates to see if it needs to reupload the code to S3, by comparing the two MD5s.
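For context, here is a rough sketch of the behavior being described (illustration only, not the CLI's actual code; the bucket, prefix, and function names are made up): the MD5 of the whole zip becomes the S3 object name, and the upload is skipped only when an object with that name already exists. Any change to the zip bytes, including metadata, therefore forces a re-upload.

```python
import hashlib

import boto3
from botocore.exceptions import ClientError

def upload_if_changed(zip_path, bucket, prefix=""):
    # Hash the whole zip file; timestamps stored inside the zip affect this value.
    with open(zip_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    key = prefix + digest  # the object key is derived from the zip's MD5

    s3 = boto3.client("s3")
    try:
        # If an object with this key already exists, nothing is uploaded.
        s3.head_object(Bucket=bucket, Key=key)
        print("unchanged, skipping upload:", key)
    except ClientError:
        s3.upload_file(zip_path, bucket, key)
        print("uploaded new artifact:", key)
    return key
```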
Imagine a scenario where you have a CloudFormation file with more than one Lambda declared. For now, let's call them FN1 and FN2; one is in the fn1.go file and the second in fn2.go.
I build both of them, which generates two binaries, fn1 and fn2.
I run cloudformation package, and it generates 2 zip files and sends them to S3.
One week later, I change the fn1 function, but not fn2. My CI builds both of them, but only the first has a different MD5 (the second has the same MD5 as before).
The problem is that the package command generates a new zip for the second one too, even though the file did not change, which causes all of my CloudFormation-declared functions to be redeployed.
I'm having the same issue with Python code. Every time I run aws cloudformation package, it creates/uploads a new zip file and changes the CloudFormation template.
@rizidoro Can you download the zip files from S3, unzip them locally and diff them? It turns out I had one file which was actually different, because it included a "generated at" date which was being updated every time I built the CloudFormation script.
You also need to check for timestamp differences amongst the files
the package command does an MD5 of the zip it creates to see if it needs to reupload the code to S3 by comparing the two
That is exactly the issue. If the timestamps on files in the zip are different, the MD5 is different even if the contents are the same. In the case of scripting languages, this is probably not an issue. However, with Go, each time you run go build a new binary is created, and thus a new timestamp.
This is especially troublesome if you are trying to use CodePipeline and CodeBuild (see https://docs.aws.amazon.com/lambda/latest/dg/automating-deployment.html) because, no matter what, package is always going to create a zip with a different MD5.
Perhaps package should MD5 each file in the zip instead of the zip as a whole. As it is now, it's not an accurate comparison.
@jmassara exactly the problem I'm facing right now. The final binary that go build generates changes the timestamp.
@rizidoro Yes. This is a bug with package. It should probably create a temporary file that lists the MD5 hashes of all files going into the zip, then MD5 that temporary file and use the resulting value as the name of the S3 object.
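Something like the following is a minimal sketch of that idea (not aws-cli code; a sorted in-memory manifest stands in for the temporary file):

```python
import hashlib
import os

def content_hash(directory):
    """Hash the files that would go into the zip, not the zip itself.

    Builds a manifest of "relative/path:md5-of-contents" lines, sorted so the
    result does not depend on filesystem walk order, then hashes the manifest.
    Timestamps never enter the calculation, so rebuilding an unchanged binary
    yields the same value.
    """
    entries = []
    for root, _, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                file_md5 = hashlib.md5(f.read()).hexdigest()
            rel_path = os.path.relpath(path, directory)
            entries.append("%s:%s" % (rel_path, file_md5))
    manifest = "\n".join(sorted(entries))
    return hashlib.md5(manifest.encode("utf-8")).hexdigest()

# The result could then be used as the S3 object name instead of the zip's MD5.
```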
I have the same issue. I have a CodeCommit repo with a sam.yml containing multiple lambdas.
When I run the AWS CLI from my VM at the prompt twice in a row, the first run uploads a .zip for every lambda. The second one does nothing because nothing changed, which is correct.
But... doing exactly the same thing from CodePipeline and CodeBuild (aws cloudformation package, etc.) does not work. You can trigger the pipeline with "Release change" without needing a commit. It starts an AWS CLI Docker container for CodeBuild, gets the input sources from S3 and unzips them, then calls cloudformation package, which DOES reupload unchanged code for every lambda, causing redeployment in the next steps.
Does anyone know a workaround, and when this bug will be fixed?
I am having the same issue. I'm finding reviewing CloudFormation change sets painful because they are polluted with changes to Lambda resources that didn't materially change.
I'm seeing the same problem as @jmassara reported above, with Node. This one is painful for us because we are trying to use a CodePipeline to deploy Lambda@Edge functions with the CDN in the stack: even if we don't touch the functions, the CLI thinks the files changed during packaging, resulting in a CDN update (wait 15 minutes) even though nothing in the function code changed. It is far more than just an unnecessary version publish in the change set; it slows the entire CD process down unnecessarily because of how slow CloudFront updates are.
Hi, is there any progress on this feature request? Comparing the md5sum of each file within the zip instead of the md5sum of the zip file sounds like a good possible solution to this problem. I'd appreciate your thoughts and a possible fix. We have a CI/CD pipeline with many Lambda functions, and this problem causes a new version of each Lambda to be deployed unnecessarily every time.
We are also facing this exact issue.
@rmmeans I have exactly the same issue. This not only slows down the deployment, but also the rollbacks.
Guys, my question is not 100% related to this particular bug (I bypassed it by keeping my lambdas separate), but there is something I really can't work around and I am giving up on it. I would really appreciate any help or suggestions - please take a look at this error:
the package command fails when I have too many dependencies in my package.json, and unfortunately, due to the nature of the lambda, there is no way to reduce the number of files.
So, is there any way to actually run it with zip64 support? Please help; I have already given up on this.
The solution may depend on the programming language (and therefore, potentially not possible for some). We solved it in the λ# CLI as follows:
.NET Core has a deterministic build system, which means that if the source files and NuGet packages have not changed, the resulting compiled binaries remain identical as well. During the build phase of the package, the CLI creates a checksum of the file contents and filenames instead of the ZIP file itself, since the latter contains dates and timestamps that would cause the checksum to change with every build. The result is a package filename that only changes when the underlying code changes, which in turn only updates Lambda functions (or Lambda layers) when required.
Any updates on this issue?
I'm facing the exact same problem
I've also been suffering from this issue. I am using the sam-cli and have been trying to optimise the time it takes to run sam package and sam deploy. So far I've got to a nice place using a node script to pre-package each of the 29 lambdas into their own directory with the required node_modules. This is important so that I can make code changes in one file, then run deployment, and it'll very quickly deploy only the lambdas for which that file change was relevant. Best case, it'll affect 1 lambda and my deployment will take a few seconds.
As per the rest of the conversation in this issue, the md5 of the zip is different each time. Here is a demonstration:
~/C/t/test ❯❯❯ mkdir out
~/C/t/test ❯❯❯ touch out/test
~/C/t/test ❯❯❯ echo "Hello world" > out/test
~/C/t/test ❯❯❯
~/C/t/test ❯❯❯ md5 out/test
MD5 (out/test) = f0ef7081e1539ac00ef5b761b4fb01b3
~/C/t/test ❯❯❯ zip -rqX out.zip out
~/C/t/test ❯❯❯ md5 out.zip
MD5 (out.zip) = 5f28021c0b6fc266abbfb1b36870fa1d
~/C/t/test ❯❯❯
~/C/t/test ❯❯❯ zip -rqX out2.zip out
~/C/t/test ❯❯❯ md5 out2.zip
MD5 (out2.zip) = 5f28021c0b6fc266abbfb1b36870fa1d
~/C/t/test ❯❯❯ # Same md5!
~/C/t/test ❯❯❯ echo "Hello world" > out/test
~/C/t/test ❯❯❯ md5 out/test
MD5 (out/test) = f0ef7081e1539ac00ef5b761b4fb01b3
~/C/t/test ❯❯❯ # Same md5 for file!
~/C/t/test ❯❯❯ zip -rqX out3.zip out
~/C/t/test ❯❯❯ md5 out3.zip
MD5 (out3.zip) = 1a8ec423697ce9c657b6f1c12c51476f
~/C/t/test ❯❯❯ # Different zip file md5!
Digging into the source code for the zipping + uploading functionality you can see that the code walks the file tree and adds each file to the zipfile: https://github.com/aws/aws-cli/blob/384ae0aec97a706d1ff9ca9ce206dc93c9667038/awscli/customizations/cloudformation/artifact_exporter.py#L183-L196
My proposal would be that in this step it also MD5s each file being added to the zip, and then finally MD5s the total. I'm not sure what the perf impact of doing this would be, but it should make the final deployment significantly faster.
I've tested locally on a lambda with a small 😛 sized node_modules, total directory size ~20 MB:
~/C/g/a/.s/Api ❯❯❯ time find . -type f -exec md5 \{\} >> ../out.md5 \;
10.51 real 3.18 user 6.76 sys
~/C/g/a/.s/Api ❯❯❯ md5 ../out.md5
MD5 (../out.md5) = 6e6584c968e3974b60ba7b4e244a84b5
This was for 3098 files.
Yes, that's close to how it's done in λ# for the .NET zip packages. Make sure to sort the files by their full path first, then MD5 the file contents and the file path. If you omit the latter, the MD5 doesn't change when you change the capitalization of a filename!
@stealthycoin would there be any appetite for a PR implementing this?
@stealthycoin any update on this? I'd be happy to take a crack at a PR to implement the behaviour discussed.
Hello guys, any updates please :) ? I'm facing the same issue. I have multiple lambdas in a monorepo; once I update one lambda, sam package generates new S3 zip files for the others even if I didn't make any changes. Is this a bug or a feature request?
Hi all, I've created a pull request which seems to solve the issue we were facing: basically, we compute the checksum on the entire function content (after installing all requirements) rather than computing it on the resulting ZIP file (the current behavior). The main difference is that the checksum computed on the ZIP changes every time a file is recreated (it takes file mtime and ctime into account), even if there is no actual change in the file content.
It would be great if this pull request gets accepted and merged. Thanks. G
@gpiccinni I implemented a similar solution to yours in September here https://github.com/aws/aws-cli/pull/4526, but unfortunately nothing ever came of it.
@wmonk many thanks for pointing this out. By looking at your pull request I realized that in my case the checksum does not change when filenames change (which, in my opinion, it should), whereas your code already addresses this!
I'll look into other libraries such as dirhash, where the filename and path are included in the checksum, and possibly update my pull request.
Thanks G
@gpiccinni, awesome, and thanks! I hope your PR can be merged quickly; this would fix a lot of pipelines.
@jmassara This problem exists for scripting languages also. I am facing the same problem with Node.js lambdas. It looks like it is due to zip headers. Have a look at this Stack Overflow discussion.
Well, the CDK team does not have this problem, right? Find out what they are doing and do the same.
After being frustrated by this issue for a while, I've fixed it in my own deploy scripts. Hopefully this can help some others, and maybe pick up some optimisations! I'm not sure if this is the "right" way to do it, but it's been working fine for us. One big benefit I've found is that I can make config changes without having to redeploy every function whose code hadn't changed.
find src -type f -exec md5sum {} \; > tmp-md5
find node_modules -type f -exec md5sum {} \; >> tmp-md5
CODE_MD5=$(md5sum tmp-md5 | cut -c 1-32)
if [ ! -f "$CODE_MD5.zip" ]; then
  zip -q -r "$CODE_MD5.zip" src node_modules # more files here
fi
aws s3 ls "s3://bucket-name/$CODE_MD5" || aws s3 cp "$CODE_MD5.zip" "s3://bucket-name/$CODE_MD5"
sam deploy --parameter-overrides CodeUriKey="$CODE_MD5"
Parameters:
  CodeUriKey:
    Type: String
    NoEcho: true

Resources:
  Lambda:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri:
        Bucket: bucket-name
        Key: !Ref CodeUriKey
I have found another workaround for this issue (it may be easier for those who have many Lambda functions in one pipeline).
The key to this workaround was finding out what contributes to a different MD5 for a zip even when the contents of the files within it have not changed. I found the 'Modified Timestamp' of the files to be the culprit. So the idea is: if we can set a consistent 'Modified Timestamp' on all files just before the 'aws cloudformation package' or 'sam package' command is run, the produced zip files will have a consistent MD5 across build executions.
find . -exec touch -m --date="2020-01-30" {} \; # date does not matter as long as it is never changed.
aws cloudformation package --template-file template.yml --s3-bucket <bucket> --output-template-file package-template.yml
This trick has worked for me so far.
does not work for me
I have a similar issue, but with Lambda layers instead. I have my template in CodeCommit and I created a CodePipeline with a CodeBuild project that automates the cloudformation package and deploy process. However, every time there are any changes to the CodeCommit repo, even if the Lambda layer did not change, it will still create a new Lambda layer.
Has anyone here got any alternatives?
I have this same issue. While the suggestion from @rsodha does work to prevent most duplicate packages from being uploaded by the aws cloudformation package command, the AWS::Serverless::LayerVersion layer that I've created keeps getting re-uploaded, even when there are no package changes. I believe the reason is the CODEBUILD_SRC_DIR path, which is different every time an AWS::CodeBuild::Project is generated as part of my CodePipeline run. This CODEBUILD_SRC_DIR path is saved inside the package.json files that are created when I download the needed npm packages for my Node Lambdas (but doesn't appear to be an issue for the Python packages). Because of this, the layer hash is always different and, therefore, gets re-uploaded every time.
If there were a way to manually set the CODEBUILD_SRC_DIR path to a static value every time the AWS::CodeBuild::Project is generated in the CodePipeline's CloudFormation template, that might be a solution to this issue.
After many attempts, I still could not prevent a new Lambda Layer from being generated during each CodePipeline run. I tried the following:
First, in buildspec.yaml, for the CODEBUILD_SRC_DIR variable and the path that is automatically generated in the AWS::CodeBuild::Project resource, right from the start I renamed the src... path to the static value src123456789 and updated the CODEBUILD_SRC_DIR variable accordingly. Unfortunately, this still resulted in a new Layer being created, even though it successfully provided a consistent source path between CodePipeline runs.
Second, I tried sam package just in case there was a difference between that and aws cloudformation package, but this didn't make a difference either.
I've downloaded a couple of Lambda Layer .zip files that didn't change between CodePipeline runs and checked their MD5 hash values, and they are indeed different for some reason. The sizes of the files are different too (for example, 16,461,107 bytes vs. 16,461,114 bytes), but I can't figure out what the differences are between these two, as I've unzipped them and performed a directory comparison using Meld and it doesn't report any file differences.
So, I'm out of ideas as to why a new Lambda Layer is always generated and how to stop this from happening.
Any other ideas out there? Thanks.
@ryancabanas the file dates are probably different. A different compression level can also produce different values. I had to solve this problem for LambdaSharp.Net as well. You have to MD5 only the file paths and file contents in the ZIP file to make it an idempotent process.
@bjorg Thanks for helping! I am using the suggestion above from @rsodha and resetting the modified date for all the files, so they are consistent in that respect from build to build.
Any suggestions on how to go about determining what else could be different between the files from build to build? Thanks!
@ryancabanas not sure, but isn't there a modified and a created timestamp on files? Could that be it? Do folders have timestamps? Does the zip file itself have an internal timestamp?
I'd recommend you write a little app that opens both zips and compares the metadata of all entries. If the files are the same, it must be the metadata. Most zip libraries are pretty easy to use. It's almost identical to comparing two folders. This might be frustrating, but so is guessing blindly.
Sorry I couldn't be of more assistance.
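For anyone wanting to try that, here is a rough sketch of such a comparison using Python's standard zipfile module (the file paths at the bottom are placeholders):

```python
import zipfile

def compare_zip_metadata(path_a, path_b):
    """Print per-entry metadata differences between two zip files."""
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        entries_a = {i.filename: i for i in za.infolist()}
        entries_b = {i.filename: i for i in zb.infolist()}

        for name in sorted(set(entries_a) | set(entries_b)):
            a, b = entries_a.get(name), entries_b.get(name)
            if a is None or b is None:
                print("only in one zip:", name)
                continue
            # date_time is the per-entry modification timestamp stored in the zip
            # header; CRC covers the contents; external_attr holds permissions.
            for field in ("date_time", "CRC", "compress_size", "file_size", "external_attr"):
                va, vb = getattr(a, field), getattr(b, field)
                if va != vb:
                    print("%s: %s differs (%s vs %s)" % (name, field, va, vb))

compare_zip_metadata("layer-build1.zip", "layer-build2.zip")
```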
@bjorg Okay. I'll dig further in the ways you've mentioned. Thanks!
@ryancabanas did you try aws-cdk? It looks like it generates the same hash for the same contents each time.
CDK fanboy here. They don't have this problem; cdk-assets does things like normalizing file dates and line endings before zipping.
But @ryancabanas, what you are describing - that CODEBUILD_SRC_DIR is different - has an impact on the package.json. TL;DR: it is the wild west inside the node_modules directory; it mutates after installation, and that is the cause of the non-deterministic hashing.
Some packages embed the absolute path in their package.json after installation, and because CODEBUILD_SRC_DIR is different, that forces the package.json to be different. I wrote about it here: https://www.rehanvdm.com/blog/cdk-shorts-1-consistent-asset-hashing-nodejs It is not actually a CDK or CFN problem but rather an NPM one.
One solution is to remove the package.json from every package under node_modules so that they are excluded when the hash is calculated. The better solution is to use bundling: a tool like esbuild tree-shakes and bundles all your code into a single .js file. That is then the only file in the zip, so there is no package.json anywhere.
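For anyone wanting to confirm that this is what is breaking their hashes, a throwaway diagnostic along these lines (the function name is mine, not anything the CLI provides) scans node_modules for package.json files that embed the current build path:

```python
import os

def find_path_leaks(node_modules_dir, build_path):
    """List package.json files under node_modules that embed the build path.

    Packages like these make the asset hash differ between CodeBuild runs,
    because CODEBUILD_SRC_DIR changes every run.
    """
    leaks = []
    for root, _, files in os.walk(node_modules_dir):
        if "package.json" in files:
            path = os.path.join(root, "package.json")
            with open(path, encoding="utf-8", errors="ignore") as f:
                if build_path in f.read():
                    leaks.append(path)
    return leaks

# Flag every dependency whose metadata contains the absolute source path.
print(find_path_leaks("node_modules", os.environ.get("CODEBUILD_SRC_DIR", os.getcwd())))
```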
@rehanvdm Thanks for your article! Yes, what you said about the package.json metadata, namely the CODEBUILD_SRC_DIR path, is exactly what I discovered. I performed a test where, in CodeBuild, before anything else, I changed the src... folder name to a consistent name (for example, I always change it to src123456789), and this resulted in .zip file contents that were the same from build to build, but a new Lambda Layer is still always uploaded, even when it hasn't changed from build to build. I also used the suggestion above and changed the dates of all the files to a consistent date, but this hasn't solved the problem either.
I'm new to development and AWS, so I haven't used CDK before, or bundling. I will have to look into these. Thanks for the help!
Got it!
So I used the folder-hash package that @rehanvdm mentioned in his article, and this helped reveal differences between my Lambda Layer assets. I had already taken care of the CODEBUILD_SRC_DIR issue in the package.json files for Node, but I'm also using a couple of Python packages, and it turns out the .pyc files in the __pycache__ folders differ from build to build. So after installing the packages, I delete these .pyc files, and now no more unnecessary Lambda Layers are being created and uploaded! Thanks for the help!
For me, the issue was that I was creating the bundled zip using Linux's zip command. I needed to use the -X option so it didn't add all the extra file attributes to the created zip. My solution also included deleting the .pyc files and setting the last modified date of all the files to the same value, so I'm not positive which combination of these is actually needed.
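If you build the zip yourself, another option along the same lines is to normalize the entry metadata while zipping. A minimal sketch with Python's zipfile, assuming you control the zipping step (the fixed date is arbitrary, like the touch workaround above):

```python
import os
import zipfile

FIXED_DATE = (2020, 1, 30, 0, 0, 0)  # any constant date works, as long as it never changes

def deterministic_zip(src_dir, zip_path):
    """Create a zip whose bytes depend only on file paths and contents.

    Entries are added in sorted order with a constant timestamp and constant
    permissions, so rebuilding unchanged sources yields a byte-identical
    archive (and therefore the same MD5).
    """
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _, files in sorted(os.walk(src_dir)):
            for name in sorted(files):
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, src_dir)
                info = zipfile.ZipInfo(arcname, date_time=FIXED_DATE)
                info.compress_type = zipfile.ZIP_DEFLATED
                info.external_attr = 0o644 << 16  # rw-r--r--
                with open(path, "rb") as f:
                    zf.writestr(info, f.read())
```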
I've also encountered this issue when using CodeBuild to package Lambda functions and layers in a CloudFormation template.
As a workaround, the sam CLI does not seem to have this behaviour (anymore?), and it is included in the aws/codebuild/standard:6.0 CodeBuild image. I was able to swap aws cloudformation package for sam package in the CodeBuild buildspec to work around this issue.
I still have the same problem with aws cloudformation package for Lambda functions that point to a local .py file. Even setting the modification date of my source files to a fixed date (touch -a -m -t"201001010000.00") didn't help.
The generated zip file always has a different checksum.
What I would like is this: when running cloudformation package and cloudformation deploy on the same source files, CloudFormation must not redeploy unchanged resources.
Are you able to implement that?
I have a Golang lambda with the following template:
Even when the code didn't change (go build generates the same compiled code), the aws cloudformation package command generates a new zip file.