cdklabs / cdk-ecr-deployment

A CDK construct to deploy docker image to Amazon ECR
Apache License 2.0
160 stars 33 forks source link

Uploaded file must be a non-empty zip #478

Closed pharindoko closed 2 weeks ago

pharindoko commented 9 months ago

With the latest version 3.0.7 of the construct I do get the error when I try to deploy the stack

Deployment failed: Error: The stack named syncgroupsdevDeploymentStack0484A14B failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Uploaded file must be a non-empty zip (Service: Lambda, Status Code: 400,

Tested on Macos arm64 - Sonoma 14.2.1 colima with docker 24.07 cdk 2.118.0

ColinGilbert commented 8 months ago

I'm also getting that same error and solved the issue by downgrading the package to version 2.5.x.

It should be a high priority to fix this error.

SamStephens commented 8 months ago

@pharindoko @ColinGilbert what language are you running the CDK under?

pharindoko commented 8 months ago

Typescript

ColinGilbert commented 8 months ago

I'm using Typescript

SamStephens commented 8 months ago

I've been playing, and I see this error intermittently. I have a guess as to what is going on here.

The empty ZIP error is coming from Lambda when this stack attempts to upload the Lambda function that does the deployment.

My guess is that the functionality to download the prebuilt Lambda binary is sometimes failing, and rather than that failure being made visible, the build continues and uploads an empty ZIP.

@pharindoko @ColinGilbert can one of you please try the deployment that is failing with the environment variable NO_PREBUILT_LAMBDA set to 1? This means the Lambda is built locally rather than downloaded from Github.

SamStephens commented 8 months ago

Okay, I've confirmed this behavior myself. We can see that the assets for release 3.0.24 do not include the files bootstrap and bootstrap.sha256 that previous releases have. When I try and deploy using version 3.0.24, the deployment fails with the empty ZIP error. But when I set the environment variable NO_PREBUILT_LAMBDA to 1, the lambda builds locally successfully, and the deployment succeeds.

Release 3.0.7 does include the files bootstrap and bootstrap.sha256. So for those of you encountering issues with this version (and with any version other than 3.0.24), something is causing the CDK to have issues downloading from Github. The workaround is still the same, set the environment variable during your deployment.

SamStephens commented 8 months ago

So we have two issues here:

  1. That issues downloading bootstrap and bootstrap.sha256 from Github are not being surfaced correctly.
  2. That release 3.0.24 does not include bootstrap and bootstrap.sha256.

I suggest we use this issue to address download issues not failing the build correctly. I'll open a new issue for the release issue with 3.0.24.

SamStephens commented 8 months ago

As part of considering this task and https://github.com/cdklabs/cdk-ecr-deployment/issues/526, I think we should consider whether the prebuilt Lambda functionality is too complex and brittle, and whether it should just be ripped out entirely. Especially given the support status of this package (or lack there of). Building locally was pretty quick when I did it.

ColinGilbert commented 8 months ago

I can also confirm that using NO_PREBUILT_LAMBDA=1 works fine

fabiozig commented 8 months ago

I also had the same problem but i don't understand how the NO_PREBUILT_LAMBDA env solves the problem if the current code already build the docker image in case the download fails:

My logs when the download fail:

HTTPError: Response code 404 (Not Found)
Can not get prebuilt lambda: Error: Command failed: node /azp/_work/1/s/server/node_modules/cdk-ecr-deployment/lambda/install.js /azp/_work/1/s/server/node_modules/cdk-ecr-deployment/lambda/out
HTTPError: Response code 404 (Not Found)

#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 451B done
#1 DONE 0.1s

#2 [internal] load .dockerignore
#2 transferring context: 140B done
#2 DONE 0.1s

#3 [internal] load metadata for docker.io/library/golang:1
#3 DONE 2.4s
...

Regardless of the image being built the lambda deployment fails with the empty message...

fabiozig commented 8 months ago

I can also confirm that by downgrading and fixing the version to 3.0.23 i was able to download the prebuilt lambda without problems

fabiozig commented 8 months ago

I found the problem and it can be reproduced after the second deployment, after your bootstrap file has been created.

1 - download your dependencies (node_modules) using the version 3.0.24 of cdk-ecr-deployment 2 - deploy your cdk application 3 - check that the bin file is created on node_modules/cdk-ecr-deployment/lambda/out/bootstrap 4 - redeploy your application 5 - deployment will fail with the following error: "Uploaded file must be a non-empty zip"

It seems the second deployment fails because it thinks it already has downloaded the bootstrap from github, even though actually the bootstrap file was created with cdk building the docker image for our lambda function. The problem is that the file created when deploying the docker file is actually empty, that's why the non-empty zip error happens

Specifically the bug is a mixture of the library failing to download the bootstrap files and the file generated by cdk after docker build deployment being empty. You can see the bug in the install file here: https://github.com/cdklabs/cdk-ecr-deployment/blob/main/lambda/install.js#L50

bin path is set here: https://github.com/cdklabs/cdk-ecr-deployment/blob/main/src/index.ts#L102

A simple solution would be adding an empty file check here for this specific case. Or maybe the download function is creating the file even though the download fails. In this case we should delete the file in the catch logic.

@SamStephens what do you think?

SamStephens commented 8 months ago

@fabiozig good piece of analysis. I was seeing intermittent failures as I tried to track this down; they baffled me, so well done finding the failure scenario.

I've been wondering how this bug could have been introduced by the 3.x changes, and after reading your analysis I think it probably wasn't; that this bug has probably been present for a long time, but we're only just seeing it now because the missing prebuilt binaries in 3.0.24 mean that is the first time the failure scenario you describe has been encountered (the lambda is built locally, and then a second deployment is tried).

What is confusing me is why it makes a difference that node_modules/cdk-ecr-deployment/lambda/out/bootstrap was built locally rather than downloaded from Github. Regardless of how the binary is obtained, it should be exactly the same binary, in exactly the same location, and the docker image should not be empty regardless.

fabiozig commented 8 months ago

The file is not created when the docker image is built but when we try to download the bootstrap artifacts in the pipeline code below. That's why when using NO_PREBUILT_LAMBDA we had no problems, the download process is skipped altogether and the bootstrap file is not created.

The problem is that currently whether the download is successful or not, the bootstrap file is being created. https://github.com/cdklabs/cdk-ecr-deployment/blob/main/lambda/install.js#L33

I checked the documentation on the got library and it seems like the errors on the stream should be treated separately https://github.com/sindresorhus/got/blob/HEAD/documentation/3-streams.md#response

You are right the bug was not introduced with the 3.x changes. But thinking about a general case, if the download fails because of any reason the file should not be created.

Also, it seems version 3.0.24 was fixed and now the artifacts are uploaded normally, so this problem should not happen again for now.

SamStephens commented 8 months ago

The file is not created when the docker image is built but when we try to download the bootstrap artifacts in the pipeline code below.

Well spotted, my mistake.

But thinking about a general case, if the download fails because of any reason the file should not be created.

Agreed. Looking a little at Stackoverflow, it looks like normal practice is to unlink the destination file if the download fails. But I've only really used Javascript in the browser and my knowledge of NodeJS I/O is minimal, so I wouldn't really know.

I still think the best path forward is to remove the prebuilt functionality entirely, to simplify this package due to lack of support.

RanbirAulakh commented 4 months ago

It seems like this problem came back in 3.0.58 -- I am facing this same issue

resource handler returned message: "Uploaded file must be a non-empty zip (Service: Lambda, Status Code: 400, HandlerErrorCode: InvalidRequest)
mikedescalzo commented 4 months ago

Can confirm seeing this error in version 3.0.67

On codepipeline synth:

Collecting cdk-ecr-deployment (from -r requirements.txt (line 9))
  Downloading cdk_ecr_deployment-3.0.67-py3-none-any.whl.metadata (5.0 kB)
...
Can not get prebuilt lambda: Error: Command failed: /usr/local/bin/node /tmp/jsii-kernel-466bYE/node_modules/cdk-ecr-deployment/lambda/install.js /tmp/jsii-kernel-466bYE/node_modules/cdk-ecr-deployment/lambda/out

Then while creating the lambda:

"errorMessage": "Uploaded file must be a non-empty zip",

edit: Issue is fixed by setting 'cdk-ecr-deployment==3.0.23' as recommended above

RanbirAulakh commented 4 months ago

@mikedescalzo If you pin cdk-ecr-deployment as 3.0.66, it fixes the issue for me!

bruceduhamel commented 3 months ago

I confirmed this issue today in cdk-ecr-deployment==3.0.75 using python 3.11 but was working with 3.0.71

dennmee commented 1 month ago

Having same issue with the latest version 3.0.100 , pinning to the previous one worked.

mzha999 commented 1 month ago

Hi @dennmee, could you please let me know which version you’re currently using?

dennmee commented 1 month ago

Hi @dennmee, could you please let me know which version you’re currently using?

We pinned it to 3.0.96 for now.

SamStephens commented 1 month ago

@mzha999 are you a maintainer of this package? If so, what do you think about removing the prebuilt Lambda functionality entirely, so the lambda is always built locally? The prebuild functionality keeps failing and causing issues, as we can see from this issue.

I don't think the prebuild gains us enough to be worth maintaining, especially considering how little development work goes into this package. Docker should cache the build, so not prebuilding shouldn't actually cost developers much time. I think we're better to remove this functionality and simplify the package. We're better off having the code focus on actual function, not build optimisations we do not really need.

jchong18 commented 1 month ago

Having the same issue here with 3.0.100. Changing to version 3.0.96 fixed the issue.

athewsey commented 1 month ago

Seeing same on 3.0.100 here too (aws-cdk-lib==2.140.0)

Even rolling back to 3.0.99 seemed to fix the deployment for us

ooiyeefei commented 1 month ago

faced the same issue when i set "cdk-ecr-deployment": "^3.0.109" but resolved with pinning it at 3.0.66 as mentioned by @RanbirAulakh !

mrgrain commented 2 weeks ago

3.0.100 to 3.0.109 are missing the pre-built lambda. Later versions should work. We are planning on replacing the current approach with something else.