CloudSnorkel / cdk-github-runners

CDK constructs for self-hosted GitHub Actions runners
https://constructs.dev/packages/@cloudsnorkel/cdk-github-runners/
Apache License 2.0
255 stars, 37 forks

Error: Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version #540

Closed: beeehappyandfree closed this issue 2 weeks ago

beeehappyandfree commented 3 weeks ago

Hello @kichik,

I'm getting the following error message. Any idea how to fix this?

HelloCdkStack | 93/107 | 1:01:00 AM | CREATE_FAILED | Custom::ImageBuilder | runners/CodeBuild/Image Builder/Builder/Default (runnersCodeBuildImageBuilder5847475D) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [bfeac84e-43c7-4ef8-bf43-56e92c8ee383]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
HelloCdkStack | 93/107 | 1:01:00 AM | CREATE_FAILED | Custom::ImageBuilder | runners/Fargate/Image Builder/Builder/Default (runnersFargateImageBuilderB9E1498E) Resource creation cancelled

kichik commented 3 weeks ago

That custom resource is backed by a Lambda function that triggers a CodeBuild project, which builds the image and then completes the custom resource. You should check the logs for each of those steps to see what happened. All of those steps have finally sections that make sure they always return a response to CloudFormation. So there is either a bug in one of them, an actual timeout because the image takes longer than an hour to build, or perhaps a networking issue preventing the CodeBuild project from notifying CloudFormation.
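The "always respond" pattern kichik describes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual handler: run_with_response, build_response, send_response and do_work are hypothetical names, and do_work stands in for whatever step (starting CodeBuild, building the image) the real code performs. The response fields follow the CloudFormation custom resource response format, which is PUT to the pre-signed ResponseURL included in the request.

```python
import json
import urllib.request
from typing import Callable, Optional


def build_response(event: dict, status: str, reason: str,
                   physical_id: str, data: Optional[dict] = None) -> dict:
    """Build the body CloudFormation expects at the custom resource ResponseURL."""
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def send_response(url: str, body: dict) -> int:
    """PUT the response to the pre-signed URL supplied in the request."""
    payload = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, method="PUT",
        headers={"Content-Type": ""},  # AWS examples send an empty Content-Type here
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def run_with_response(event: dict, do_work: Callable[[dict], Optional[dict]],
                      sender: Callable[[str, dict], object] = send_response) -> None:
    """Run do_work (hypothetical) and report the outcome to CloudFormation no matter what."""
    status, reason, data = "FAILED", "unknown error", {}
    # Reuse the existing physical ID on update/delete; fall back to the logical ID on create.
    physical_id = event.get("PhysicalResourceId", event["LogicalResourceId"])
    try:
        data = do_work(event) or {}
        status, reason = "SUCCESS", "OK"
    except Exception as exc:
        # Turn the error into a FAILED response instead of letting CloudFormation time out.
        reason = f"{type(exc).__name__}: {exc}"
    finally:
        sender(event["ResponseURL"], build_response(event, status, reason, physical_id, data))
```

A finally block only covers errors inside the running process. If the process is killed, or the PUT never reaches the URL, CloudFormation keeps waiting until it gives up, which matches the timeout and networking explanations in the comment.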

If you can't easily locate the logs, try deploying with cdk deploy -R (short for --no-rollback) so the resources stay behind on error.

beeehappyandfree commented 3 weeks ago

It is a timeout. All the images were built successfully except for the Fargate image. I will try again. Thanks.

kichik commented 3 weeks ago

Do you have a lot of added components? Is it possible the build actually took more than an hour?

kichik commented 3 weeks ago

We can extend the timeout to 12 hours with wait conditions. It will be a little cleaner too, since the custom resource can be self-contained in the Lambda function.

https://aws.amazon.com/blogs/devops/implementing-long-running-deployments-with-aws-cloudformation-custom-resources-using-aws-step-functions/
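Signaling a wait condition handle works similarly: whoever finishes the work (here, hypothetically, the CodeBuild project after the image is pushed) PUTs a small JSON document to the handle's pre-signed URL. The sketch below assumes the wait condition signal fields Status (SUCCESS or FAILURE), UniqueId, Reason and Data, plus the 43200-second (12-hour) maximum Timeout; the function names are hypothetical and nothing here is taken from cdk-github-runners.

```python
import json
import urllib.request

# Maximum Timeout accepted by a CloudFormation wait condition: 12 hours.
MAX_WAIT_SECONDS = 12 * 60 * 60


def check_timeout(seconds: int) -> int:
    """Reject non-positive timeouts and anything above the 12-hour ceiling."""
    if not 0 < seconds <= MAX_WAIT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_WAIT_SECONDS} seconds")
    return seconds


def build_signal(success: bool, unique_id: str, reason: str, data: str = "") -> dict:
    """Build a wait condition signal body; Status must be SUCCESS or FAILURE."""
    return {
        "Status": "SUCCESS" if success else "FAILURE",
        "UniqueId": unique_id,  # signals sharing a UniqueId count as the same signal
        "Reason": reason,
        "Data": data,  # e.g. the built image digest
    }


def send_signal(handle_url: str, signal: dict) -> int:
    """PUT the signal to the pre-signed wait condition handle URL."""
    body = json.dumps(signal).encode("utf-8")
    request = urllib.request.Request(
        handle_url, data=body, method="PUT",
        headers={"Content-Type": ""},  # pre-signed URL expects an empty content type
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With this shape, the Lambda function only has to start the build and return, and the stack keeps waiting (up to the timeout) until the build signals the handle.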

kichik commented 3 weeks ago

Wait handles do work, but only once. An update will not wait on the handle unless we create a whole new handle. To create a whole new handle, we need some stable value to put in its logical ID. Previously we depended on buildspec.yml to make sure a new image is generated on update whenever anything changes. But it contains some tokens that we can't hash at build time. So we would have to shrink that down to just the commands in the components. This is not terrible, but also not ideal: any internal fixes or updates to buildspec.yml will not trigger a build (unless we remember to force it somehow).
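One way to get that stable value is to hash only the component commands into the wait handle's logical ID, so that any change to a command produces a new handle and therefore a new wait. This is an assumption, not the project's code: wait_handle_logical_id and the force_rebuild_salt parameter are hypothetical, and the "${Token[" substring check is a rough stand-in for CDK's Token.isUnresolved(), relying on how CDK encodes unresolved string tokens during synthesis.

```python
import hashlib
import json


def looks_like_token(value: str) -> bool:
    """Rough check for an unresolved CDK string token such as "${Token[TOKEN.42]}"."""
    return "${Token[" in value


def wait_handle_logical_id(commands: list[str], force_rebuild_salt: str = "",
                           prefix: str = "ImageBuildWaitHandle") -> str:
    """Derive a deterministic, alphanumeric logical ID from the component commands."""
    for command in commands:
        if looks_like_token(command):
            # Token values are only known at deploy time, so they can't be hashed now.
            raise ValueError(f"cannot hash unresolved token in command: {command!r}")
    material = json.dumps({"commands": commands, "salt": force_rebuild_salt})
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()[:16].upper()
    return prefix + digest
```

The salt illustrates the trade-off from the comment: edits to buildspec.yml outside the hashed commands leave the ID unchanged, so someone would have to bump the salt by hand to force a rebuild.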

All this got me thinking... Do we always want CloudFormation to wait for the runner image to be built? Especially in cases where people add a ton of components and it takes a few hours? Maybe a "don't wait" parameter will be more useful than wait handles. This may present an issue with Lambda images, since we must have a real image digest at deployment time for those.
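The Lambda caveat can be made concrete with a hypothetical helper that picks the image reference a runner provider would be deployed with. Everything here is an assumption for illustration: the provider names, the wait_for_build flag, and the idea that non-Lambda providers could fall back to a mutable latest tag pulled at runtime. The rule it encodes comes from the comment: Lambda images need a real digest at deployment time, so a Lambda provider cannot skip waiting for the build.

```python
from typing import Optional


def image_reference(repository_uri: str, provider: str, wait_for_build: bool,
                    digest: Optional[str] = None) -> str:
    """Choose the image reference a runner provider would be deployed with (hypothetical)."""
    if wait_for_build:
        if not digest:
            raise ValueError("waited for the build but no digest was produced")
        # The build finished during deployment, so pin the exact image.
        return f"{repository_uri}@{digest}"
    if provider == "lambda":
        # Lambda needs a real digest at deployment time, so "don't wait" can't apply.
        raise ValueError("lambda runners require waiting for the image build")
    # Other providers could pull a moving tag at runtime once the build finishes.
    return f"{repository_uri}:latest"
```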

beeehappyandfree commented 2 weeks ago

Alrighty. I tried it again and it works. Nothing fancy, just a simple setup based on the provided YouTube tutorial.