Closed: RickCogley closed this issue 3 years ago
I was about to make the same post.
I even tried a fresh app from the README; it deploys the first time, but after that I can't do any more deployments. I can't figure out why it would suddenly stop working:
~/Workspace/my-app$ up
build: 5 files, 6.8 MB (678ms)
deploy: staging (version 1) (24.576s)
stack: complete (20.248s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
~/Workspace/my-app$ up deploy -v staging
4ms DEBU up version 1.7.0-pro (os: linux, arch: amd64)
0s DEBU inferred runtime type=node
0s DEBU 1 regions from config
4.329s DEBU 1 regions from config
0s DEBU event deploy map[commit: stage:staging]
0s DEBU event platform.build map[]
0s DEBU hook prebuild is not defined
0s DEBU event hook map[hook:[] name:build]
1ms DEBU hook "build" command ""
0s DEBU event hook.complete map[duration:1.521584ms hook:[] name:build]
0s DEBU injecting proxy
237ms DEBU loading env vars
166ms DEBU loaded env vars duration=237
0s DEBU open
0s DEBU filtered .git – 4096
0s DEBU add _proxy.js: size=3609 mode=-rwxr-xr-x
0s DEBU add app.js: size=100 mode=-rwxrwxr-x
259ms DEBU add main: size=13813177 mode=-rwxrwxr-x
1ms DEBU add up-env.json: size=2 mode=-rwxr-xr-x
0s DEBU add up.json: size=86 mode=-rwxr-xr-x
0s DEBU stats dirs_filtered=1 files_added=5 files_filtered=0 size_uncompressed=14 MB
14ms DEBU close
0s DEBU event platform.build.zip map[duration:677.298639ms files:5 size_compressed:6827405 size_uncompressed:13816974]
5ms DEBU removing proxy
0s DEBU hook postbuild is not defined
0s DEBU event platform.build.complete map[duration:683.539072ms]
0s DEBU hook predeploy is not defined
0s DEBU hook deploy is not defined
4.528s DEBU checking for role
0s DEBU found existing role
337ms DEBU updating role policy
0s DEBU set role to arn:aws:iam::***:role/my-app-function
0s DEBU event platform.deploy map[commit: region:eu-west-1 stage:staging]
4.504s DEBU fetching function config region=eu-west-1
5.574s DEBU ensuring s3 bucket exists name=up-***-eu-west-1
6.05s DEBU uploading function to bucket up-***-eu-west-1 key my-app/staging/1631861611-ndAEJ3t5oTlWTDEw.zip
288ms DEBU updating function
319ms DEBU updating function code
0s DEBU event platform.function.update map[commit: region:eu-west-1 stage:staging]
0s DEBU event platform.deploy.complete map[commit: duration:16.735614612s region:eu-west-1 stage:staging version:]
0s DEBU event deploy.complete map[commit: duration:22.285139868s stage:staging]
Error: deploying: eu-west-1: updating function code: ResourceConflictException: The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:eu-west-1:***:function:my-app
{
RespMetadata: {
StatusCode: 409,
RequestID: "cab54633-8476-459b-ba54-cad7362b8dd5"
},
Message_: "The operation cannot be performed at this time. An update is in progress for resource: arn:aws:lambda:eu-west-1:***:function:my-app",
Type: "User"
}
It's possible that destroying the stack each time will let you deploy, but that means several minutes during which there is no website.
@t1bb4r Did you try the temporary fix of putting aws:states:opt-out in the description? That fixed it for me, but it will reportedly only work through 1 Oct.
@RickCogley That worked for me, thanks a lot!
Sure thing @t1bb4r. I tried this with a couple more sites and I'm getting the same error consistently, with different up setups.
The article that Ben found mentions that Lambda permissions can be added to a service role used by CloudFormation (see the "Updating CloudFormation's service role" section at https://aws.amazon.com/blogs/compute/coming-soon-expansion-of-aws-lambda-states-to-all-functions/). I know that up uses CloudFormation, but I am not sure if or how it's using a service role, or how to prove it either way. I did try adding lambda:GetFunction (https://docs.aws.amazon.com/lambda/latest/dg/API_GetFunction.html) to the IAM user you create for up to use, but it did not make a difference.
We are also experiencing this issue
Hey guys, sorry for the delay; taking a look at this. I read the announcement post but I'm a bit confused how it would influence Up, since the recommended policy for running Up (https://apex.sh/docs/up/credentials/#iam_policy_for_up_cli) already has `lambda:Get*`.
It sounds a bit like simply updating the SDK will work; I'll try that today and update here (and push a release if it's fine).
I'm not having any luck reproducing it actually, I'm still able to deploy my apps with 1.7.0-pro and I tried doing a few fresh application stacks as well. Are you guys seeing any particular pattern or is it across all of your apps?
I'm having it across all existing up apps ... if I create a new stack (destroy and re-create an existing one) it will work.
The way I've hacked around this is running:
aws lambda update-function-configuration --function-name $(node -p "require('./up.json').name") --description "aws:states:opt-out"
before an up deploy, which makes sure that the Lambda description is updated to "aws:states:opt-out" for existing Lambda functions, as described in the article.
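If it helps anyone, the whole sequence as a small script might look like this; it's only a sketch, assuming up.json has a "name" field (which the node -p trick above relies on) and that you're deploying the staging stage:

```bash
#!/usr/bin/env bash
# Workaround sketch: opt the existing function out of Lambda states, then
# deploy as usual. Assumes up.json has a "name" field and the AWS CLI is
# configured for the same account/region as the app.
set -euo pipefail

FUNCTION_NAME=$(node -p "require('./up.json').name")

aws lambda update-function-configuration \
  --function-name "$FUNCTION_NAME" \
  --description "aws:states:opt-out"

up deploy staging
```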
Thanks for looking into it @tj. I had tried it on a few sites built in AWS "sub-organizations" under our master account (not sure what they are really called). All of those failed with the error, and each of their IAM users appears to have the right permissions.
I just tried it on one on our master account, and it succeeded. So I tried another one on our master, and that failed.
FYI
Not sure if it makes any difference, but the apps I am deploying are just static sites: either hand-coded HTML files and a few assets in an "html" folder, or Hugo output in its usual "public" folder.
We started experiencing this issue today as well. This workaround allowed us to do deployments though:
you can add aws:states:opt-out as the lambda description, to bypass the problem, but it's reportedly going to stop working as of 1st Oct 2021.
I was reading in the docs that it actually recommends:
If a function is stuck in the Pending state for more than six minutes, call one of the following API operations to unblock it:
So it seems like they actually anticipate functions getting stuck, which is a bit odd; it's like they're admitting it's broken. Do you guys use it in a VPC? Mine aren't in a VPC, which could explain why I'm not really seeing it.
There might not be anything I can really do there; I wish any new deploy would simply override the previous one, but it looks like that's not really how they wrote the system.
Hi @tj, as for us: no, we're not using it in a VPC.
I created an app 5 days ago from the README and was experiencing this issue. It's now working. No changes to the Lambda description, AWS account, Up version, or app code, and it's just working.
I deployed a few times (5 days ago this was a 100% failure):
~/Workspace/my-app$ up deploy staging
build: 5 files, 6.8 MB (861ms)
deploy: staging (version 3) (12.105s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
~/Workspace/my-app$ up deploy staging
build: 5 files, 6.8 MB (958ms)
deploy: staging (version 4) (13.61s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
~/Workspace/my-app$ up deploy staging
build: 5 files, 6.8 MB (861ms)
deploy: staging (version 5) (13.856s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
~/Workspace/my-app$ up deploy staging
build: 5 files, 6.8 MB (825ms)
deploy: staging (version 6) (11.657s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
~/Workspace/my-app$ up deploy staging
build: 5 files, 6.8 MB (923ms)
deploy: staging (version 7) (14.029s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
~/Workspace/my-app$ up deploy staging
build: 5 files, 6.8 MB (865ms)
deploy: staging (version 8) (10.654s)
endpoint: https://bawza8mlwc.execute-api.eu-west-1.amazonaws.com/staging/
The only conclusion that I can make is that AWS made some changes to cause this, but then fixed it again. Is anyone still experiencing this issue right now?
I've got a Hugo site that consistently works, and a static site in an "html" folder that consistently fails. I just re-confirmed that neither site has the aws:states:opt-out lambda description workaround set. Both sites are deployed via a GitHub Action, which was working fine before this problem reared its head, and I'm getting the same error running up locally as well.
The only real difference between the settings is that the (succeeding) Hugo site has setup and build steps, whereas the (failing) HTML site is just a literal file copy. There was an "endpoint:regional" setting in the up.json of the failing static site, which I removed (https://github.com/RickCogley/cogley.info/commit/0a6256e83d087fa23ab7388977d589aab3c7f566), but this made no difference; a re-run still failed.
In AWS console, lambda page for the failing static site:
This comment https://github.com/claudiajs/claudia/issues/226#issuecomment-921883467 mentions that they are using terraform and updated a version ...
It's a hail mary (as is the above sequence of voodoo majick testing), but @tj, as you mentioned, maybe a recompile would actually help? Who knows...
Ok, found something else @tj: this forum post https://forums.aws.amazon.com/thread.jspa?messageID=995863&tstart=0 says you "need to put a check for the function state in between the update_function_code and the publish version calls. Make sure the state is active before proceeding https://docs.aws.amazon.com/lambda/latest/dg/functions-states.html"
And, someone else mentions: "I also noticed that the ci/cd tool is using an old version of the AWS SDK (1.11.834), and if I deploy the code using AWS CLI (2.2.37) it works. Could this be related?"
@RickCogley ahhh interesting, that sounds like a reasonable fix. I guess there's always room for a race condition after doing the request for the status as well since it's not atomic, but if we can assume it's deploying in a CI or just one person at a time it should be ok.
I guess in that case we'd just have to keep polling until it's done, which sounds like it can be several minutes according to the docs. I'll try to get that in on Monday; I still couldn't reproduce that state, but I'll make sure they deploy normally, and hopefully that'll fix it in your cases.
We are also seeing this issue. Setting aws:states:opt-out as the function description seems to have gotten us going again, but it's definitely a temporary fix that will break once AWS decides to force lifecycles on everyone.
Thanks @tj !
Yikes, so I guess you need to poll/wait before UpdateFunctionCode, UpdateFunctionConfiguration, and PublishVersion by the looks of it, haha... good old AWS, making things slow and difficult. I'll have to add some reasonable limit on the wait for now so it doesn't hang forever, but ideally it'd be configurable.
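In the meantime, the same state check can be done by hand with the AWS CLI if anyone wants to see what their function is stuck on; this is just an illustration of what has to settle before UpdateFunctionCode is accepted, not Up's actual implementation, and my-app is a placeholder function name:

```bash
# Show the function's lifecycle state and the status of its last update; both
# need to settle (Active / not InProgress) before UpdateFunctionCode succeeds.
aws lambda get-function-configuration \
  --function-name my-app \
  --query '[State, LastUpdateStatus]'

# The AWS CLI also ships built-in waiters for exactly these transitions:
aws lambda wait function-active --function-name my-app
aws lambda wait function-updated --function-name my-app
```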
Re-opening until you guys can confirm the fix, since I can't reproduce it. It'll take about 20m to get the releases built/uploaded. I guess the worst case is that some of them are actually getting stuck in that pending state.
OK, if you run up upgrade you should get v1.7.1-pro now with 0b09440, and if you run with -v you should see a bunch of logs mentioning checking and waiting for the state to change. Curious to know how long it's actually stuck in a pending state, if that is what's going on.
Confirmed I get the latest version and it works on the site that was failing. Thanks!
Edit: I mean I got the latest version automatically when deploying via GH Actions. Also, running up upgrade from my $HOME showed a progress bar while upgrading, then gave the message "Updated 1.7.0 Pro to 1.7.1 Pro".
@tj trying to run up, as configured in up.json, with a -v to get more verbose logs, from the GitHub Actions workflow below. Is there a way to specify switches?
...
- name: Deploy via Apex Up
  env:
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    UP_CONFIG: ${{ secrets.UP_CONFIG }}
  uses: apex/actions/up@v0.5.1
  with:
    stage: production
- name: Check folder contents
  run: |
    ls
    echo "====== PUBLIC ======"
    ls public
OK, ran up -v production successfully on another previously failing site, from local. The relevant part of the log:
3.68s DEBU uploading function to bucket up-2799900195-ap-northeast-1 key rickcogley-logr/production/1639990239-ci8oFg9iVx768Z2t.zip
0s DEBU updating function
43ms DEBU checking if function is pending (attempt 1 of 30)
156ms DEBU function is in state "Active" / "Successful"
0s DEBU updating function code
47ms DEBU checking if function is pending (attempt 1 of 30)
5.004s DEBU function is in state "Active" / "InProgress", trying again in 5s
48ms DEBU checking if function is pending (attempt 2 of 30)
5.846s DEBU function is in state "Active" / "Successful"
44ms DEBU alias production to 91
45ms DEBU alias production-previous to 90
65ms DEBU alias commit-d752006 to 91
29ms DEBU alias production-previous to 90
0s DEBU event platform.function.update map[commit:d752006 region:ap-northeast-1 stage:production]
124ms DEBU event platform.deploy.complete map[commit:d752006 duration:16.035353527s region:ap-northeast-1 stage:production version:91]
0s DEBU event platform.deploy.url map[url:https://q24o3id8m2.execute-api.ap-northeast-1.amazonaws.com/production/]
0s DEBU hook postdeploy is not defined
0s DEBU event hook map[hook:[up -v prune -s production -r 10] name:clean]
5.05s DEBU hook "clean" command "up -v prune -s production -r 10"
0s DEBU event hook.complete map[duration:5.050488581s hook:[up -v prune -s production -r 10] name:clean]
0s DEBU event deploy.complete map[commit:d752006 duration:30.677725571s stage:production]
0s DEBU track "Deploy" map[actions_count:0 alerts_count:0 app_name_hash:91ffc84307999f162a4a47a171ee29 arch:amd64 ci:false dns_zone_count:0 duration:30964 environment_count:0 has_cors:false has_error_pages:true has_logs:true has_profile:true header_rules_count:1 inject_rules_count:0 is_git:true lambda_accelerate:false lambda_memory:1024 os:darwin plan:pro proxy_timeout:15 redirect_rules_count:0 regions:[ap-northeast-1] stage:production stage_count:3 stage_domain_count:2 type:static version:1.7.1-pro]
536ms DEBU flushing analytics
user=5.59s system=1.82s cpu=23% total=31.621
Hth
It's fixed for me as well. Here's my log:
awesome thanks guys! I'll close for now 😄
Hi! Even using "aws:states:opt-out" I still have problems. Does anyone have any ideas?
Did you upgrade per the above?
Yeah! This was already updated back when AWS recommended it; it was working, but yesterday it wasn't!
Hi guys, same thing here: all on the latest version and getting the same error, even after putting the flags in the optional fields.
Prerequisites

- I am running the latest version (up upgrade).
- I inspected the verbose debug output with the -v, --verbose flag.

Description

Please see: https://apex-dev.slack.com/archives/C65P0GAV8/p1631749067003000 ... and: https://aws.amazon.com/blogs/compute/coming-soon-expansion-of-aws-lambda-states-to-all-functions/

I have this issue too, but it was first reported by Ben Nichols on the Slack #up channel. Whether via the CLI (up staging and up production) or, in my case, via GitHub Actions, you get an error like: ... and the deployment fails.

Steps to Reproduce

Make a visible change in one of your branches and do up staging or up production as appropriate, or git push to the branch and have your CI run it. Either way, you get an error like the above.

As Ben Nichols mentioned, you can add aws:states:opt-out as the lambda description to bypass the problem, but it's reportedly going to stop working as of 1st Oct 2021.

This feels like something other up users are suddenly going to experience, so it's my hope that someone can figure out how to change the code to fix this problem urgently.