aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

Create TaskDefinition: Container.image repository should not be null or empty #3455

Closed johnAirRobe closed 2 years ago

johnAirRobe commented 2 years ago

I'm creating two different environments in the same AWS account, and I want to create a service in this new environment built from the same Dockerfiles that are used in another env. However, when I deploy I get this error: Create TaskDefinition: Container.image repository should not be null or empty.

I was a bit confounded as to why it was giving me this error. However, when I looked into the parameters in the new environment compared to the preexisting environment I noticed this difference:

Preexisting: ContainerImage: 601068425913.dkr.ecr.us-east-1.amazonaws.com/connector/worker@sha256:e9d51e6250e3454ab51d0468d47b5bfae55985552d1e46d2d149ae33f5e5259c

New: ContainerImage: @sha256:e206fc32e605ee6b46502a7ae3be0df4ed10b76eba96c98d8c64ae762c2c71b0

So this error message makes sense to me now, but it raises the question: why doesn't the new environment contain the correct ContainerImage value?
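ECS's validation here is straightforward: the portion of the image reference before the `@sha256:` digest is the repository, and it must be non-empty. A small shell sketch (a hypothetical helper, not part of Copilot) illustrating why the second value above is rejected:

```shell
# Hypothetical helper: check that an image reference has a repository
# component before its digest, mirroring the validation ECS performs.
check_image() {
  local image="$1"
  local repo="${image%%@*}"   # strip the '@sha256:...' digest suffix
  if [ -z "$repo" ]; then
    echo "invalid: repository is empty"
  else
    echo "ok: repository is $repo"
  fi
}

check_image "601068425913.dkr.ecr.us-east-1.amazonaws.com/connector/worker@sha256:e9d51e6250e3454ab51d0468d47b5bfae55985552d1e46d2d149ae33f5e5259c"
check_image "@sha256:e206fc32e605ee6b46502a7ae3be0df4ed10b76eba96c98d8c64ae762c2c71b0"
```

The first call reports a valid repository; the second reproduces the "repository should not be null or empty" condition.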

I've got a manifest for two different services that are being used in both the preexisting and new env.

Another thing I noticed is that the second service deployed to the new env doesn't have this issue; its ContainerImage value contains the full value, just like the preexisting env's value for the other service above.

I'm having another issue with deploying this service, related to the service being able to access its secrets in SSM. I'm not sure whether these issues are related.

I'm not sure what you'll need to help debug this issue.

This is the command I used to setup the new environment in the same account as the preexisting environment: AWS_PROFILE=stg AWS_REGION=us-east-1 copilot env init

I set it up without any issue.

huanjani commented 2 years ago

Hello, @johnAirRobe! A couple of clarifying questions while I try to replicate what you're seeing:

huanjani commented 2 years ago

I'm stumped by your partial ContainerImage value! Would you mind pasting your manifests here? Thanks!

johnAirRobe commented 2 years ago

@huanjani Thanks for the fast response! :)

RE: Manifests - We have two different manifests for two different services (web and worker) and within each of those manifests I have 3 different environments. However, two of those environments (staging and sandbox) are deploying to the same account while another (production) is deploying to a different account.

I cannot deploy the worker b/c of the issue with the truncated ContainerImage value and the web service starts to deploy but fails because of the secrets issue.

RE: Secrets access - This is the command I used when creating the secrets:

AWS_PROFILE=stg AWS_REGION=us-east-1 copilot secret init -a connector --cli-input-yaml sandbox.secrets.yml

and this is the error I get when deploying the web service:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secrets from ssm: service call has been retried 1 time(s): AccessDeniedException: User: arn:aws:sts::601068425913:assumed-role...

However, a former colleague, who set this infra up, used this command to create the secrets:

export ENVIRONMENT_NAME=test APP_NAME=connector && \
aws ssm put-parameter --name GH_WEBHOOK_SECRET --value secretvalue1234 --type SecureString \
  --tags Key=copilot-environment,Value=${ENVIRONMENT_NAME} Key=copilot-application,Value=${APP_NAME}

And, yeah, I'm referencing the secrets in the manifest, as you can see, like all the other environments. The weird thing is that when I originally provisioned the new environment and tried to deploy to it, I got that error about the secrets. I then changed the copilot-environment tag value for one secret from stg to sandbox, deployed with that one secret, and it deployed successfully.

I did make some changes to the manifest after that, but it's not clear which changes would have caused this issue. I also tried the same thing (changing the copilot-environment value) with the current manifest, and when I deployed I got the same error.

Not sure if this will help but this is the user that modifies the stg secrets:

And this is the user that modifies the sandbox secrets:

Here's a gist containing the web and worker manifests. They are both named manifest.yml; I had to give them unique names to create the gist. They live in the copilot/web and copilot/worker directories.

iamhopaul123 commented 2 years ago

Hello @johnAirRobe. Ideally all Copilot commands should be run with the same profile. Although some commands, like env init, ask which profile you'd like to use, the profile used to run the command itself has to be the same one that was used to create the application.

The same rule applies with secret init. I see you've run

AWS_PROFILE=stg AWS_REGION=us-east-1 copilot secret init -a connector --cli-input-yaml sandbox.secrets.yml

Did you create your application using the same profile in the same region (aka us-east-1)? Thank you!

johnAirRobe commented 2 years ago

Hey @iamhopaul123.

Specifically what do you mean by same profile? Do you mean they have to be the same user or role or policies that created the application?

There are potentially two original profiles that created the application, but they are both IAM users in our staging account. One of them is used to deploy from Copilot in our CI environment and has permissions more restrictive than Admin. The profile I'm using at the moment to create the new environments is an IAM user in the same staging account. We both have the same Admin group attached to our users.

Whenever I run a command in copilot I explicitly set the AWS_PROFILE and AWS_REGION to the appropriate values. The former changes but the latter always stays as us-east-1 as that's where we have deployed all our infra.

The command to set up secrets from the former colleague has the ENVIRONMENT_NAME set as test but in practice it's stg.

RE: creation of application. I didn't create the app myself using my current profile. A former colleague created the app and the preexisting environments. I'm not 100% sure what profile they used in our staging account, but I'm 100% sure it was in the us-east-1 region. I created a new environment with my profile, which is a user in our staging account, in the us-east-1 region.

Is there anything else I can do on my end to help with debugging?

Do you think it'd make any difference if I use the user credentials that deploys in our CI to create the new env?

iamhopaul123 commented 2 years ago

Specifically what do you mean by same profile? Do you mean they have to be the same user or role or policies that created the application?

Sorry for the confusion. I meant that for all the init commands for the same application, ideally you need to use the same profile (AWS_PROFILE) to make sure it has the permissions required to create everything. For the other commands you don't have to. However, you still need to use the same app account, because all the app resources live there.

There are potentially two original profiles that created the application but they are both IAM users in our staging account. One of which is being used to deploy from copilot in our CI environment which has permissions that are more restrictive than Admin.

I think this workflow is correct and makes sense to me!

Preexisting: ContainerImage: 601068425913.dkr.ecr.us-east-1.amazonaws.com/connector/worker@sha256:e9d51e6250e3454ab51d0468d47b5bfae55985552d1e46d2d149ae33f5e5259c

New: ContainerImage: @sha256:e206fc32e605ee6b46502a7ae3be0df4ed10b76eba96c98d8c64ae762c2c71b0

Going back to this image URL truncation issue. I wonder if the new service somehow was not added to the app when it was created, because if you check the outputs of your app stack instance (stack name in the format StackSet-demo-infrastructure-275ff34b-c581-4285-a4e7-fc4cff84530f), it is very likely that in your new environment's region it only has ECRRepoweb, but not ECRRepoworker. Could you try deleting the problematic worker service using svc delete and then creating it again?

Also, just to double-check: you said the new service can be deployed to the preexisting environment. Is your new environment in a different region from the preexisting env?

johnAirRobe commented 2 years ago

Sorry for the confusion. I meant that for all the init commands for the same application, ideally you need to use the same profile (AWS_PROFILE) to make sure it has the permissions required to create everything. For the other commands you don't have to. However, you still need to use the same app account, because all the app resources live there.

No worries! The new environment and services will be in the same account as the app and I'm using a profile that has admin permissions for this account.

Going back to this image URL truncation issue. I wonder if the new service somehow was not added to the app when it was created, because if you check the outputs of your app stack instance (stack name in the format StackSet-demo-infrastructure-275ff34b-c581-4285-a4e7-fc4cff84530f), it is very likely that in your new environment's region it only has ECRRepoweb, but not ECRRepoworker. Could you try deleting the problematic worker service using svc delete and then creating it again?

I'm looking in the connector-sandbox Outputs but I cannot see ECRRepoweb nor ECRRepoworker, is this unexpected?

I've deleted the worker service from the sandbox environment, but it's not clear to me how to create a new service in that environment. The svc init command doesn't have a flag to select which environment to add it to. If I run this command:

copilot init --app connector          \
      --name worker                       \
      --type 'Backend Service'            \
      --dockerfile './Dockerfile.worker'

Will this create a service for both the prod, stg, sandbox env? Is there any way to only create a service for sandbox?

When I ran the above command to create a new service, I got an error letting me know that the service already exists.

It's not clear to me whether I need to create a new service for each new environment or whether I just need to deploy the already-created service to that new environment. I'm wondering if that's where I'm going wrong, because I'm deploying a service to the new environment and I keep getting the Container.image repository error (I got that just now when I deleted the new worker service and deployed it again).

Could I get clarity on whether I need to create a new service for each environment and then deploy to that service? If so how do I choose which environment to create the service in?

Also just to double check, you said the new service can be deployed to the preexisting environment. Are your new environment in different region from the preexisting env?

So it's not a new service per se, but the same service being deployed into a new environment and, no, the new environment is in the same region as the preexisting stg environment.

johnAirRobe commented 2 years ago

So I deleted the worker service from both the stg and sandbox env and ran this command:

AWS_PROFILE=stg AWS_REGION=us-east-1 copilot init --app connector        \                                                                                                                  
      --name worker                       \
      --type 'Backend Service'            \
      --dockerfile './Dockerfile.worker'

I wanted to see if it would work to create the service for both envs, but I got these errors:

Failed to create ECR repositories for service worker.

✘ execute Backend Service init: add service worker to application connector: adding service worker resources to application connector: operation 11 for stack set connector-infrastructure failed

Is this expected?

Lou1415926 commented 2 years ago

Hello @johnAirRobe ! Thank you very much for the update, and apologies for the belated response 🙇🏼‍♀️ ! To answer your latest questions:

As of today, copilot svc init will do the following:

  1. Create a manifest
  2. Create an ECR repository in your app account
  3. Create SSM parameters for bookkeeping purposes

It does not create other AWS resources, such as the ECS service, until you run for example copilot svc deploy --env staging, which will deploy the service to the environment staging only.

Will this create a service for both the prod, stg, sandbox env?

Per my description above, it doesn't create the actual resources for any of these environments until you deploy the service to one of them.

Is there any way to only create a service for sandbox?

Yes! By running copilot svc deploy --env sandbox, and not running copilot svc deploy --env stg or copilot svc deploy --env prod.

It's not clear to me whether I need to create a new service for each new environment or if I just need to deploy the already created service to that new environment.

You've probably figured this out already, but just for clarity: it's the latter!

I will follow up on this thread later today to discuss your previous questions, and steps for us to troubleshoot.

Lou1415926 commented 2 years ago

To facilitate tackling the larger questions, please allow me to first discuss, in general, the Copilot credential model (https://aws.github.io/copilot-cli/docs/credentials/) - feel free to skip this part though!

Apologies in advance for the length of the response!


In Copilot, you can use different profiles for:

  1. Creation of the application. Copilot will use the “default” profile to create the application, i.e. the profile named [default] in your ~/.aws/config file, or the profile that you've set via AWS_PROFILE=some-profile-name.
  2. Creation of the environments. When running copilot env init, Copilot will prompt you for the profile you want to use.
    
    $ copilot env init
    Name: prod-iad

    Which credentials would you like to use to create prod-iad?
    > Enter temporary credentials
    > [profile default]
    > [profile test]
    > [profile prod-iad]
    > [profile prod-pdx]


`copilot env init` will create an `envManagerRole` that has some delegated access to the application resources as well as the permissions required to manage the environment. Going forward, environment-specific operations (for example, deploying a service to the environment) are done via the `envManagerRole`. Therefore, the profile used for the environment is no longer needed after this point. In other words, the profile used to create the environment is only needed during environment creation.
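For context, the profiles mentioned above live in your `~/.aws/config` file. A setup like the one in this thread might look roughly like the sketch below (the account IDs are taken from earlier in this thread; the exact credential mechanism per profile is an assumption):

```ini
[default]
region = us-east-1

# Profile for the staging account (601068425913) - used to create the
# app and the stg/sandbox environments.
[profile stg]
region = us-east-1

# Profile for the prod account (613445503623) - used only while creating
# the prod environment.
[profile prod]
region = us-east-1
```

With this layout, `AWS_PROFILE=stg copilot <command>` selects the staging-account credentials for a given invocation.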

When executing any commands, Copilot expects that the default profile has the necessary permissions to access and modify resources in your application. It doesn't have to be the exact profile that you used when creating your application, but it needs necessary permissions.


In your case, since commands have been run as AWS_PROFILE=some_profile AWS_REGION=some_region copilot <command>, Copilot expects "some_profile" to have those necessary permissions. This typically means it expects "some_profile" to refer to the account you used to create your application.


Therefore, it seems to me that the issues you are seeing are the result of running commands with different profiles, which introduces some unexpected complications, for example:

  1. Perhaps copilot svc init or copilot svc deploy is run from a profile that doesn’t have necessary permissions to create or access the ECR repository in your app account.
  2. If you’ve run copilot app init with different profiles that refer to different accounts, perhaps you’ve unintentionally created an application in your account A, and an application in your account B.
  3. Other scenarios that I haven't thought of... 🤔

I think at this point, for us to troubleshoot your set up, it’d be beneficial for us to get to know your infrastructure status first.

Could you run

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot app ls
$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot app show
$ AWS_PROFILE=prod AWS_REGION=<the region> copilot app ls
$ AWS_PROFILE=prod AWS_REGION=<the region> copilot app show

and let us know the results of these commands? Please remember to hide information that may be sensitive!

Thank you very much for the information you've provided so far, and I am sorry for the churn :(

johnAirRobe commented 2 years ago

Thanks for the follow-up. No worries about the churn! :)

Here are the outputs for those commands:

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot app ls
connector
marketplace

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot app show
About

  Name     connector
  Version  v1.0.0 (latest available: v1.0.2)
  URI      airdemo.link

Environments

  Name     AccountID     Region
  ----     ---------     ------
  stg      601068425913  us-east-1
  sandbox  601068425913  us-east-1

Workloads

  Name    Type                       Environments
  ----    ----                       ------------
  web     Load Balanced Web Service  stg

Pipelines

  Name
  ----
  pipeline-connector-connector

$ AWS_PROFILE=prod AWS_REGION=us-east-1 copilot app ls
connector
marketplace

$ AWS_PROFILE=prod AWS_REGION=us-east-1 copilot app show
About

  Name     connector
  Version  v1.0.1 (latest available: v1.0.2)
  URI      airrobe.link

Environments

  Name    AccountID     Region
  ----    ---------     ------
  prod    613445503623  us-east-1

Workloads

  Name    Type                       Environments
  ----    ----                       ------------
  web     Load Balanced Web Service  prod
  worker  Backend Service            prod

Pipelines

  Name
  ----
Lou1415926 commented 2 years ago

Thank you for providing this info - super helpful!

Just to make sure, the prod account and stg account are two different AWS accounts right? This is what I understood from reading the conversation above, please correct me if I'm wrong!

Given that my understanding above ⬆️ is correct -


The expected set up would be that only one of the accounts has connector app. For example, suppose you want your application to be set up in the stg AWS account, then the expected output would be:

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot app ls
connector

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot app show
About

  Name     connector
  Version  v1.0.0 (latest available: v1.0.2)
  URI      airdemo.link

Environments

  Name     AccountID     Region
  ----     ---------     ------
  stg      601068425913  us-east-1
  sandbox  601068425913  us-east-1
  prod     613445503623  us-east-1    # <--- This environment is created in the prod account.

Workloads

  Name    Type                       Environments
  ----    ----                       ------------
  web     Load Balanced Web Service  stg, prod # <--- In addition to `stg`, `web` is also deployed in `prod` environment in the other account.
  worker  Backend Service            prod      # <--- `worker` is also deployed in `prod` environment in the other account.

Pipelines

  Name
  ----
  pipeline-connector-connector

$ AWS_PROFILE=prod AWS_REGION=us-east-1 copilot app ls
// Nothing.

$ AWS_PROFILE=prod AWS_REGION=us-east-1 copilot app show
// No app.

It seems to me that you have unintentionally set up two separate applications - one connector in the stg account, and another connector in the prod account. They may be named the same, but they are in fact two separate sets of infrastructure :( i.e.:

  1. If you’ve run copilot app init with different profiles that refer to different accounts, perhaps you’ve unintentionally created an application in your account A, and an application in your account B.

This is likely the root cause of some of the issues you've been seeing so far.


The expected set up that I've described above would have been achieved by:

$ export AWS_PROFILE=stg # `stg` is the default profile in this session.
$ export AWS_REGION=us-east-1

$ copilot app init # Create the app in `stg` account.

$ copilot env init
Name: prod

Which credentials would you like to use to create prod? 
> Enter temporary credentials 
> [profile default] 
> [profile stg] 
> [profile prod] # Selected this; create the env `prod` in `prod` account.
> [profile sandbox]

$ copilot svc init --name web

$ copilot svc deploy --name web --env stg

$ copilot svc deploy --name web --env prod # `web` will be deployed in the `prod` account. 

$ copilot svc init --name worker

$ copilot svc deploy --name worker --env prod # `worker` will be deployed in the `prod` account. 

We can figure out a path forward together - but before that, I am happy to explain more on the credential model or the analysis above if you have questions!

johnAirRobe commented 2 years ago

Okay, interesting. This is a preexisting setup, so, unfortunately, I'm not aware of how it was set up.

In terms of having Connector set up in one of the accounts: is it not possible to have both Marketplace and Connector set up in the same account? We use these two accounts to differentiate between our stg and prod infrastructure.

And if we cannot do that, we'd basically have to repurpose the accounts as our Connector account and our Marketplace account, within which we'd differentiate between the infrastructure environments stg, sandbox, and prod.

Lou1415926 commented 2 years ago

You can have two applications in the same account, and differentiate stg and prod infrastructure via environments.

For example, you can set up both Marketplace and Connector in, say, the stg account, and create prod environments for both applications in the prod account. The commands would look like:

$ export AWS_PROFILE=stg
$ export AWS_REGION=us-east-1

$ copilot app init --name connector
$ copilot env init --app connector --name prod --profile prod

$ copilot app init --name marketplace
$ copilot env init --app marketplace --name prod --profile prod
johnAirRobe commented 2 years ago

Okay, cool. Thanks for the info, it's been helpful, but I'm still having this issue:

"Invalid request provided: Create TaskDefinition: Container.image repository should not be null or empty. (Service: AmazonECS; Status Code: 400; Error Code: ClientException; Request ID: 7c32cfe2-f9af-413e-bcd2-10d5dedc9e81; Proxy: null)" (RequestToken: e094e01f-c955-f100-4a41-8e5346322c64, HandlerErrorCode: InvalidRequest)

So I created a new worker service, because I had deleted the old one thinking it would help solve this issue:

AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc init -n worker

Then I tried to deploy it to the staging env:

AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc deploy -e stg -n worker

Lou1415926 commented 2 years ago

@johnAirRobe Seems most likely that the ECR repository used to store worker images wasn't properly created when svc init -n worker was executed 🤔 .

Would you please help us confirm by checking the CloudFormation template? In the CloudFormation console, under the "Template" tab, do you see the following:

Metadata:
  TemplateVersion: 'v1.0.2'
  Version: 4
  Services:
  - worker # worker should be one of the services here
  - web 
  Accounts:
  - stg account number, feel free to leave this information out but please confirm what accounts are here
johnAirRobe commented 2 years ago

@Lou1415926 so I checked the following stacks:

And I couldn't find any of that metadata in those templates.

Could it be an issue with the difference in copilot versions when these applications, environments, and services were created?

Lou1415926 commented 2 years ago

Ohh, sorry, I forgot to mention the specific CFN stack you want to check! Apologies! It should be a stack in your stg account's us-east-1, named "StackSet-connector-infrastructure-<random string>". This stack should have been created when you ran copilot app init --name connector using the stg profile back in the day. The Metadata should be there even for old template versions such as v0.0!

johnAirRobe commented 2 years ago

Yup, this is what it contains:

Metadata:
  TemplateVersion: 'v1.0.2'
  Version: 8
  Services:
  - web
  Accounts:
  - this contains only the stg account
Lou1415926 commented 2 years ago

Thank you! I see that Metadata.Services doesn't have - worker. This confirms that:

Seems most likely that the ECR repository used to store worker images wasn't properly created when svc init -n worker is executed 🤔 .

One possibility is that your stg profile doesn't have enough permissions to describe or modify the stack "StackSet-connector-infrastructure-<random string>". Would you please check for me that it has the following permissions:

"cloudformation:DescribeStackSet",
"cloudformation:DescribeStackSetOperation",
"cloudformation:UpdateStackSet"
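For reference, a minimal IAM policy statement granting those actions might look like the sketch below (the `Resource` wildcard is illustrative; in practice you'd scope it to the stack set's ARN):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudformation:DescribeStackSet",
        "cloudformation:DescribeStackSetOperation",
        "cloudformation:UpdateStackSet"
      ],
      "Resource": "*"
    }
  ]
}
```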

Meanwhile, just to make sure that we are on the right track, would you mind running the following command:

aws ssm get-parameter --name /copilot/applications/connector/components/worker --region us-east-2

and see if you get a successful response (in which case you don't have to post it here) or an error that looks like "An error occurred (ParameterNotFound) when calling the GetParameter operation:"

johnAirRobe commented 2 years ago

@Lou1415926 my stg profile has AdministratorAccess, which is why this issue (and the one about not being able to access the secrets) is so baffling to me!

I ran that command and changed the region to us-east-1, where the infra is located, and it returned a successful response.

Lou1415926 commented 2 years ago

That's so weird 🤔 If the ssm parameter is there, seems like worker is properly added to the connector application in stg account when you ran copilot svc init --name worker. What is the state of your "StackSet-connector-infrastructure-<random string>"? Is it in a green state such as UPDATE_COMPLETE, or others such as UPDATE_FAILED_ROLLBACK_COMPLETE?

johnAirRobe commented 2 years ago

Yes, it's very weird! 😅

It's in UPDATE_FAILED_ROLLBACK_COMPLETE.

Would the Status reasons in the CloudFormation console be helpful to show you?

Lou1415926 commented 2 years ago

Yes that'd be helpful to know!

johnAirRobe commented 2 years ago

image

image

Lou1415926 commented 2 years ago

Ohh Can you manually delete the image in the connector/worker repository, and then run:

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc delete --name worker 
# Wait until the command succeeds without an error.

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc init --name worker 
# Wait until the command succeeds without an error.

and then try

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc deploy --name worker 

Fingers crossed this solves it!

johnAirRobe commented 2 years ago

Do you mean delete the entire repository? Or delete a particular image in the repository? If the latter which image?

Lou1415926 commented 2 years ago

The screenshot you showed said that the repository wasn't successfully deleted "because it still contains images". I bet this happened when you ran copilot svc delete --name worker: the operation wasn't completed successfully for that reason. The subsequent failure to create a repository seems to have cascaded from this deletion failure.

Therefore, to fix it, let's empty the repository by deleting all of its images. This way copilot svc delete --name worker should execute correctly.
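If it helps, the repository can also be emptied from the CLI instead of the console. A sketch, assuming the repository is named connector/worker as elsewhere in this thread and that it actually contains images (batch-delete-image errors on an empty image list):

```shell
# Sketch: delete every image from the connector/worker ECR repository so
# that `copilot svc delete` can subsequently remove the repository itself.
export AWS_PROFILE=stg AWS_REGION=us-east-1

# list-images returns the imageIds (digests/tags); feed them straight
# into batch-delete-image as JSON.
aws ecr batch-delete-image \
  --repository-name connector/worker \
  --image-ids "$(aws ecr list-images --repository-name connector/worker \
                 --query 'imageIds[*]' --output json)"
```

This is equivalent to deleting the images one by one in the ECR console.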

I asked you to run both svc delete and svc init yesterday, but since you haven't run them yet, I think, just to be safe, let's run only svc delete and see if we are in a good state to proceed.

Let me know if the copilot svc delete --name worker succeeds! It'd also be helpful to check the CloudFormation console and see if:

  1. The stack "connector-stg-worker" is deleted;
  2. The stack "StackSet-connector-infrastructure-<random string>" is in UPDATE_COMPLETE, and that ECRRepoworker resource is DELETE_COMPLETE
  3. Running aws ssm get-parameter --name /copilot/applications/connector/components/worker --region us-east-1 returns a ParameterNotFound error.

If all of these check out, we should be good to continue.

johnAirRobe commented 2 years ago

I deleted the remaining images in that repository and then ran the svc delete command and it looks like it worked.

  1. "connector-stg-worker" was deleted.
  2. The state for "StackSet-connector-infrastructure-<random string>" hasn't changed, it's still in the UPDATE_ROLLBACK_COMPLETE state.
  3. I got the ParameterNotFound error.

I'm guessing point 2 is unexpected. Is that stack created when the application is initialised? Would we be able to deploy the service if that stack isn't changing the state as expected?

Lou1415926 commented 2 years ago

UPDATE_ROLLBACK_COMPLETE would mean that CFN tried to update that stack, but something went wrong, which triggered a rollback.

I think in this case it's important to know what went wrong 🤔 Would you mind sharing a screenshot like before?

Edit: since the stack set "StackSet-connector-infrastructure-<random string>" was in a rollback state before, it's possible that UPDATE_ROLLBACK_COMPLETE here doesn't indicate any unexpected issue. But just to be safe!

johnAirRobe commented 2 years ago

So I actually don't think this stack was touched when I deleted "connector-stg-worker", as the latter's logs don't have any timestamp for when I did the delete 3 hours ago. If it had been touched, there should be a log entry at ~21:30.

But here are the rest of the logs after the 20th: image

Lou1415926 commented 2 years ago

Yup, my best guess is that the stack instance didn't get touched because there was no actual update to the resources. Normally copilot svc delete --name worker would update this stack instance, removing - worker from Metadata.Services and ECRRepoworker from Resources, but since they are not there, the stack instance effectively has no changes, hence no update.

I think you can try running this now:

$ AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc init --name worker 
# Wait until the command succeeds without an error.

After the execution completes, we want to check that:

  1. The stack "connector-stg-worker" is created;
  2. The stack "StackSet-connector-infrastructure-<random string>" is in UPDATE_COMPLETE, and that ECRRepoworker resource is CREATE_COMPLETE
  3. Run aws ssm get-parameter --name /copilot/applications/connector/components/worker --region us-east-1. The response should be a successful one.

Fingers crossed!!!

johnAirRobe commented 2 years ago

I ran the svc init command but got an error stating the repository already existed. But the state on StackSet... did change when I ran this command.

I deleted the repo and ran that command again, and it completed successfully, but no repo was created, nor was the connector-stg-worker stack created or the StackSet... state changed; the ssm command did return a success, though.

I think I'll try to create a new application with these environments to see if that works.

Lou1415926 commented 2 years ago

Sorry the stack connector-stg-worker shouldn't be created until you run copilot svc deploy. That was a mistaken statement that I made! Apologies 🙇🏼‍♀️ !

I deleted the repo and ran that command again and it completed successfully but there was repo created...

Did you mean that the repo wasn't created after the second execution of copilot svc init? I think this is probably because CFN didn't detect a change between the last template and the new template, and hence didn't trigger an UPDATE.

I think you should be a good shape after you deleted the repo. After the repo is deleted, you should be able to run copilot svc delete --name worker, and then copilot svc init --name worker.

Running svc delete --name worker removes worker-related resources from the template. Running svc init. on the other hand, add them back. This should properly triggers CFN updates.

You can give that a try before creating a new application.

johnAirRobe commented 2 years ago

I'm getting that permissions issue when I run svc delete:

[screenshot of the permissions error]

Lou1415926 commented 2 years ago

Was svc delete executed with the flag --name worker?

johnAirRobe commented 2 years ago

Yup:

AWS_PROFILE=stg AWS_REGION=us-east-1 copilot svc delete --name worker
Sure? Yes
✔ Deleted service worker from environment stg.
✔ Deleted service worker from environment sandbox.
✘ Failed to delete resources of service worker from application connector.
✘ removing worker service resources from application: operation 14 for stack set connector-infrastructure failed
Lou1415926 commented 2 years ago

Hmm not sure why CFN thinks it needs to untag ECRRepoweb 🤔 I have a gut feeling that this might be related to some old state of the application back when it was set up, that the prod account was somehow associated with this app in stg account. Detaching the other account from this app in stg triggered an UPDATE to ECRRepoweb; upon UPDATE, CFN tries to remove ECR repo tags that are not passed down by CFN, which triggers the ecr:UntagResources permission error that you saw.

This of course is just a guess, and would need to be confirmed. However, I'm aware that this has been a lot of trouble for you :(. So if you'd like, please go ahead and recreate the application so that it starts from a clean state. If you'd rather proceed with the current application, I'm happy to continue; I'd just need to confirm something with you.

I'm so sorry for the back and forth and all the churning 🙇🏼‍♀️ !

johnAirRobe commented 2 years ago

@Lou1415926 so I deleted the old application, envs, and services then created a new one. It all works now. But I'm having an issue with running a task.

I ran copilot task run --generate-cmd connector/sandbox/worker to generate a new command and replaced the --image flag with --dockerfile. When I run the generated command, it seems to run the image, but then it stops and I get an An error occurred (InvalidParameterException) when calling the DescribeTasks operation: Tasks cannot be empty. error. I get the impression this is a generic error, as I've gotten it with other errors that happened inside the container.
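
Roughly, the flow was (the Dockerfile path shown is a placeholder; --generate-cmd prints a full copilot task run command that I then edited, other generated flags elided here):

```shell
# Print a ready-made `copilot task run` command based on the deployed service.
copilot task run --generate-cmd connector/sandbox/worker

# Then run the printed command with --image swapped for --dockerfile,
# so the task builds from the local Dockerfile instead of a pushed image.
copilot task run --app connector --env sandbox --dockerfile ./Dockerfile --follow
```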

When I say it's running the image it's because I get this in the logs:

copilot-task/connector-sa D, [2022-05-08T21:39:10.253638 #1] DEBUG -- :    (0.3ms)  SELECT pg_try_advisory_lock(6541742368595981820)
copilot-task/connector-sa D, [2022-05-08T21:39:10.263831 #1] DEBUG -- :    (0.6ms)  SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC
copilot-task/connector-sa D, [2022-05-08T21:39:10.269450 #1] DEBUG -- :   ActiveRecord::InternalMetadata Load (0.4ms)  SELECT "ar_internal_metadata".* FROM "ar_internal_metadata" WHERE "ar_internal_metadata"."key" = $1 LIMIT $2  [["key", "environment"], ["LIMIT", 1]]
copilot-task/connector-sa D, [2022-05-08T21:39:10.274099 #1] DEBUG -- :    (0.3ms)  SELECT pg_advisory_unlock(6541742368595981820)
Task has stopped.

This is the Dockerfile that I'm using:

FROM --platform=linux/amd64 ruby:3.0.3-slim-buster
LABEL maintainer="AirRobe Connector <developers@airrobe.com>"

# These environment variables ('ENV') are exported to the container when it runs
ENV RAILS_ENV=production
#same version as in Gemfile.lock to avoid conflicts
ENV BUNDLER_VERSION=2.2.32
ENV BUNDLE_BIN=
ENV NPM_CONFIG_LOGLEVEL info
ENV NODE_VERSION 16.14.0

# These ('ARG') are variables that are only available while building this image
ARG DEBIAN_FRONTEND="noninteractive"
ARG TZ="Australia/Melbourne"

# essential build tools
ARG BUILD_PACKAGES="build-essential git dirmngr gpg-agent gpg"
#libraries required by bundle to build native extensions
ARG DEV_PACKAGES="libpq-dev postgresql-client openssl shared-mime-info libyaml-dev zlib1g zlib1g-dev file"
ARG RUBY_PACKAGES="tzdata"
#tools the container wants to run
ARG TOOLS=""
#nice to have
ARG EXTRA_TOOLS="bash curl vim net-tools"

# Install base package dependencies
RUN apt-get update \
  && apt-get -y dist-upgrade \
  && apt-get --no-install-recommends -y install \
  $BUILD_PACKAGES $DEV_PACKAGES $RUBY_PACKAGES $TOOLS $EXTRA_TOOLS \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Prevent GPG from trying to bind on IPv6 address even if there are none
RUN mkdir ~/.gnupg \
  && chmod 700 ~/.gnupg \
  && echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf

# Install bundler
RUN gem install bundler -v $BUNDLER_VERSION

WORKDIR /app

###############################################################
# Install gems using bundler
#  - Copy in just the Gemfile to install gems.
#  - This ensures that we don't invalidate these expensive docker layers
#    with other file changes coming in later copy cmd.
################################################################
COPY Gemfile Gemfile.lock ./
RUN bundle config set --local without 'test,development' \
  && bundle config set --local deployment 'true' \
  && bundle check || bundle install

# Now copy the whole app into app
COPY . .

EXPOSE 3001

# use bash as default entry point
# use CMD in the manifest to run specific task. eg: bin/web or bin/worker
CMD ["bin/worker"]
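
(For reference, the image can be built and exercised locally with something like this; the tag is arbitrary:)

```shell
# Build for the same platform the Dockerfile targets.
docker build --platform linux/amd64 -t connector-worker .

# Run the worker entrypoint locally; RAILS_ENV=production is baked into the image.
docker run --rm connector-worker bin/worker
```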

Do you want to see the task command that I'm trying to run?

johnAirRobe commented 2 years ago

I guess I'll close this issue now as the current problem I'm facing isn't related to the original post.

Lou1415926 commented 2 years ago

Hello @johnAirRobe !

Feel free to open a new issue for the problem you are currently facing! We can move the discussion there.

I ran copilot task run --generate-cmd connector/sandbox/worker to generate a new command and replaced the --image flag with --dockerfile. When I run the generated command, it seems to run the image, but then it stops and I get an An error occurred (InvalidParameterException) when calling the DescribeTasks operation: Tasks cannot be empty. error. I get the impression this is a generic error, as I've gotten it with other errors that happened inside the container.

I have a few questions to help me locate where the problem may be:

  1. Was the error "An error occurred (InvalidParameterException) when calling the DescribeTasks operation: Tasks cannot be empty." prepended with anything? For example "describe tasks: An error occurred...".
  2. Was the error spit out after the line ...Task has stopped. in the log output you showed?
  3. Did the log look good to you? That is, is it expected for the task to have been stopped after " SELECT pg_advisory_unlock(6541742368595981820)"?

Again, feel free to move over to a new issue!

By the way, was the original problem resolved by re-creating the application?

johnAirRobe commented 2 years ago

@Lou1415926 Okay, I'll move over to a new issue.

BTW, thanks for exploring this issue with me! Even though it didn't ultimately resolve the problem, I did get a better understanding of Copilot and other AWS stuff along the way.