Unable to access my Copilot instance

aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.

https://aws.github.io/copilot-cli/

Apache License 2.0


Unable to access my Copilot instance #4377

Closed johanleroux closed 1 year ago

johanleroux commented 1 year ago

copilot version: v1.24.0

We are in the process of setting up Copilot to hopefully replace our traditional EC2 instance with Docker containers on it. Unfortunately, we are running into a lot of issues, one raised here.

We set up our dev/staging environments, aptly called Alpha, Beta, and Delta, and everything was running well, except for the minor issue listed above. We were ready to start the setup for our Production environment when we ran into issue after issue.

NOTE: All of our staging environments are located in the eu-west-2 region with the production environment being alone in us-east-1

After running copilot env init and setting production up as per the staging environments, with the only exception being the region difference, we tried to deploy the env being greeted with a 403 error for our token.
After checking permissions and the token, everything was configured to work, we then refreshed the session token through the AWS CLI.
Running copilot env deploy on the production environment now gave a 400 error with an InvalidToken
We then generated a new token in the AWS console and configured it through the AWS CLI
Now copilot is unable to retrieve the application or make any changes
- ✘ get application example: couldn't find an application named example in account xxxxxxxxx and region eu-west-2

Changing the region to us-east-1 makes no difference.

Any ideas as to what is causing these issues and a possible solution/workaround?

At this point in time we don't feel comfortable moving our infrastructure and deployment workflow onto Copilot.

iamhopaul123 commented 1 year ago

Hello @johanleroux. I am so sorry for the inconvenience you had when using Copilot. It looks like you were having some trouble with your AWS profile.

From the issue description above, it seems like your staging envs are located in the eu-west-2 region and the prod env is in us-east-1. Ideally, the default profile (the account and region it is configured with) should stay the same whenever you run Copilot commands. For example, always keep the default profile set to account 1234567890 and region eu-west-2, and keep separate profiles for your staging envs and prod env, so that when running env init you'll be able to select which profile to use to create the env.

copilot env init
Environment name: prod

  Which credentials would you like to use to create prod?  [Use arrows to move, type to filter, ? for more help]
  > Enter temporary credentials
    [profile default]
    [profile west1]
    [profile open-pulse]
    [profile test]
    [profile rdws-vpc]
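A minimal sketch of the profile layout described above, using the AWS CLI's `aws configure set` and `aws configure get`. The profile names `staging` and `prod` and both function names are assumptions made for illustration; only the regions come from this thread. Credentials for each profile would still need to be configured separately, for example with `aws configure --profile prod`.

```shell
# Sketch only: "staging" and "prod" are hypothetical profile names.
configure_copilot_profiles() {
  # Default profile: stays pinned to the same region for every Copilot command.
  aws configure set region eu-west-2 --profile default
  # One named profile per target region, selectable during `copilot env init`.
  aws configure set region eu-west-2 --profile staging
  aws configure set region us-east-1 --profile prod
}

# Print the region currently configured for the given profile.
profile_region() {
  aws configure get region --profile "$1"
}
```

After calling `configure_copilot_profiles`, something like `copilot env init --name prod --profile prod` should pick the `prod` profile without going through the credentials prompt.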

> we tried to deploy the env being greeted with a 403 error for our token.

Would you mind sharing more details on the error message you got so that we can debug it? Or was it resolved by "After checking permissions and the token, everything was configured to work"?

> we then refreshed the session token through the AWS CLI.

This will most likely cause the profile Copilot was using to expire, so that it is no longer able to make any API calls.

> ✘ get application example: couldn't find an application named example in account xxxxxxxxx and region eu-west-2

That's because you refreshed the session token, so the default profile Copilot uses was no longer valid.

To sum it up, it seems like you had some permission issues at first, and after you configured the proper permissions, the token got refreshed, which made the profile used by Copilot invalid. I would recommend configuring your default profile with an IAM user, or updating it promptly whenever the token gets revoked or expires. Thank you for your interest in Copilot, and please let us know if there's anything we can help with during the migration.

johanleroux commented 1 year ago

@iamhopaul123

I haven't been able to reproduce the token errors I got earlier but will reply with a full stack trace if it happens again.

I did notice my AWS-CLI was still on v1, updated that to v2, and reconfigured my default profile with a new access key, as per your suggestion.

Running copilot app ls returned no values. copilot app show however did return the same error ✘ get application example: couldn't find an application named example in account xxxxxxxxx and region eu-west-2

For sanity's sake, I changed the default region to us-east-1, and it gave me the same error ✘ get application example: couldn't find an application named example in account xxxxxxxxx and region us-east-1

I tried running copilot app init (thinking maybe it was deleted by accident), but it returned ✘ application named "example" already exists in another region

So the application exists, but not in eu-west-2 or us-east-1. Using the Resource Management Tag Editor, I found that there was an example-infrastructure-roles CloudFormation stack in af-south-1. A region that is not and never was configured in my AWS CLI config.

Changing the region to af-south-1 in my default config, and running copilot app ls shows the app is listed, but running copilot app show gives me the following error: services/jobs deployed to live: get resources by Copilot tags: get resource: UnrecognizedClientException: The security token included in the request is invalid status code: 400, request id:

Is there any way we can move the application to the correct region? If not we might need to manually go and delete all references to the Copilot resources and recreate it from scratch.

iamhopaul123 commented 1 year ago

Gotcha. Thank you for providing these details. It seems like you've unintentionally run copilot init or copilot app init in the af-south-1 region before, and there's no way to "move" an application to another region. Could you switch to af-south-1 and try running copilot app delete in your example app folder to delete the app? The last resort would be to use the Tag Editor to locate resources tagged with copilot-application: YOUR-APP-NAME and remove them.
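For reference, the tag lookup can also be scripted rather than done in the console. The sketch below loops over the regions enabled for the account and lists resources carrying the copilot-application tag, using the AWS CLI's `ec2 describe-regions` and `resourcegroupstaggingapi get-resources` commands. The helper name `find_copilot_resources` is hypothetical, and the sketch assumes the active credentials are allowed to call both APIs.

```shell
# Sketch: list resources tagged copilot-application=<app> in every enabled region.
# find_copilot_resources is a hypothetical helper, not a Copilot or AWS CLI command.
find_copilot_resources() {
  app="$1"
  # By default describe-regions returns only the regions enabled for the account.
  for region in $(aws ec2 describe-regions --query 'Regions[].RegionName' --output text); do
    arns=$(aws resourcegroupstaggingapi get-resources \
      --region "$region" \
      --tag-filters "Key=copilot-application,Values=$app" \
      --query 'ResourceTagMappingList[].ResourceARN' \
      --output text 2>/dev/null)
    # Print one line per region that holds at least one tagged resource.
    if [ -n "$arns" ]; then
      printf '%s\t%s\n' "$region" "$arns"
    fi
  done
}
```

Calling `find_copilot_resources example` prints each region followed by the ARNs found there, which may help spot an application that was initialized in an unexpected region.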

> Changing the region to af-south-1 in my default config, and running copilot app ls shows the app is listed, but running copilot app show gives me the following error: services/jobs deployed to live: get resources by Copilot tags: get resource: UnrecognizedClientException: The security token included in the request is invalid status code: 400, request id:

This is surprising to me. I'm wondering if you're using the CLI with MFA, in which case you'll have to set the session token in addition to the access and secret keys. If you are not using MFA, removing aws_session_token might do the trick.
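One way to check for a leftover session token is sketched below. It looks at the `AWS_SESSION_TOKEN` environment variable, asks `aws configure get` whether the profile has `aws_session_token` set, and then calls `aws sts get-caller-identity` to see what the current credentials resolve to. The function name `report_credentials` is hypothetical, and this is only a diagnostic sketch that does not change any configuration.

```shell
# Sketch: report where a session token might be coming from.
# report_credentials is a hypothetical helper name.
report_credentials() {
  profile="${1:-default}"
  # A session token may be exported in the shell environment.
  if [ -n "${AWS_SESSION_TOKEN:-}" ]; then
    echo "AWS_SESSION_TOKEN is set in the environment"
  fi
  # `aws configure get` exits non-zero when the key is not set for the profile.
  if aws configure get aws_session_token --profile "$profile" >/dev/null 2>&1; then
    echo "profile $profile has aws_session_token set"
  fi
  # Prints the account and identity for the current credentials, or fails if they are invalid.
  aws sts get-caller-identity
}
```

If `report_credentials default` reports a token and MFA is not in use, removing the `aws_session_token` entry as suggested above would be the next thing to try.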

iamhopaul123 commented 1 year ago

Hello @johanleroux. Hopefully you've resolved the issue. If you have additional concerns and would prefer to set up an online meeting to debug the issue, please feel free to reach out to the team at aws-copilot-feedback@amazon.com.

johanleroux commented 1 year ago

@iamhopaul123 apologies for the slow reply. Got caught up with other tasks that required my attention.

I spent a bit of time checking if there was a session token set, etc., but didn't get anything to work with the live environment.

So we went through the Tag Editor and manually deleted all copilot-related resources, actually realized we had a couple of other test-related resources still running, so cleaned that up.

I am in the process of redoing the entire copilot integration, and actually using the newly launched Environment Resources, which is amazing. A question I had with it, is it possible to share an environment resource across multiple environments. As a cost-saving measure, we would like our staging environments to share certain resources, like ElastiCache and RDS.

I was under the assumption that setting the same ClusterName for the ElastiCache instance would work, but it fails on the second env deployment with the error: example-staging-elasticache already exists in stack

Example structure

  ElastiCacheRedisCacheCluster:
    Type: 'AWS::ElastiCache::CacheCluster'
    Properties: 
      AutoMinorVersionUpgrade: True
      AZMode: 'single-az'
      CacheNodeType: cache.t3.micro
      ClusterName: !If [IsLiveEnv, !Sub '${App}-live-ElastiCache', !Sub '${App}-staging-ElastiCache']

efekarakus commented 1 year ago

> A question I had with it, is it possible to share an environment resource across multiple environments. As a cost-saving measure, we would like our staging environments to share certain resources, like ElastiCache and RDS.

Hi @johanleroux! By default, environments are designed to be completely independent network boundaries such that the blast radius when a change goes poorly is contained within a single environment.

That said, optimizing on cost for "test" environments totally makes sense. Today, you can share the same VPC across environments by using the "imported VPC settings" in the environment manifest:

name: imported
type: Environment
network:
  vpc:
    id: 'vpc-12345'
    subnets:
      public:
        - id: 'subnet-11111'
        - id: 'subnet-22222'
      private:
        - id: 'subnet-33333'
        - id: 'subnet-44444'

Afterwards, while defining environment addons for the ElastiCacheRedisCacheCluster, you can allow ingress from multiple environment security groups. For example, say you have two environments, test and qa, that want to share the same Redis cluster. Then you can define the following security group ingress resources:

TestEnvIngressToRedis:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from Fargate containers
      GroupId: !Ref 'RedisSecurityGroup'
      IpProtocol: tcp
      FromPort: 6379
      ToPort: 6379
      SourceSecurityGroupId: { 'Fn::ImportValue': !Sub '${App}-test-EnvironmentSecurityGroup' }

QaEnvIngressToRedis:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from Fargate containers
      GroupId: !Ref 'RedisSecurityGroup'
      IpProtocol: tcp
      FromPort: 6379
      ToPort: 6379
      SourceSecurityGroupId: { 'Fn::ImportValue': !Sub '${App}-qa-EnvironmentSecurityGroup' }

To limit the Redis cluster creation to only a single environment, you can create a Condition in CloudFormation:

Conditions:
  IsTestEnv: !Equals [!Ref Env, "test"]

Resources:
  ElastiCacheRedisCacheCluster:
    Type: 'AWS::ElastiCache::CacheCluster'
    Condition: IsTestEnv

Hope this helps!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no response activity, and is tagged with pending/question. Remove the stale label, add a comment, or this will be closed in 14 days.

github-actions[bot] commented 1 year ago

This issue is closed due to inactivity. Feel free to reopen the issue if you have any follow-ups!