aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

Certificate issuance will fail if restrictive CAA policy exists in lower domains. #3347

Open matthewhembree opened 2 years ago

matthewhembree commented 2 years ago

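If restrictive DNS CAA resource records exist in lower domains (in the output below, on the apex domain example.com) and the environment subdomains have no CAA records of their own, issuance of the environment certificate will fail.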

Summary error text:

The following resource(s) failed to create: [HTTPSCert]. Rollback requested by user.

Full output:

❯ dig example.com CAA +short
0 issuewild ";"
0 issue ";"

❯ dig demo.example.com CAA +short

❯ dig environment.demo.example.com CAA +short

❯ copilot env init
Environment name: certfail
Credential source: [profile default]
Default environment configuration? Yes, use default.
✔ Linked account 920557916861 and region us-west-2 to application demo..

✔ Proposing infrastructure changes for the demo-certfail environment.
- Creating the infrastructure for the demo-certfail environment.                     [rollback failed]  [636.5s]
  The following resource(s) failed to create: [HTTPSCert]. Rollback requested by user.
  The following resource(s) failed to delete: [EnvironmentHostedZone, HTTPSCert].
  - An IAM Role for AWS CloudFormation to manage resources                           [delete skipped]    [19.9s]
  - An ECS cluster to group your services                                            [delete complete]  [3.9s]
  - Delegate DNS for environment subdomain                                           [delete complete]  [40.1s]
  - An IAM Role to describe resources in your environment                            [delete skipped]    [19.4s]
  - A security group to allow your containers to talk to each other                  [delete complete]  [3.9s]
  - Request and validate an ACM certificate for your domain                          [delete failed]    [310.2s]
    Received response status [FAILED] from custom resource. Message returned: Resource is not in the state certificateValidated (Log: /aws/lambda/demo-certfail-CertificateValidationFunction-RsPimSoW88HO/2022/03/11/[$LATEST]1e460e9ac23549b0a17fd31b16d28319) (RequestId: 0a415a0e-e2ef-4152-917c-5bdc5e0c034f)
    Received response status [FAILED] from custom resource. Message returned: Cannot read property 'Name' of undefined (Log: /aws/lambda/demo-certfail-CertificateValidationFunction-RsPimSoW88HO/2022/03/11/[$LATEST]1e460e9ac23549b0a17fd31b16d28319) (RequestId: 6432c6dd-96bd-41b7-b04d-dc1a5402d3b4)
  - An Internet Gateway to connect to the public internet                            [delete complete]  [0.0s]
  - Private subnet 1 for resources with no internet access                           [delete complete]  [3.9s]
  - Private subnet 2 for resources with no internet access                           [delete complete]  [7.5s]
  - Public subnet 1 for resources that can access the internet                       [delete complete]  [0.0s]
  - Public subnet 2 for resources that can access the internet                       [delete complete]  [2.0s]
  - A Virtual Private Cloud to control networking of your AWS resources              [delete complete]  [17.5s]
✘ stack demo-certfail did not complete successfully and exited with status ROLLBACK_FAILED
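
The dig output explains the failure. Neither demo.example.com nor environment.demo.example.com has a CAA record, so a CA falls back to the set on example.com, where 0 issue ";" and 0 issuewild ";" name no issuer at all: no CA, ACM included, may issue. A minimal Go sketch of reading those dig +short lines follows. CAA, parseDigCAA, and issuerDomain are hypothetical helpers (not part of Copilot), and the parser assumes each value is a single double-quoted string without escape sequences.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CAA is one CAA resource record as dig prints it: flags, tag, and value.
// Hypothetical type for illustration only.
type CAA struct {
	Flags uint8
	Tag   string
	Value string
}

// parseDigCAA parses one line of `dig <name> CAA +short` output, such as `0 issue ";"`.
// Assumption: the value is one double-quoted string with no escape sequences.
func parseDigCAA(line string) (CAA, error) {
	parts := strings.SplitN(strings.TrimSpace(line), " ", 3)
	if len(parts) != 3 {
		return CAA{}, fmt.Errorf("malformed CAA line: %q", line)
	}
	flags, err := strconv.ParseUint(parts[0], 10, 8)
	if err != nil {
		return CAA{}, fmt.Errorf("bad flags in %q: %w", line, err)
	}
	value := strings.TrimSuffix(strings.TrimPrefix(parts[2], `"`), `"`)
	// Property tags are matched case-insensitively, so normalize them.
	return CAA{Flags: uint8(flags), Tag: strings.ToLower(parts[1]), Value: value}, nil
}

// issuerDomain returns the CA domain named by an issue or issuewild value.
// An empty result, as for the value ";", means the record authorizes no CA.
func issuerDomain(value string) string {
	domain, _, _ := strings.Cut(value, ";")
	return strings.TrimSpace(domain)
}

func main() {
	// The two lines dig returned for example.com in this report.
	for _, line := range []string{`0 issuewild ";"`, `0 issue ";"`} {
		rec, err := parseDigCAA(line)
		if err != nil {
			panic(err)
		}
		if d := issuerDomain(rec.Value); d == "" {
			fmt.Printf("%s: no CA may issue\n", rec.Tag)
		} else {
			fmt.Printf("%s: %s may issue\n", rec.Tag, d)
		}
	}
}
```

Run against the two example.com lines from the report, it prints that no CA may issue under either tag.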

I'm undecided on the solution for this. I have a PR staged to add the CAA resource record in the environment CloudFormation stack so that the certificate issuance will always succeed. Although, I wonder if the appropriate solution would be to check the CAA policy through the domain hierarchy and warn the operator if the policy will block ACM certificate issuance. A similar warning mechanism is present when there are delegation concerns with the hosted zone.

Example of the warnings for the hosted zone:

❯ copilot app init --domain amazon.com
✘ get hosted zone ID for domain amazon.com: domain does not exist

❯ copilot app init --domain example.com
Note: The account does not seem to own the domain that you entered.
Please make sure that example.com is registered with Route53 in your account, or that your hosted zone has the appropriate NS records.
To transfer domain registration in Route53, see: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/domain-transfer-to-route-53.html
To update the NS records in your hosted zone, see: https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/SOA-NSrecords.html#NSrecords
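
The check proposed above (walk the CAA policy up the domain hierarchy and warn before deploying) could look roughly like the Go sketch below. It follows the relevant-RRset search described in RFC 8659: climb from the requested name toward the root and use the first non-empty CAA set. Everything here is hypothetical: Record, acmIssuers, relevantSet, and acmMayIssue are illustration names, not Copilot code; the lookup function stands in for a real DNS query (Go's standard net package has no CAA lookup); CNAME handling is omitted; and the list of ACM issuer domains is an assumption to verify against the ACM documentation.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is one CAA record; the flags byte is omitted for brevity. Hypothetical type.
type Record struct {
	Tag   string // "issue", "issuewild", "iodef", ...
	Value string // e.g. "amazon.com" or ";"
}

// acmIssuers holds the CA domains ASSUMED to identify ACM; verify against the ACM docs.
var acmIssuers = map[string]bool{
	"amazon.com": true, "amazontrust.com": true, "awstrust.com": true, "amazonaws.com": true,
}

// relevantSet climbs from name toward the root and returns the first non-empty CAA
// set (RFC 8659, section 3). lookup is a stand-in for a real DNS query; CNAMEs are ignored.
func relevantSet(name string, lookup func(string) []Record) (string, []Record) {
	labels := strings.Split(strings.TrimSuffix(name, "."), ".")
	for i := range labels {
		candidate := strings.Join(labels[i:], ".")
		if recs := lookup(candidate); len(recs) > 0 {
			return candidate, recs
		}
	}
	return "", nil // no CAA set anywhere: CAA places no restriction
}

// acmMayIssue reports whether the CAA policy lets ACM issue for name and where that
// policy lives. For a wildcard certificate, pass the name without the "*." label.
func acmMayIssue(name string, wildcard bool, lookup func(string) []Record) (bool, string) {
	owner, recs := relevantSet(name, lookup)
	if recs == nil {
		return true, ""
	}
	// Wildcards use issuewild when the set has any; otherwise issue governs them too.
	tag := "issue"
	if wildcard {
		for _, r := range recs {
			if r.Tag == "issuewild" {
				tag = "issuewild"
				break
			}
		}
	}
	sawTag := false
	for _, r := range recs {
		if r.Tag != tag {
			continue
		}
		sawTag = true
		domain, _, _ := strings.Cut(r.Value, ";")
		if acmIssuers[strings.TrimSpace(domain)] {
			return true, owner
		}
	}
	// A set without the governing tag (e.g. only iodef) does not restrict issuance.
	return !sawTag, owner
}

func main() {
	// Hypothetical zone data mirroring the report: only the apex has a CAA set.
	zone := map[string][]Record{
		"example.com": {{Tag: "issuewild", Value: ";"}, {Tag: "issue", Value: ";"}},
	}
	lookup := func(name string) []Record { return zone[name] }

	ok, owner := acmMayIssue("environment.demo.example.com", false, lookup)
	fmt.Printf("allowed=%v (policy from %s)\n", ok, owner) // allowed=false (policy from example.com)

	// An environment-level override stops the climb before it reaches example.com.
	zone["environment.demo.example.com"] = []Record{{Tag: "issue", Value: "amazon.com"}}
	ok, owner = acmMayIssue("environment.demo.example.com", true, lookup)
	fmt.Printf("allowed=%v (policy from %s)\n", ok, owner) // allowed=true (policy from environment.demo.example.com)
}
```

Because the climb stops at the first non-empty set, the second call also shows why a record at the environment subdomain overrides a restrictive parent policy, which is the behavior the environment-level CAA override discussed in this thread relies on. An issue-only record there also covers a wildcard name, since issue governs wildcards whenever the set has no issuewild property.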

Arguments for having copilot env init --domain ecs.example.com create a CAA resource record at the environment subdomain level:

Another solution would be to warn the operator and implement a flag (e.g. --override-caa) that would create the environment-level CAA record. That would definitely take more time to implement, and the cost of extending the CLI (and adding tests) with a new flag would need to be taken into consideration.

I don't think the solution is just a documentation update. Operators should not be forced to understand the mechanics of the CAA specification.


Thanks!

iamhopaul123 commented 2 years ago

Hello @matthewhembree. Do you think certificate import for environment level would help you mitigate this issue? So that the env level hosted zone could be managed by users?

Although, I wonder if the appropriate solution would be to check the CAA policy through the domain hierarchy and warn the operator if the policy will block ACM certificate issuance.

Good idea. When an app has a domain, I think we should run a similar validation when creating an environment, so that users know the env hosted zone won't work before we actually deploy the environment.

Operators should not be forced to understand the mechanics of the CAA specification.

100% agreed.

matthewhembree commented 2 years ago

Do you think certificate import for environment level would help you mitigate this issue? So that the env level hosted zone could be managed by users?

@iamhopaul123 I do think #2694 is a useful feature. I'll add my use cases to that issue. I can see the possibility of an organization that has a team (SecOps) that is solely responsible for certificate lifecycle management. That team might be different from the development team, and certificate import is a way to ensure compliance with that organization's policies.

A CAA policy would be a technical control of that team, but the CAA specification has no mechanism to block subdomain overrides. So, to be a good citizen in that organization, the CAA override would need to be explicit. If an override is not an option in such an organization, the team can provide an ACM certificate ARN instead.

jasonmarlin commented 2 years ago

We are also having a problem validating the ACM request. We were getting the warning that the domain isn't owned, despite all our domains having hosted zones in Route 53. Finally, I did a full transfer of a test domain from another registrar to AWS, and that one was able to create records for my environment. However, watching the ACM console, the process was timing out on validating the created certificates for some reason with an error that was just:

Received response status [FAILED] from custom resource. Message returned: Resource is not in the state certificateValidated

Anyway, all that to say that I think certificate import and/or some documentation on how to manually achieve the same result would be wonderful.

iamhopaul123 commented 2 years ago

Hello @jasonmarlin and @matthewhembree. We are actively working on this issue in https://github.com/aws/copilot-cli/pull/3386! Please stay tuned.

However, watching the ACM console, the process was timing out on validating the created certificates for some reason with an error that was just:

As for this, did you use a custom domain or just the default domain?

jasonmarlin commented 2 years ago

Hi @iamhopaul123 - I did use a custom domain, but I think my certificate issuance problem may be unrelated, as I also got stuck when creating the certificate manually. #3386 looks like exactly what we need.

Many thanks for Copilot - we're loving it so far, and everyone on your team is doing a fantastic job on issue management and incorporating community feedback. 🙏