aws-amplify / amplify-backend

Home to all tools related to Amplify's code-first DX (Gen 2) for building fullstack apps on AWS
Apache License 2.0

Error loop with custom domain validation using external DNS #1833

Open RobertBouillon opened 3 months ago

RobertBouillon commented 3 months ago

Environment information

Not Applicable / Using AMP1

Description

Steps to reproduce

  1. Set up custom domain
  2. Start the process / generate the SSL cert
  3. Enter the CNAME into the DNS server after 1 day (simulate propagation time)
  4. Go back to the Amplify control panel. DNS authorization has failed, and the Retry button is the only option.
  5. Click the retry button
  6. A new SSL certificate is generated
  7. A new CNAME key is generated
  8. The validation fails because Amplify sees the old CNAME in the DNS
  9. Go back to step 4.

Jay2113 commented 3 months ago

Hi @RobertBouillon 👋, thanks for reaching out. Can you confirm if you are utilizing the Amplify managed SSL/TLS certificate or a custom SSL/TLS certificate during the domain activation process? Also, can you share your Amplify app id?

RobertBouillon commented 3 months ago

  1. We're using the Amplify-managed SSL/TLS
  2. App ID: db543lofnne0x

We were able to use the following workaround:

I don't know if it was because the CNAME was incorrect, or if it just took too long for the correct CNAME to appear in the DNS.

Note we're using Route53 as our DNS, however it's managed under a different account, so it's "external" for all intents and purposes.

Jay2113 commented 3 months ago

Thanks for sharing that information @RobertBouillon. We highly recommend that you update your CNAME records in the DNS provider settings as soon as you create your custom domain. After your app is created in the Amplify console, your CNAME record is checked every few minutes to determine if it resolves. If it doesn’t resolve after an hour, the check is made every few hours, which can lead to a delay in your domain being ready to use. If you added or updated your CNAME records a few hours after you created your app, it's likely that your app could get stuck in the Pending Verification state.
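
The check cadence described above can be sketched roughly as follows. The thread only says "every few minutes" and "every few hours", so the 5-minute and 3-hour values are placeholder assumptions, and `next_check_delay` and `checks_within` are hypothetical helpers, not anything Amplify exposes.

```python
from datetime import timedelta

# Assumed values: the thread does not give exact intervals.
FAST_INTERVAL = timedelta(minutes=5)
SLOW_INTERVAL = timedelta(hours=3)
FAST_WINDOW = timedelta(hours=1)


def next_check_delay(elapsed: timedelta) -> timedelta:
    """Return the wait before the next CNAME check, given time since domain creation."""
    if elapsed < FAST_WINDOW:
        return FAST_INTERVAL
    return SLOW_INTERVAL


def checks_within(total: timedelta) -> int:
    """Count how many checks happen within the first `total` of elapsed time."""
    elapsed = timedelta(0)
    count = 0
    while True:
        elapsed += next_check_delay(elapsed)
        if elapsed > total:
            return count
        count += 1
```

Under these assumed numbers, a record that first resolves shortly after the one-hour mark might not be noticed for up to three more hours.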

RobertBouillon commented 3 months ago

Thanks - we did manage to get it up and running with the workaround described previously.

The behavior I observed deviates a bit from what you described. In less than one hour after I clicked "retry", the process failed with a red X on the first step, and the "retry" button was the only way to restart the process. I did wait more than a day after making a change and there were no automatic retries; it seems to have given up after the first failure.

There seems to be something like a 30-minute window to get the DNS entry in or the process fails. Note that the initial failures were not because the CNAME record was missing, but because it was incorrect (it reflected the CNAME generated from the previous attempt, because of DNS cache latency). The only way around this was to lower the TTL, delete the entry, restart the process, and add the CNAME in real-time with Amplify.
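
One way to tell a missing record from a stale one before clicking "retry" is to compare what a resolver returns with the value the console currently shows. The sketch below assumes the CNAME target has already been looked up separately (for example with `dig`); `classify_cname` and its return labels are hypothetical names used only for illustration.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase a DNS name and drop the trailing root dot so equal names compare equal."""
    return name.strip().lower().rstrip(".")


def classify_cname(observed: Optional[str], expected: str) -> str:
    """Classify a looked-up CNAME target against the value the console shows.

    observed: target returned by the resolver, or None if no record was found.
    expected: target from the current validation attempt.
    """
    if observed is None:
        return "missing"
    if normalize(observed) == normalize(expected):
        return "match"
    # A record exists but points elsewhere, e.g. a cached value from an earlier attempt.
    return "mismatch"
```

A "mismatch" result corresponds to the case described above, where the resolver still serves the target from a previous attempt.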

Jay2113 commented 3 months ago

Thanks for sharing the additional context. When you click the retry button, it will generate new DNS records (the ACM record might remain the same, but the CloudFront distribution record will change). Therefore, if the domain activation process fails, we recommend deleting the domain and then restarting the process. It is crucial that you add the records to your DNS settings promptly after adding your custom domain in the Amplify console. The AWS Certificate Manager (ACM) immediately starts attempting to verify ownership, and the verification checks become less frequent over time.
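
Because a retry may change some records and leave others alone, it can help to note the records shown in the console before and after the retry and compare them. This is a minimal sketch: `changed_records` is a hypothetical helper, and the snapshots are assumed to be plain mappings from record name to target that you copy out by hand.

```python
from typing import Dict, List


def changed_records(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return the record names whose target differs between two snapshots, sorted.

    Each snapshot maps a CNAME record name to its target value.
    Names present in only one snapshot are reported as changed too.
    """
    names = set(before) | set(after)
    return sorted(n for n in names if before.get(n) != after.get(n))
```

Any name this returns is a record that would need to be updated in the external DNS zone before validation can succeed.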

RobertBouillon commented 3 months ago

If the recommendation is to delete the record, wait for the TTL to expire, and then try again (presumably up to 3 days), that's not clear during the process, and seems less than ideal.

If the argument is that this is not a defect and is "working as intended," there are a couple problems with that:

  1. There is no indication that the DNS record changed when you hit "retry." You have to find out when it fails again.
  2. There's no instruction to delete the previous DNS record before retrying

Based on the information I have, I would recommend two possible remedies:

  1. Ideally, hitting "retry" will just retry the ACM verification process and not reissue an SSL certificate with new DNS records
  2. If "retry" is clicked and the SSL process restarts, it should be clear to the user that the DNS entries previously entered are invalid, and Amplify should provide up to 3 days for the changes to take effect because DNS can take up to 3 days to propagate. The ACM verification process should not fail and stop retrying if the CNAME record exists and is incorrect.

frankadrian commented 1 month ago

I'm experiencing this exact problem as @RobertBouillon has described it. It's really annoying that you can't just "retry" with the same CloudFront domain being used.

Jay2113 commented 1 month ago

@RobertBouillon @frankadrian Thanks for sharing that feedback. We appreciate you highlighting the confusion around the current retry process. We are working on improving that workflow, and in the meantime, we will update our documentation to clarify how it works. Thank you.