Describe the bug
Deleting a stack that contains a Cognito user pool with a custom domain fails when CloudFormation tries to delete CertificateRequestorResource.
Expected Behavior
Deletion succeeds.
Current Behavior
Deletion fails with message:
DELETE_FAILED | AWS::CloudFormation::CustomResource | xxxxxxxxxx/CertificateRequestorResource/Default (yyyyyyyyCertificateRequestorResourceF53AA380) Received response status [FAILED] from custom resource. Message returned: Response from describeCertificate did not contain an empty InUseBy list after 10 attempts.
Reproduction Steps
Deploy a stack that has a Cognito user pool with a custom domain.
Setting up such a stack requires defining a certificate for the custom domain. I do it using DnsValidatedCertificate. My code (in Clojure, with custom helper functions):
user-pool (-> (UserPool$Builder/create stack "user-pool")
              ...
              (.userPoolName user-pool-name)
              .build)
; Cognito requires the parent domain to have a valid DNS A record.
; The parent may be the root of the domain, or a child domain that is one step up in the domain hierarchy.
; For example, if your custom domain is auth.xyz.example.com,
; Cognito must be able to resolve xyz.example.com to an IP address.
;
; The record points "nowhere", https://stackoverflow.com/questions/51249583.
apex (dns/domain user-pool-name "foo.com")
_ (cdk.route53/add-a-record stack apex (RecordTarget/fromIpAddresses (into-array ["127.0.0.1"])))
user-pool-domain (dns/domain "auth" apex)
hosted-zone (cdk.route53/memoized-fetch-hosted-zone stack user-pool-domain)
cert (-> (DnsValidatedCertificate$Builder/create stack (str user-pool-domain "-cert"))
         (.domainName user-pool-domain)
         (.hostedZone hosted-zone)
         (.region (str Region/US_EAST_1)) ; This region is required by Cognito
         .build)
domain (.addDomain user-pool "domain" (-> (UserPoolDomainOptions/builder)
                                          (.customDomain (-> (CustomDomainOptions/builder)
                                                             (.certificate cert)
                                                             (.domainName user-pool-domain)
                                                             .build))
                                          .build))
_ (cdk.route53/add-a-record stack user-pool-domain (RecordTarget/fromAlias (UserPoolDomainTarget. domain)))
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.100.0 (build e1b5c77)
Framework Version
2.100.0
Node.js Version
18.17.1
OS
macOS
Language
Java
Language Version
Java (17)
Other information
Cause
The cause seems to be that the certificate is still in use by a "phantom" CloudFront distribution that belongs to an unknown account, 455458493081, and that I can't find anywhere in the GUI.
The association can be seen in the ACM GUI, or by running aws acm describe-certificate --certificate-arn ... --region us-east-1 and looking at the InUseBy key.
After a few minutes this dependency is cleaned up automatically, and a repeated attempt to delete the stack then succeeds.
I suspect this is the distribution that serves Cognito's hosted UI website.
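The InUseBy check described above can also be scripted instead of repeated by hand. The following is only a minimal sketch, not the CDK handler's actual code: CertificateDetail, DescribeFn and waitUntilUnused are hypothetical names, and the describe function is supplied by the caller, so nothing here depends on a particular AWS SDK version.

```typescript
// Hypothetical shape: only the field this sketch needs from the certificate description.
interface CertificateDetail {
  InUseBy?: string[];
}

// Caller-supplied lookup (an assumption of this sketch), e.g. a wrapper around
// the AWS SDK or the `aws acm describe-certificate` CLI command.
type DescribeFn = (certificateArn: string) => Promise<CertificateDetail>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls until InUseBy is empty or the attempts run out.
// Resolves with the number of describe calls made; rejects if the
// certificate is still in use after the last attempt.
async function waitUntilUnused(
  describe: DescribeFn,
  certificateArn: string,
  maxAttempts: number,
  delayMs: number,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const cert = await describe(certificateArn);
    const inUseBy = cert.InUseBy ?? [];
    if (inUseBy.length === 0) {
      return attempt;
    }
    // Only wait when another attempt will follow.
    if (attempt < maxAttempts) {
      await sleep(delayMs);
    }
  }
  throw new Error(
    `Certificate ${certificateArn} is still in use after ${maxAttempts} attempts`,
  );
}
```

In practice, describe would return the Certificate object from the describe-certificate output, and the stack deletion would be retried only once the promise resolves.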
The only mention of a similar Cognito problem that I found is a Stack Overflow answer, which states:
"The CloudFront distribution that AWS creates for the custom Cognito domain will be removed in a few hours after you delete the user pool (or delete the custom domain via the Cognito console / API). This seems to be completely hidden from the user (you)."
But there are several reports of a similar issue with certificates for API Gateway, e.g.:
Workaround attempt
I tried to retain the certificate on deletion via (.applyRemovalPolicy cert RemovalPolicy/RETAIN_ON_UPDATE_OR_DELETE). This allows the stack deletion to succeed, but when I immediately deployed the same stack again, it failed with:
user-pool/domain (userpooldomainB4026A3C) One or more of the CNAMEs you provided are already associated with a different resource. (Service: AmazonCloudFront; Status Code: 409; Error Code: CNAMEAlreadyExists; Request ID: 6e3993dc-5cb7-4d0d-a267-ed58ca49dee3; Proxy: null) (Service: AWSCognitoIdentityProviderService; Status Code: 400; Error Code: InvalidParameterException; Request ID: 562d0982-bc92-4923-93d1-55732114571f; Proxy: null)
Strangely, deploying one more time succeeded. In any case, this does not seem to be a reliable workaround, and over time it will pollute ACM with unused certificates.
The Stack Overflow answer quoted above, which is the only mention of a similar Cognito problem I found, is from https://stackoverflow.com/questions/75134728/phantom-cloudfront-distribution-blocks-me-from-creating-cognito-custom-domain.
Solution ideas
1) The ideal solution is a fix in CloudFront or Cognito, so that deleting the pool immediately clears the corresponding certificate's InUseBy array.
2) A solution in CDK could be to increase the number of attempts in the deleteCertificate function of aws-certificatemanager/dns-validated-certificate-handler: https://github.com/aws/aws-cdk/blob/c66e197f6f8840da6475383dbf2421c3b06ea417/packages/%40aws-cdk/custom-resource-handlers/lib/aws-certificatemanager/dns-validated-certificate-handler/index.js#L160
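To get a feel for what raising the attempt count would buy, here is a rough sketch of the worst-case waiting time under exponential backoff. It assumes the delay doubles on every attempt starting from a 200 ms base. That base value, and the names maxDelayMs and worstCaseTotalWaitMs, are assumptions made for illustration and are not taken from the handler.

```typescript
// Upper bound on the delay after attempt number `attempt` (0-based),
// assuming plain exponential backoff from a hypothetical base delay.
function maxDelayMs(attempt: number, baseMs: number): number {
  return baseMs * 2 ** attempt;
}

// Worst-case total time spent waiting if every one of `maxAttempts`
// attempts finds the certificate still in use.
function worstCaseTotalWaitMs(maxAttempts: number, baseMs: number): number {
  let total = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    total += maxDelayMs(attempt, baseMs);
  }
  return total;
}

// 10 attempts with a 200 ms base: 200 * (2^10 - 1) ms in total.
console.log(worstCaseTotalWaitMs(10, 200)); // 204600 (about 3.4 minutes)
// 12 attempts with the same base: 200 * (2^12 - 1) ms in total.
console.log(worstCaseTotalWaitMs(12, 200)); // 819000 (about 13.7 minutes)
```

Under these assumptions each extra attempt roughly doubles the total wait, so the count cannot grow far before it exceeds the 15-minute maximum of a single Lambda invocation. If the hidden distribution really takes hours to disappear, a capped delay or a different retry mechanism would probably be needed as well.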