ComputeCanada / magic_castle

Terraform modules to replicate the HPC user experience in the cloud
MIT License

LetsEncrypt failure for GCP DNS #39

Closed verdurin closed 4 years ago

verdurin commented 4 years ago

I see this error with a newly-created domain registered with Google Cloud DNS:

module.dns.module.acme.acme_certificate.certificate: Creating...                      
module.dns.google_dns_record_set.records[1]: Creating...                                                               
module.dns.google_dns_record_set.records[6]: Creating...                                                               
module.dns.google_dns_record_set.records[5]: Creating...                                                               
module.dns.google_dns_record_set.records[4]: Creating...                                                               
module.dns.google_dns_record_set.records[3]: Creating...                                                               
module.dns.google_dns_record_set.records[2]: Creating...
module.dns.google_dns_record_set.records[0]: Creating...                                                               
module.gcp.google_compute_instance.mgmt[0]: Modifying... [id=testcluster0-mgmt1]                                                                                                                                                              
module.dns.google_dns_record_set.records[2]: Creation complete after 2s [id=analytics/jupyter.testcluster0.bdi-nno-rco.com./A]                                       
module.dns.google_dns_record_set.records[5]: Creation complete after 2s [id=analytics/testcluster0.bdi-nno-rco.com./SSHFP]
module.dns.google_dns_record_set.records[1]: Creation complete after 5s [id=analytics/login1.testcluster0.bdi-nno-rco.com./A]
module.dns.google_dns_record_set.records[6]: Creation complete after 5s [id=analytics/login1.testcluster0.bdi-nno-rco.com./SSHFP]
module.dns.google_dns_record_set.records[0]: Creation complete after 5s [id=analytics/testcluster0.bdi-nno-rco.com./A]
module.dns.google_dns_record_set.records[4]: Creation complete after 7s [id=analytics/dtn.testcluster0.bdi-nno-rco.com./A]
module.dns.google_dns_record_set.records[3]: Creation complete after 8s [id=analytics/ipa.testcluster0.bdi-nno-rco.com./A]
module.dns.module.acme.acme_certificate.certificate: Still creating... [10s elapsed]                                                                                                                                                          
module.gcp.google_compute_instance.mgmt[0]: Still modifying... [id=testcluster0-mgmt1, 10s elapsed]                                                                                                                                           
module.gcp.google_compute_instance.mgmt[0]: Modifications complete after 11s [id=testcluster0-mgmt1]
module.dns.module.acme.acme_certificate.certificate: Still creating... [20s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [30s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [40s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [50s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [1m0s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [1m10s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [1m20s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [1m30s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [1m40s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [1m50s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [2m0s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [2m10s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [2m20s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [2m30s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [2m40s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [2m50s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [3m0s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [3m10s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [3m20s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [3m30s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [3m40s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [3m50s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [4m0s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [4m10s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [4m20s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [4m30s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [4m40s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [4m50s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [5m0s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [5m10s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [5m20s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [5m30s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [5m40s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [5m50s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [6m0s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [6m10s elapsed]
module.dns.module.acme.acme_certificate.certificate: Still creating... [6m20s elapsed]

Error: error creating certificate: acme: Error -> One or more domains had a problem:
[*.testcluster0.bdi-nno-rco.com] time limit exceeded: last error: NS ns-cloud-e1.googledomains.com. returned NXDOMAIN for _acme-challenge.testcluster0.bdi-nno-rco.com.
[testcluster0.bdi-nno-rco.com] time limit exceeded: last error: NS ns-cloud-e2.googledomains.com. returned NXDOMAIN for _acme-challenge.testcluster0.bdi-nno-rco.com.

  on dns/acme/main.tf line 37, in resource "acme_certificate" "certificate":
  37: resource "acme_certificate" "certificate" {

cmd-ntrf commented 4 years ago

The ACME module uses a DNS challenge to confirm ownership of the domain name and then generates the SSL certificates (https://letsencrypt.org/docs/challenge-types/). This process fails from time to time, but there is not much we can do on the Magic Castle side apart from documenting why it can happen and what to do when it does.

If the domain name was only recently transferred to Google DNS, there might be some delay before the nameserver update takes effect?

You should be able to run terraform apply again, and certificate generation should eventually succeed.
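
For readers unfamiliar with the DNS challenge mentioned above, here is a minimal sketch of what the ACME server looks for. It assumes the DNS-01 rules from RFC 8555: the TXT record lives at `_acme-challenge.<domain>`, and its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The function names are hypothetical and are not part of Magic Castle or of the Terraform ACME provider.

```python
import base64
import hashlib


def challenge_record_name(domain: str) -> str:
    """Return the TXT record name checked for a DNS-01 challenge.

    A wildcard such as *.example.com is validated at the same name as
    example.com, which is why both entries in the error above point at
    the same _acme-challenge record.
    """
    base = domain.rstrip(".")
    if base.startswith("*."):
        base = base[2:]
    return f"_acme-challenge.{base}."


def challenge_record_value(key_authorization: str) -> str:
    """Return the TXT value: unpadded base64url of SHA-256(key authorization)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    # Both certificate names from the log map to a single record name.
    for name in ("*.testcluster0.bdi-nno-rco.com", "testcluster0.bdi-nno-rco.com"):
        print(name, "->", challenge_record_name(name))
```

If the authoritative nameservers queried by the ACME client do not serve that record, the challenge times out with NXDOMAIN, as in the log above.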

verdurin commented 4 years ago

Yes, I do realise this might not be a Magic Castle problem. Having tried a couple of times, it looks like it might be a Google DNS problem: when I look up the domain it returns ns-cloud-e1.googledomains.com., while the domain management interface reports that we're using ns-cloud-c1.googledomains.com. I suspect this is because I created both the domain and, later, the zone in Cloud DNS only this evening.

This has worked for me before, but I probably had left a bit more time after creating the domain and the zone.

Will try again tomorrow and close this ticket if it's working by then.

verdurin commented 4 years ago

This does appear to work if the Google DNS nameserver entries are manually edited to match those in the Google Cloud DNS zone. One might have expected them to be the same automatically, but alas not.
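
The mismatch described above can be spotted before running terraform apply by comparing the two nameserver lists. The following is a minimal sketch, assuming you have already copied both lists by hand (for example from a DNS lookup and from the management interface). The function names are hypothetical, and the sketch makes no network calls.

```python
def normalize(nameserver: str) -> str:
    """Lower-case a nameserver name and drop the trailing root dot."""
    return nameserver.strip().rstrip(".").lower()


def nameserver_mismatch(resolved_ns, configured_ns):
    """Compare two nameserver lists.

    Returns a pair of sorted lists: names that appear only in
    resolved_ns, and names that appear only in configured_ns.
    Two empty lists mean the sets agree.
    """
    resolved = {normalize(ns) for ns in resolved_ns}
    configured = {normalize(ns) for ns in configured_ns}
    return sorted(resolved - configured), sorted(configured - resolved)


if __name__ == "__main__":
    # Names taken from this thread; the lists are deliberately incomplete.
    only_resolved, only_configured = nameserver_mismatch(
        ["ns-cloud-e1.googledomains.com.", "ns-cloud-e2.googledomains.com."],
        ["ns-cloud-c1.googledomains.com."],
    )
    print("only in lookup:   ", only_resolved)
    print("only in interface:", only_configured)
```

Any non-empty result suggests that the two sets of nameserver entries still need to be brought into agreement before the ACME challenge can succeed.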