go-acme / lego

Let's Encrypt/ACME client and library written in Go
https://go-acme.github.io/lego/
MIT License

Wildcard TXT record not detected with NS1 provider #653

Closed by idcmp 6 years ago

idcmp commented 6 years ago

I have a hosted zone called foo.example.org, and I'm registering a wildcard of *.my.foo.example.org.

With the last tagged version (1.0.1 I believe), this fails as lego (through the terraform-provider-acme plugin) is looking for a zone called my.foo.example.org.
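To illustrate the mismatch described above: the wildcard label is dropped to form the challenge name, and the record then has to land in the closest enclosing hosted zone, not in a zone named after the domain itself. The following is a minimal sketch under stated assumptions. `challengeFQDN` and `pickZone` are hypothetical helpers, the zone list is hard-coded, and this is not how lego actually discovers zones.

```go
package main

import (
	"fmt"
	"strings"
)

// challengeFQDN is a hypothetical helper: it drops a leading "*." and
// returns the fully qualified DNS-01 challenge name.
func challengeFQDN(domain string) string {
	name := strings.TrimPrefix(domain, "*.")
	return "_acme-challenge." + strings.TrimSuffix(name, ".") + "."
}

// pickZone is a hypothetical helper: it returns the longest zone in zones
// that encloses fqdn, or "" if none does.
func pickZone(fqdn string, zones []string) string {
	name := strings.TrimSuffix(fqdn, ".")
	best := ""
	for _, z := range zones {
		z = strings.TrimSuffix(z, ".")
		encloses := name == z || strings.HasSuffix(name, "."+z)
		if encloses && len(z) > len(best) {
			best = z
		}
	}
	return best
}

func main() {
	fqdn := challengeFQDN("*.my.foo.example.org")
	fmt.Println(fqdn)
	// The hosted zones here are assumed for the example.
	fmt.Println(pickZone(fqdn, []string{"example.org", "foo.example.org"}))
}
```

With these inputs the record name is `_acme-challenge.my.foo.example.org.` and the selected zone is `foo.example.org`, which matches the zone and fqdn that appear in the later lego logs in this thread.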

Building the TF provider with master of lego, the TXT record of _acme-challenge.my.foo.example.org gets created (with content), however TF times out with:

[my.foo.example.org] time limit exceeded: last error: NS dns2.p06.nsone.net. did not return the expected TXT record

However, when I dig for the record, I see it's there:

$ dig  _acme-challenge.my.foo.example.org. TXT @dns2.p06.nsone.net.

;; ANSWER SECTION:
_acme-challenge.my.foo.example.org. 96  IN TXT  "bF7qBuk895ogenhlcOd9BhEamfsYcoabNCbRnWWk5EG"

ldez commented 6 years ago

@idcmp Hi, could you try with this branch https://github.com/xenolf/lego/tree/fix/ns1-wildcard?

idcmp commented 6 years ago

I've changed Gopkg.toml to use branch = "fix/ns1-wildcard", reran dep ensure and rebuilt terraform-provider-acme and reinstalled the provider in place. I'm still getting the same error message:

acme_registration.reg: Creation complete after 1s (ID: https://acme-staging-v02.api.letsencrypt.org/acme/acct/7034812)
acme_certificate.certificate: Creating...
  account_key_pem:                             "<sensitive>" => "<sensitive>"
  account_ref:                                 "" => "<computed>"
  certificate_domain:                          "" => "<computed>"
  certificate_pem:                             "" => "<computed>"
  certificate_url:                             "" => "<computed>"
  common_name:                                 "" => "*.my.foo.example.org"
  dns_challenge.#:                             "" => "1"
  dns_challenge.1941026193.config.%:           "" => "1"
  dns_challenge.1941026193.config.NS1_API_KEY: "<sensitive>" => "<sensitive>"
  dns_challenge.1941026193.provider:           "" => "ns1"
  issuer_pem:                                  "" => "<computed>"
  key_type:                                    "" => "2048"
  min_days_remaining:                          "" => "7"
  must_staple:                                 "" => "false"
  private_key_pem:                             "<sensitive>" => "<sensitive>"
acme_certificate.certificate: Still creating... (10s elapsed)
acme_certificate.certificate: Still creating... (20s elapsed)
acme_certificate.certificate: Still creating... (30s elapsed)
acme_certificate.certificate: Still creating... (40s elapsed)
acme_certificate.certificate: Still creating... (50s elapsed)
acme_certificate.certificate: Still creating... (1m0s elapsed)

Error: Error applying plan:

1 error(s) occurred:

* acme_certificate.certificate: 1 error(s) occurred:

* acme_certificate.certificate: error creating certificate: acme: Error -> One or more domains had a problem:
[my.foo.example.org] time limit exceeded: last error: NS dns4.p06.nsone.net. did not return the expected TXT record

idcmp commented 6 years ago

(The TXT record is still being created in the correct zone in NS1.)

ldez commented 6 years ago

@idcmp I added logs to diagnose the problem, could you pull the branch and retry?

idcmp commented 6 years ago

Hey @ldez - I've had to switch to calling the lego binary directly (Terraform isn't letting me log anything :-/ ). Here's what I've got:

NS1_API_KEY is set properly

$ ./lego --domains "*.my.foo.example.org" --email "..." --dns ns1 run

2018/10/01 15:34:58 [INFO] [*.my.foo.example.org] acme: Obtaining bundled SAN certificate
2018/10/01 15:34:59 [INFO] [*.my.foo.example.org] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/8u5J6vbXunmfBm8fqzVCTCR501loyL_S2_8HPlz6hX
2018/10/01 15:34:59 [INFO] [my.foo.example.org] acme: Preparing to solve DNS-01
2018/10/01 15:35:00 create a new record for [zone: foo.example.org, fqdn: _acme-challenge.my.foo.example.org., domain: my.foo.example.org, v: SOl-]
2018/10/01 15:35:00 Could not obtain certificates
    acme: Error -> One or more domains had a problem:
[my.foo.example.org] error presenting token: ns1: failed to create record [zone: "foo.example.org", fqdn: "_acme-challenge.my.foo.example.org."]: PUT https://api.nsone.net/v1/zones/foo.example.org/_acme-challenge.my.foo.example.org/TXT: 400 Input validation failed (Value None for field '<obj>.filters' is not of type array)

The TXT record is not created (or it's immediately removed silently).

ldez commented 6 years ago

It's weird because you got a new error, but the code is the same :thinking:

Have you changed something?

PS: I recommend using the LE staging endpoint (https://acme-staging-v02.api.letsencrypt.org) instead of LE production (https://acme-v02.api.letsencrypt.org) for testing.

idcmp commented 6 years ago

Hey @ldez, before, I was running through Terraform, using the terraform-provider-acme provider. I switched to the lego command line tool this time since TF was swallowing all the logging (and so there are fewer moving parts for troubleshooting).

Would it be helpful if I reran ./lego on master and the previous version of fix/ns1-wildcard again and let you know? The error 400 Input validation failed (Value None for field '<obj>.filters' is not of type array) seems unusual.

ldez commented 6 years ago

It would be very helpful if you could try master, the previous version of my branch, and the HEAD of my branch. Don't forget to fetch. Thank you for your help.

idcmp commented 6 years ago

Hey @ldez, poking around, the error 400 Input validation failed (Value None for field '<obj>.filters' is not of type array) is coming because Go is serializing the filters field with the value "null" in the JSON when the filter is empty. I hacked ~/go/src/github.com/xenolf/lego/vendor/gopkg.in/ns1/ns1-go.v2/rest/model/dns/record.go to add omitempty and this seems to fix the problem.

I think if you called https://github.com/ns1/ns1-go/blob/v2/rest/model/dns/record.go#L37 in ns1.go's newTxtRecord, you would be okay.
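The serialization behavior described above can be reproduced with a small, self-contained sketch. The struct types below are hypothetical stand-ins, not the ns1-go model types, and `[]string` is used only to keep the example short. It shows three cases: a nil slice without `omitempty` is encoded as `null`, an initialized empty slice is encoded as `[]`, and a nil slice with `omitempty` is left out entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// plainRecord is a hypothetical model whose field has no omitempty tag.
type plainRecord struct {
	Filters []string `json:"filters"`
}

// taggedRecord is the same hypothetical model with omitempty added.
type taggedRecord struct {
	Filters []string `json:"filters,omitempty"`
}

// mustJSON marshals v and panics on error, to keep the example short.
func mustJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Nil slice, no omitempty: the field is sent as null.
	fmt.Println(mustJSON(plainRecord{})) // {"filters":null}

	// Initialized empty slice: the field is sent as an empty array.
	fmt.Println(mustJSON(plainRecord{Filters: []string{}})) // {"filters":[]}

	// Nil slice with omitempty: the field is not sent at all.
	fmt.Println(mustJSON(taggedRecord{})) // {}
}
```

Either of the last two forms avoids sending `null` for a field the server expects to be an array, which is consistent with both the `omitempty` workaround and the suggestion to build the record through a constructor that initializes its slices (assuming the linked function does so).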

Now off of master I get:

2018/10/01 20:04:28 [WARN] Error cleaning up my.foo.example.org: ns1: failed to delete record [zone: "foo.example.org", domain: "_acme-challenge.my.foo.example.org"]: <nil>

but refreshing the NS1 console, the record is actually deleted.
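A warning that ends in `<nil>` while the operation actually succeeded is what Go prints when a nil error is formatted with `%v`. One plausible cause, and this is an assumption since the cleanup code is not shown in this thread, is that the result of the delete call is wrapped without first checking whether it is nil. The sketch below uses a hypothetical `deleteRecord` stand-in to show the pattern and a guarded variant.

```go
package main

import (
	"errors"
	"fmt"
)

// deleteRecord is a hypothetical stand-in for an API call.
// It fails only when asked to.
func deleteRecord(fail bool) error {
	if fail {
		return errors.New("boom")
	}
	return nil
}

// cleanUpUnchecked wraps the result unconditionally, so even a successful
// delete produces a non-nil error whose message ends in "<nil>".
func cleanUpUnchecked() error {
	err := deleteRecord(false)
	return fmt.Errorf("failed to delete record: %v", err)
}

// cleanUpChecked wraps the error only when there is one.
func cleanUpChecked() error {
	if err := deleteRecord(false); err != nil {
		return fmt.Errorf("failed to delete record: %v", err)
	}
	return nil
}

func main() {
	fmt.Println(cleanUpUnchecked()) // failed to delete record: <nil>
	fmt.Println(cleanUpChecked() == nil)
}
```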

Interestingly, the vendored copy of ns1/ns1-go.v2 in terraform-provider-acme has the omitempty in it. :-/

ldez commented 6 years ago

Could you pull and retry with my branch?

idcmp commented 6 years ago

Reran lego (against staging), after wiping my .lego directory away. It works! Thank you!

2018/10/02 12:38:45 [INFO] [*.my.foo.example.org] acme: Obtaining bundled SAN certificate
2018/10/02 12:38:45 [INFO] [*.my.foo.example.org] AuthURL: https://acme-staging-v02.api.letsencrypt.org/acme/authz/6m2pBPpqq5ISMdWEhJSVZHnJw0AedtMnAVyJfqflhKY
2018/10/02 12:38:45 [INFO] [my.foo.example.org] acme: Preparing to solve DNS-01
2018/10/02 12:38:48 create a new record for [zone: foo.example.org, fqdn: _acme-challenge.my.foo.example.org., domain: my.foo.example.org, v: TvOM]
2018/10/02 12:38:50
2018/10/02 12:38:50 [INFO] [my.foo.example.org] acme: Trying to solve DNS-01
2018/10/02 12:38:50 [INFO] [my.foo.example.org] Checking DNS record propagation using [8.8.8.8:53]
2018/10/02 12:38:50 [INFO] Wait [timeout: 1m0s, interval: 2s]
2018/10/02 12:38:50 checkDNSPropagation: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]
2018/10/02 12:38:51 checkAuthoritativeNss: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]
2018/10/02 12:38:51 checkAuthoritativeNss Answer: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]%!(EXTRA string=TvOM)
2018/10/02 12:38:51 checkAuthoritativeNss: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]
2018/10/02 12:38:51 checkAuthoritativeNss Answer: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]%!(EXTRA string=TvOM)
2018/10/02 12:38:51 checkAuthoritativeNss: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]
2018/10/02 12:38:51 checkAuthoritativeNss Answer: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]%!(EXTRA string=TvOM)
2018/10/02 12:38:51 checkAuthoritativeNss: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]
2018/10/02 12:38:51 checkAuthoritativeNss Answer: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]%!(EXTRA string=TvOM)
2018/10/02 12:38:56 [INFO] [my.foo.example.org] The server validated our request
2018/10/02 12:38:59 [WARN] Error cleaning up my.foo.example.org: ns1: failed to delete record [zone: "foo.example.org", domain: "_acme-challenge.my.foo.example.org"]: <nil>
2018/10/02 12:38:59 [INFO] [*.my.foo.example.org] acme: Validations succeeded; requesting certificates
2018/10/02 12:39:00 [INFO] [*.my.foo.example.org] Server responded with a certificate.

Judging from the logging, checkAuthoritativeNss is missing a variable (or something).
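For reference, Go's `fmt` package appends `%!(EXTRA type=value)` when a call passes more arguments than the format string has verbs, so the log output points to one argument too many rather than a missing one. The sketch below reproduces the same suffix. The format string and function names are illustrative assumptions, not the actual lego logging code.

```go
package main

import "fmt"

// logFormat has two verbs. It is a variable, not a constant, so that
// vet's printf check does not reject the deliberate mismatch below.
var logFormat = "Answer: [fqdn: %s, v: %s]"

// tooManyArgs passes three arguments for two verbs.
func tooManyArgs(fqdn, v string) string {
	return fmt.Sprintf(logFormat, fqdn, v, v)
}

// matchingArgs passes exactly one argument per verb.
func matchingArgs(fqdn, v string) string {
	return fmt.Sprintf(logFormat, fqdn, v)
}

func main() {
	fqdn, v := "_acme-challenge.my.foo.example.org.", "TvOM"
	fmt.Println(tooManyArgs(fqdn, v))
	// Answer: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]%!(EXTRA string=TvOM)
	fmt.Println(matchingArgs(fqdn, v))
	// Answer: [fqdn: _acme-challenge.my.foo.example.org., v: TvOM]
}
```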

ldez commented 6 years ago

Great :tada:

I fixed the cleanup problem. You can pull my branch to check that everything now works as expected.