dhduvall opened 5 years ago
@dhduvall Are you still having this issue? Curious whether you think it's related to https://github.com/go-acme/lego/issues/1087
It might be: lego could be seeing success after those 50 seconds, based on whichever individual servers happened to answer its queries, but when LE checks, it reaches servers that don't have the new data yet, and the request fails.
That said, IIRC, I rarely saw lego consider propagation complete before the timeout. Maybe GCP's architecture has changed in a way that makes this happen now?
I haven't seen it myself since I added the workaround with WrapPreCheck().
And I'm still mystified as to why people don't see this when running certbot against GCP. Maybe most people aren't doing DNS verification.
I've filed an issue with Google (https://issuetracker.google.com/issues/123397631), but lego probably needs a workaround for the problem. In summary: even once all of a domain's nameservers have responded with the correct data (thus triggering a successful result from lego's pre-check routines), one or more of them may revert to responding with old data or NXDOMAIN. They eventually settle down, but only after a (potentially unbounded?) amount of time.
I'm not sure of the best way to add extra time to the pre-check method. Because checkDNSPropagation() isn't exported, I can't simply create a pre-check function that calls it first and then waits (or keeps checking for a while). Simply trying again isn't a great option either: as best I can tell, there's no way to deactivate the authorization from this side of the API, since the error returned from Obtain() doesn't expose the authorization URI and there's no way to construct it from the ObtainRequest. I accidentally maxed out the authorizations rate limit figuring this out (thankfully I kept the logs containing the URIs).
I'm happy to put a fix together, but would appreciate some direction.