Closed linsomniac closed 1 month ago
Hello,
From my memory, certbot doesn't check the propagation.
The current algo:
The propagation check (`Wait for propagation`) is here because some DNS providers are slow to propagate.
The `Wait for route53` is required because route53 doesn't apply changes immediately. If we don't check that, we will add and remove a record simultaneously.
You can try to disable the propagation check (`Wait for propagation`) with the `--dns.disable-cp` flag.
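Both waits follow a poll-until-ready pattern with a timeout and an interval (the log lines further down in this thread show `timeout: 2m0s, interval: 1s`). Here is a minimal sketch of that pattern; it is not lego's actual code, and `wait_for`, `check`, `clock`, and `sleep` are hypothetical names chosen for illustration.

```python
import time


def wait_for(check, timeout, interval, clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical check that only succeeds on the third attempt.
attempts = iter([False, False, True])
# sleep is replaced by a no-op so the example finishes instantly.
ok = wait_for(lambda: next(attempts), timeout=120, interval=1, sleep=lambda s: None)
print(ok)

# A check that never succeeds gives up once the deadline has passed.
timed_out = wait_for(lambda: False, timeout=0, interval=1, sleep=lambda s: None)
print(timed_out)
```

With a 1 second interval, the polling itself adds little delay. The time is spent waiting for the condition to become true.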
I did a run with "--dns.disable-cp" last night and it took basically the same length of time (15:57 to 17:15, 78 minutes).
Maybe my thinking is unrealistic, but it seems like it should be possible to do this faster: 30 seconds per domain to update DNS seems pretty long. The primary thing is that it's reliable, which it does seem to be. Usually it just runs from cron, so it's not even visible, but a full respin, especially in an emergency, is a case where faster would be nice.
So maybe the slow part is the `Wait for route53` :thinking:
Can you provide the full log?
Sent a full log to you on twitter/x
Based on the log, the slow part seems to be `Wait for route53`:
2024/05/07 10:06:03 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/07 10:06:45 [INFO] [a.example.com] acme: Preparing to solve DNS-01
2024/05/07 10:06:46 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/07 10:07:24 [INFO] [b.example.com] acme: Preparing to solve DNS-01
2024/05/07 10:07:25 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/07 10:08:05 [INFO] [c.example.com] acme: Preparing to solve DNS-01
2024/05/07 10:27:52 [INFO] [a.example.com] acme: Cleaning DNS-01 challenge
2024/05/07 10:27:53 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/07 10:28:28 [INFO] [b.example.com] acme: Cleaning DNS-01 challenge
2024/05/07 10:28:29 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/07 10:29:08 [INFO] [c.example.com] acme: Cleaning DNS-01 challenge
2024/05/07 10:29:09 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/07 10:30:03 [INFO] [d.example.com] acme: Cleaning DNS-01 challenge
I created a branch with a log at the end of the wait, just to be sure. Can you try it?
I sent a link to the full log again on X; it looks like it is failing due to a missing region. Let me see if I can fix that and run it again. I set the region to us-east-1 in the AWS_REGION environment variable. It's running now, I'll give you an update in an hour. :-)
I sent another link on X to a gist of the log output of running that branch.
Are you sure you're using my branch? Because the `End of wait for` logs are missing.
Sent you another one, I think this one is correctly built off that branch. Sorry about that.
The logs confirmed my idea: the slow part is the `Wait for route53`.
2024/05/08 15:54:22 [INFO] [a.example.com] acme: Preparing to solve DNS-01
2024/05/08 15:54:23 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:01 [INFO] End of wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:01 [INFO] [c.example.com] acme: Preparing to solve DNS-01
2024/05/08 15:55:02 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:40 [INFO] End of wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:40 [INFO] [b.example.com] acme: Preparing to solve DNS-01
2024/05/08 15:55:41 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:56:19 [INFO] End of wait for route53 [timeout: 2m0s, interval: 1s]
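To put numbers on that, here is a small sketch that pairs each `Wait for route53` line with the following `End of wait for route53` line and reports the elapsed seconds. It assumes the timestamp layout shown in the excerpt above. `parse_ts` and `route53_waits` are hypothetical helper names, not part of lego.

```python
from datetime import datetime

LOG = """\
2024/05/08 15:54:22 [INFO] [a.example.com] acme: Preparing to solve DNS-01
2024/05/08 15:54:23 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:01 [INFO] End of wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:01 [INFO] [c.example.com] acme: Preparing to solve DNS-01
2024/05/08 15:55:02 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:40 [INFO] End of wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:55:40 [INFO] [b.example.com] acme: Preparing to solve DNS-01
2024/05/08 15:55:41 [INFO] Wait for route53 [timeout: 2m0s, interval: 1s]
2024/05/08 15:56:19 [INFO] End of wait for route53 [timeout: 2m0s, interval: 1s]
"""


def parse_ts(line):
    # The first 19 characters hold the timestamp, e.g. "2024/05/08 15:54:23".
    return datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")


def route53_waits(log_text):
    """Return the duration in seconds of each Wait / End-of-wait pair."""
    waits = []
    started = None
    for line in log_text.splitlines():
        if "[INFO] Wait for route53" in line:
            started = parse_ts(line)
        elif "[INFO] End of wait for route53" in line and started is not None:
            waits.append(int((parse_ts(line) - started).total_seconds()))
            started = None
    return waits


waits = route53_waits(LOG)
lines = LOG.splitlines()
elapsed = int((parse_ts(lines[-1]) - parse_ts(lines[0])).total_seconds())
print(waits, sum(waits), elapsed)
```

On this excerpt each of the three waits lasts 38 seconds, so the waits account for 114 of the 117 seconds covered.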
The wait was introduced in PR #97 because of a bug: https://github.com/go-acme/lego/issues/94#issuecomment-179504193
:thinking: maybe I can add an option to skip this part, but I don't know what the side effects will be.
I updated my branch.
Can you try (with my branch) to set the env var `AWS_WAIT_FOR_RECORD_SETS_CHANGED` to `false`?
To be validated, the challenge requires an available TXT record: Let's Encrypt must be able to get this TXT record with a DNS query.
Waiting for the record set changes is useful because, if the changes are not applied, the record is unavailable.
Not waiting can be a problem, especially with domains in the same zone, because the route53 API requires posting all the records every time. If the first change is not applied, the second change will not use the right information.
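The problem described here is a lost update. Here is a toy model of it, assuming two challenge values that end up in the same record set, and assuming that a change which is not applied yet is invisible when the next change is built. `ToyZone` and `add_txt` are hypothetical; this does not call or reproduce the real route53 API, and it does not verify how route53 actually behaves.

```python
class ToyZone:
    """Hypothetical stand-in for a hosted zone; not the route53 API."""

    def __init__(self):
        self.applied = {}  # record name -> TXT values visible to reads
        self.pending = []  # submitted changes that are not applied yet

    def read(self, name):
        return list(self.applied.get(name, []))

    def upsert(self, name, values):
        # The whole record set is posted every time and replaces the old one.
        self.pending.append((name, list(values)))

    def wait_for_changes(self):
        # Apply pending changes in submission order.
        for name, values in self.pending:
            self.applied[name] = values
        self.pending.clear()


def add_txt(zone, name, value, wait):
    # Read-modify-write: fetch current values, append, post the full set.
    values = zone.read(name)
    values.append(value)
    zone.upsert(name, values)
    if wait:
        zone.wait_for_changes()


def run(wait):
    zone = ToyZone()
    name = "_acme-challenge.example.com."
    add_txt(zone, name, "token-1", wait)
    add_txt(zone, name, "token-2", wait)
    zone.wait_for_changes()  # let everything settle before inspecting
    return zone.read(name)


print(run(wait=True))
print(run(wait=False))
```

With the wait, both tokens survive. Without it, the second upsert is built from a stale read and overwrites the first token, so only `token-2` remains.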
There are 3 wait strategies:
The DNS provider implementation inside lego works by domain without knowledge of the other domains, so it's not possible to group domains to call the route53 API.
The option `AWS_WAIT_FOR_RECORD_SETS_CHANGED` can be used (to disable the wait for the changes), but I'm afraid that will create major side effects.
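For anyone driving lego from a script, a sketch of how the variable could be passed to the CLI. The variable name and the `false` value come from this thread. The helper `lego_command`, the email address, and the domain names are placeholders, and the sketch assumes a lego build that supports the option. Nothing is executed here.

```python
import os


def lego_command(domains, email, skip_route53_wait=False):
    """Build (argv, env) for a lego run with the route53 provider."""
    env = dict(os.environ)
    if skip_route53_wait:
        # Option discussed in this thread; it may cause side effects.
        env["AWS_WAIT_FOR_RECORD_SETS_CHANGED"] = "false"
    argv = ["lego", "--email", email, "--dns", "route53"]
    for domain in domains:
        argv += ["--domains", domain]
    argv.append("run")
    return argv, env


argv, env = lego_command(
    ["a.example.com", "b.example.com"],
    "admin@example.com",
    skip_route53_wait=True,
)
print(" ".join(argv))
# To actually run it: subprocess.run(argv, env=env, check=True)
```

Keeping the switch as an explicit parameter makes it easy to leave the wait enabled for routine cron runs and skip it only when speed matters more than reliability.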
Ran the new version and sent the output; runtime was down to under 14 minutes. Thanks for your attention on this. I had thought that it was a simple restructuring of the update/wait/verify logic, but it sounds like the DNS provider implementation doesn't lend itself to working that way, which I understand. Thanks for explaining that. I'm going to close this ticket, as it seems like there isn't a reliable solution, though there is an unreliable solution for use cases where that's OK.
Welcome
How do you use lego?
Binary
Detailed Description
I'm using the CLI and AWS route53 provider, on a certificate with 46 names, against the LetsEncrypt staging endpoint. It's taking 75 minutes to request the cert.
Looks like it loops over like this:
In my case, those domains are in 4-8 different zones.
Prior to lego, we were using certbot via HTTP, and it could create the certs in, from memory, a minute or less. I realize that HTTP is different from Route53.
It seems like this loop is going over each domain and adding the validation name, then waiting for propagation, and doing similar for removing the record. Is there a reason it does this, rather than looping over the domains, adding the TXT records for all of them, THEN looping over them checking for propagation (so they can all propagate in parallel), and similarly for the removal?
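A rough sketch of the two orderings, to show what is being asked. `add_record` and `wait_for` are hypothetical callbacks, not lego functions, and the 40 second figure is an assumption in line with the waits seen in the log excerpts in this thread. The model ignores everything except the waits.

```python
def per_domain(domains, add_record, wait_for):
    """Order described above: add one record, wait, then move on."""
    for domain in domains:
        add_record(domain)
        wait_for([domain])


def batched(domains, add_record, wait_for):
    """Order proposed above: add every record, then wait once for all."""
    for domain in domains:
        add_record(domain)
    wait_for(list(domains))


def count_waits(strategy, domains):
    # Record each wait_for call instead of really waiting.
    calls = []
    strategy(domains, add_record=lambda d: None, wait_for=calls.append)
    return len(calls)


domains = [f"name{i}.example.com" for i in range(46)]  # 46 names, as reported
WAIT_S = 40  # assumption: rough duration of one wait, in seconds

for strategy in (per_domain, batched):
    waits = count_waits(strategy, domains)
    # Doubled because removal goes through the same add/wait pattern.
    minutes = round(2 * waits * WAIT_S / 60, 1)
    print(strategy.__name__, waits, minutes)
```

Under these assumptions the per-domain order spends about 61 minutes waiting, against under 2 minutes for the batched order. That is in the same range as the reported 75 minutes. As explained elsewhere in this thread, the batched order is not directly available because the DNS provider implementation works by domain.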