Open snowie2000 opened 6 years ago
/cc @bradfitz @FiloSottile
It seems like this ignores the case of certificate renewal, which is baked into autocert.Manager
If you don't have the correct system time, then all bets are off as to whether autocert.Manager
will correctly renew the certificate before it expires (default 30 days before expiration).
Example: the system time is 45 days behind the actual time. You pull a new certificate from LetsEncrypt and don't check the NotBefore
time on the certificate. 91 days pass, at which point the certificate has expired; however, according to the server's clock, you're still 46 days away from needing a renewal, so you continue to serve an expired certificate.
Maybe this is an extreme example (although I've seen system times that are years off in the wild), but it seems that renewal depends on accurate system time. If we don't check NotBefore, then we wind up with hard-to-predict behaviour that depends on the amount of system-time drift and what Manager.RenewBefore is set to.
If you mean the renewal time, I think checking the expiry date is enough. After all, the logic is to renew the certificate 30 days before the expiry date, not to renew 60 days after the issue date. A server with a time drift of 45 days is an extreme case, but a server with a drift of several minutes is common.
LetsEncrypt backdates their certificates by 1 hour, so time drift of several minutes should not be an issue regardless - https://community.letsencrypt.org/t/issued-cert-date-shows-gmt-1-instead-of-gmt/12832
I haven't tested, but I would assume other ACME-based CAs do the same.
Okay, I agree that backdating by 1 hour should mitigate some of the problems, but I still believe that the side effect of validating the cert's beginning time is too severe.
Let's think about the scenario below. If a server has a time drift of, let's say, 45 days, it only loses the ability to renew the cert in time. Once the maintainer finds the problem (at worst, after the cert has expired), he can simply correct the time and renew the cert. However, if we verify the date as it stands, autocert will issue the cert, discard it (since it thinks it's not yet valid), and do this again and again until it runs out of chances. Now the maintainer faces a worse problem: he has no cert to use at all, because the rate limit for issuing certs for a specific domain only resets every 7 days.
In another case, a domain successfully issued a cert, and for some reason its clock drifted by a day or so. Autocert will try to renew the cert 29 days before expiration and, of course, fail. Since the current cert is still valid, the maintainer is very likely unaware of the problem. After 29 days the cert expires, and the maintainer will find that he has run out of rate limit and can't renew his cert.
If there is no big drawback (which needs careful investigation) to skipping verification of the cert's beginning time, I suggest we just don't do it, since that causes less trouble.
The certificate's "Not Before" is checked in multiple places. One not mentioned here yet is when a certificate is fetched from the cache, to decide whether the cached cert is usable or a new one needs to be requested.
Why wouldn't the maintainer want to ensure correct system time? If the system time is drifted so much as described here, I wouldn't be surprised if it had bigger problems than a certificate renewal.
When you fetch a cert from the cache, you should check whether the cert has expired rather than whether it is not yet valid, since I believe ACME-based CAs would never issue a cert that only becomes valid in the future.
It is the maintainer's responsibility to keep the time synced, but hey, everyone makes mistakes, and some applications aren't time sensitive. As @mdp said, there are systems out there with a time drift of more than a year.
ACME based CAs would never issue a cert that only becomes valid in the future.
If there were so much trust and reliability around, we probably wouldn't need certificates at all, nor TLS or crypto for that matter.
If they issued such a faulty cert, reissuing it on the next request wouldn't fix it either. Plus, CAs are basically far more trustworthy than average website maintainers; that's why they have "authority" in their name.
This sounds like a problem for the blockchain ;) /s
I'm not sure the library should be trying to solve this problem, but I see the pain here. If autocert keeps requesting a new certificate due to significant time drift (>1 hour), then after 5 requests you're locked out for a week.
I believe there are two cases where an error can occur while requesting a certificate:
1. The ACME server returns an error (network failure, rate limiting, etc.).
2. autocert receives a certificate but judges it invalid (e.g. its NotBefore is in the future because of clock drift).
In the first case, retrying makes sense, as the failure may correct itself, and the penalty is low (1-hour rate limiting).
In the second case, retrying is extremely unlikely to solve the problem, and the penalty is quite high (1-week rate limiting after 5 tries in 5 minutes). I'd argue that in this case it should be regarded as a critical error and autocert should simply stop retrying.
Another option would be a very aggressive backoff on retrying.
@mdp Generally, I agree with you, except that I don't think any sort of backoff can help in the latter case. In my understanding, the backoff controls how autocert retries when a request to the ACME server fails. In this case, however, everything related to the ACME server is working fine: autocert got the cert successfully, and it is discarded afterward. So an aggressive backoff shouldn't help much (if at all).
I agree hitting rate limits is a real issue, but autocert should not operate in unsound circumstances. I think causing a fatal error as @mdp suggests would be the best solution.
@FiloSottile I agree. Causing a fatal error or panic when verification of a newly issued cert's beginning date fails could be a good enough solution.
I disagree about a fatal error. It would be unusual at best. How many programs exit with an error due to system time drift? NTP? SSH? net/http.Server?
Besides, autocert is just a package. Panic or os.Exit would cause the whole program importing and using the package to exit. Not best practice.
That's why I said time-insensitive services. These services don't care about time drift, so it's entirely possible the maintainer won't notice the drift until after he has lost his cert, along with the chance to renew it.
Adding a doc comment to the autocert.Manager that the users should ensure correct system time is probably the best solution to this.
The package is for developers, not end users. Developers have no way to guarantee the correctness of the system time; that is the server owner's responsibility. Documenting the panic and asking developers to properly recover from it and warn the end user is possibly a more practical way.
package users = developers
If you're writing a program for end users to be run on their servers, there are multiple ways to compare current server time with what the consensus is outside the world of that server, and act accordingly.
The autocert package isn't the right place for that imho.
Sorry, when I wrote "critical" error, I just meant that autocert could stop retrying, not that it should panic or exit.
I agree that it's not the package's job to ensure correct system time. However, after a certificate shows up from the future, it seems unlikely that retrying will do anything other than exhaust the rate limit and waste the cert provider's resources. The end result will be the same: you'll eventually wind up with an expired certificate that can't be renewed.
I noticed that there's a placeholder for exponential backoff on retries. One option, in the case of future-dated or already-expired certs, is to keep retrying but at a much longer interval, say 2 days, which keeps you outside the scope of the week-long rate limit (and maybe the system time gets corrected in the meantime?).
Surely setting up the server to auto-sync its time once is easier than implementing a custom backoff function, plus periodically logging in to check and correct the system time manually, or rebooting the server in the hope that the clock will somehow fix itself?
Add to that more thoughts:
I could probably come up with other scenarios.
Those items are not all related to this particular issue, but my point is that a system with such clock drift will have many other problems without automatic time sync. The autocert package won't magically fix those broken servers. If we add code to "fix" this particular issue, we'll be piling up more and more code trying to "fix" other things like those I mentioned above, instead of the root cause.
Alright, I have now changed my mind and agree that verifying both the beginning and the ending time of a cert is necessary. However, continuing to retry even when we know it can achieve nothing but exhausting the chance of renewal is really something we shouldn't do, and something we could change. We can't sync the time for users, but at least we shouldn't make things worse.
autocert.go line 1095 compares the current time with the time the cert was issued to check whether the cert is valid. But the server's time is not always 100% accurate, and if the server time is behind the ACME server's time (i.e. the real time), autocert judges the cert as not valid and will request another cert on the next request, hitting the rate limit in a very short time.
What did you expect to see?
Verifying the beginning time of the cert is not necessary; only the expiry date needs to be verified. Since the valid duration of an ACME cert is typically 3 months, I don't think any server will have such a huge time difference.
What did you see instead?
The cert is judged as not valid. More cert requests are sent subsequently (and judged invalid too), hitting the rate limit. No cert can be successfully issued until the server time is corrected.