Open snowie2000 opened 6 years ago
/cc @bradfitz @FiloSottile
It seems like this ignores the case of certificate renewal, which is baked into autocert.Manager
If you don't have the correct system time, then all bets are off as to whether autocert.Manager
will correctly renew the certificate before it expires (default 30 days before expiration).
Example: the system time is 45 days behind the actual time. You pull a new certificate from LetsEncrypt and don't check the NotBefore
time on the certificate. 91 days pass, at which point the certificate has expired; however, according to the server's clock, you're still 46 days away from needing a renewal, so you continue to serve an expired certificate.
Maybe this is an extreme example (although I've seen system times that are years off in the wild), but it seems that renewal depends on accurate system time. If we don't check NotBefore, then we wind up with hard-to-predict behaviour that depends on the amount of system-time drift and what Manager.RenewBefore is set to.
If you mean the renewal time, I think checking the expiry date is enough. After all, the logic is to renew the certificate 30 days before the expiry date, not to renew 60 days after the issue date. A server with a time drift of 45 days is an extreme case, but a server with a drift of several minutes is common.
LetsEncrypt backdates their certificates by 1 hour, so time drift of several minutes should not be an issue regardless - https://community.letsencrypt.org/t/issued-cert-date-shows-gmt-1-instead-of-gmt/12832
I haven't tested, but I would assume other ACME-based CAs do the same.
Okay, I agree that backdating by 1 hour should mitigate some of the problems, but I still believe that the side effect of validating the cert's beginning time is too severe.
Let's think about the scenario below. If a server has a time drift of, let's say, 45 days, it only loses the ability to renew the cert in time. Once the maintainer finds the problem (at worst, after the cert has expired), he can simply correct the time and renew the cert. However, if we verify the date as it stands, autocert will issue the cert, discard it (since it thinks it's not yet valid), and do this again and again until it runs out of chances. Now the maintainer faces a worse problem: he has no cert to use at all, because the rate limit for issuing certs for a specific domain only resets every 7 days.
In another case, a domain successfully issued a cert, and for some reason its clock drifted by a day or so. Autocert will try to renew the cert 29 days before expiration and, of course, fail. Since the current cert is still valid, the maintainer is very likely unaware of the problem. After 29 days the cert expires, and the maintainer will find that he has run out of rate limit and can't renew his cert.
If there is no big drawback (which needs careful investigation) to skipping verification of the cert's beginning time, I suggest we just don't do it, since that causes less trouble.
The certificate's "Not Before" is checked in multiple places. One not mentioned here yet is when a certificate is fetched from the cache, to decide whether the cached cert is usable or a new one needs to be requested.
Why wouldn't the maintainer want to ensure correct system time? If the system time is drifted so much as described here, I wouldn't be surprised if it had bigger problems than a certificate renewal.
When you fetch a cert from the cache, you should check whether the cert has expired rather than whether it is not yet valid, since I believe ACME-based CAs would never issue a cert that only becomes valid in the future.
It is the maintainer's responsibility to keep the time synced, but hey, everyone makes mistakes, and some applications aren't time sensitive. As @mdp said, there are systems out there with a time drift of more than a year.
ACME based CAs would never issue a cert that only becomes valid in the future.
If there were so much trust and reliability around, we probably wouldn't need certificates at all, nor TLS or crypto for that matter.
If they issued such a faulty cert, reissuing it on the next request wouldn't fix it either. Plus, CAs are basically far more trustworthy than average website maintainers; that's why they have "authority" in their name.
This sounds like a problem for the blockchain ;) /s
I'm not sure the library should be trying to solve this problem, but I see the pain here. If autocert keeps requesting a new certificate due to significant time drift (>1 hour), then after 5 requests you're locked out for a week.
I believe there are two cases where an error can occur while requesting a certificate:
1. The ACME server returns an error (network failure, rate limiting, etc.).
2. autocert receives a certificate but judges it invalid (e.g. its NotBefore is in the future because of clock drift).
In the first case, retrying makes sense, as the failure may correct itself, and the penalty is low (1-hour rate limiting).
In the second case, retrying is extremely unlikely to solve the problem, and the penalty is quite high (1-week rate limiting after 5 tries in 5 minutes). I'd argue that in this case it should be regarded as a critical error and autocert should simply stop retrying.
Another option would be a very aggressive backoff on retrying.
@mdp Generally, I agree with you, except that I don't think any sort of backoff can help in the latter case. In my understanding, the backoff controls how autocert retries when a request to the ACME server fails. In this case, however, everything related to the ACME server is working fine: autocert got the cert successfully, and it is discarded afterward. So an aggressive backoff shouldn't help much (if at all).
I agree hitting rate limits is a real issue, but autocert should not operate in unsound circumstances. I think causing a fatal error as @mdp suggests would be the best solution.
@FiloSottile I agree. Causing a fatal error or panic when verification of a newly issued cert's beginning date fails could be a good enough solution.
I disagree about a fatal error. It would be unusual at best. How many programs exit with an error due to system time drift? NTP? SSH? net/http.Server?
Besides, autocert is just a package. Panic or os.Exit would cause the whole program importing and using the package to exit. Not best practice.
That's why I said time-insensitive services. These services don't care about time drift, so it's entirely possible the maintainer won't notice the drift until after he has lost his cert, along with the chance to renew it.
Adding a doc comment to the autocert.Manager that the users should ensure correct system time is probably the best solution to this.
The package is for developers, not end users. Developers have no way to guarantee the correctness of the system time; that is the server owner's responsibility. Documenting the panic and asking developers to properly recover from it and warn the end user is possibly a more practical way.
package users = developers
If you're writing a program for end users to be run on their servers, there are multiple ways to compare current server time with what the consensus is outside the world of that server, and act accordingly.
The autocert package isn't the right place for that imho.
Sorry, when I wrote "critical" error, I just meant that autocert could stop retrying, not that it should panic or exit.
I agree that it's not the package's job to ensure correct system time. However, after a certificate shows up from the future, it seems unlikely that retrying will do anything other than exhaust the rate limit and waste the cert provider's resources. The end result will be the same: you'll eventually wind up with an expired certificate that can't be renewed.
I noticed that there's a placeholder for exponential backoff on retries. One option, in the case of future-dated or already-expired certs, is to keep retrying but at a much longer interval, say 2 days, which keeps you outside the scope of the week-long rate limit (and maybe the system time gets corrected in the meantime?).
Surely setting up the server to auto-sync its time once is easier than implementing a custom backoff function, plus periodically logging in to check and correct the system time manually, or rebooting the server in the hope that the clock will somehow fix itself?
Add to that more thoughts:
I could probably come up with other scenarios.
Those items are not all related to this particular issue, but my point is that a system with such clock drift will have many other problems without automatic time sync. The autocert package won't magically fix those broken servers. If we add code to "fix" this particular issue, we'll be piling up more and more code trying to "fix" other things like those I mentioned above, instead of the root cause.
Alright, I have now changed my mind and agree that verifying both the beginning and the ending time of a cert is necessary. However, continuing to retry even when we know it can achieve nothing but exhausting the chance of renewal is really something we shouldn't do, and something we could change. We can't sync the time for users, but at least we shouldn't make things worse.
autocert.go line 1095 compares the current time with the time the cert was issued to check whether the cert is valid. But the server's time is not always 100% accurate, and if the server time is behind the ACME server's time (i.e. the real time), autocert judges the cert as not valid and will request another cert on the next request, hitting the rate limit in a very short time.
What did you expect to see?
Verifying the beginning time of the cert is not necessary; only the expiry date needs to be verified. Since the valid duration of an ACME cert is typically 3 months, I don't think any server will have such a huge time difference.
What did you see instead?
The cert is judged as not valid. More cert requests are sent subsequently (and judged invalid too), hitting the rate limit. No cert can be successfully issued until the server time is corrected.