caddyserver / certmagic

Automatic HTTPS for any Go program: fully-managed TLS certificate issuance and renewal
https://pkg.go.dev/github.com/caddyserver/certmagic?tab=doc
Apache License 2.0

Initial implementation of ARI #286

Closed mholt closed 4 months ago

mholt commented 5 months ago

Add ARI (ACME Renewal Information) support.

ARI is currently in draft spec and subject to change: https://datatracker.ietf.org/doc/draft-ietf-acme-ari/03/

For certificates issued via ACME, CertMagic now consults ARI when deciding whether to renew, provided the ACME server supports it.

As usual, the maintenance timer keeps certificates renewed, but now it also refreshes ARI when it is time to poll again, and it uses ARI as a determining factor for triggering renewal in addition to the expiration date / lifetime. By default, maintenance routines run every 10 minutes, so we are unlikely to miss an ARI change.

We also store ARI along with the certificate and its other metadata. This means that multiple Caddy instances can share the same ARI data. While we don't strictly synchronize the updating of ARI, it is merely an HTTP request, not a whole transaction with strict CA rate limits, so I don't think we need the complexity there. It is theoretically possible that two instances will update ARI at the same time and both write it to storage, but there's really no harm in that.

We conform to the ARI spec's recommendations and standards, to the best of my knowledge. Thanks to @aarongable for answering some questions there (one is still open, but it is not urgent).

We will ignore ARI if a certificate is dangerously close to the end of its lifetime (last 5% currently) to ensure that buggy ARI implementations on either side do not put the site's availability at risk. Once a certificate is extremely close to expiring, we will renew no matter what ARI says.

I'm still testing this change and finishing it up, but I expect it won't be too much longer before it goes out.

TODO:

mholt commented 5 months ago

@aarongable @beautifulentropy Would you / anyone from Let's Encrypt be interested in doing a quick once-over to maybe verify the various client aspects of ARI that you're most concerned about as a CA?

mholt commented 4 months ago

The latest commit ensures the CA is the same one that issued the existing certificate before setting Replaces on the ACME order.

We also don't set Replaces on the 2nd retry (3rd attempt) of an ACME transaction, in case the order failed because the Replaces value was rejected. I chose the 2nd retry because the first failure may be sporadic, in which case the first retry will fix it. The 2nd retry is still early in the process, and if it fails too, we know the failure is not related to ARI, so we can keep retrying normally after that and still gain the benefit of updating the server state when the order eventually succeeds.
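The retry behavior above can be sketched as a small helper that decides what to send in the order's Replaces field. The function name, parameters, and 1-based attempt numbering are hypothetical, and reading "keep retrying normally" as "send Replaces again from the 4th attempt on" is an interpretation of the comment rather than something the PR text states outright.

```go
package main

import "fmt"

// replacesFor returns the value to send as Replaces for the given attempt
// (1-based), or "" to omit it. Hypothetical helper for illustration only.
func replacesFor(attempt int, sameCA bool, ariCertID string) string {
	// Only reference the old certificate when ordering from the CA that issued it.
	if !sameCA || ariCertID == "" {
		return ""
	}
	// Omit Replaces on the 2nd retry (3rd attempt), in case the CA is
	// rejecting the order because of the Replaces value.
	if attempt == 3 {
		return ""
	}
	return ariCertID
}

func main() {
	for attempt := 1; attempt <= 4; attempt++ {
		fmt.Printf("attempt %d: replaces=%q\n", attempt, replacesFor(attempt, true, "example-cert-id"))
	}
}
```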

I've also complexified the logic for deciding whether a certificate needs renewal. Before this PR, we just checked the expiration date, the current time, and the configured lifetime ratio. Now this change considers ARI if it is available, but the logic is nuanced to ensure maximum reliability and flexibility:

Still need to integrate with on-demand TLS and log when we detect a change to the ARI window.

mholt commented 4 months ago

The latest commit now checks ARI with on-demand TLS and when loading certs from storage (rather than just checking expiration date).

I've also added logging when a window change is detected, including the explanation URL.

I've also ensured that any updated ARI will trigger a renewal of the certificate if the new ARI window means that the certificate is to be renewed immediately.

After testing several scenarios with this code, I'm confident enough to merge it in, but I want more testing in the field before it gets released in a Caddy 2.8 production release.

mholt commented 4 months ago

I'm going to merge this so it can get some use in the field, but feedback/review is still welcome if anyone is so inclined; we can make fixes in follow-up commits/PRs.