Open mcrute opened 2 years ago
cc @golang/security
cc @rolandshoemaker
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
This has significant cross-over with #40017 (and #22274), which is on-hold and still doesn't have a fully scoped proposal. Since both implementations are likely to share a significant amount of functionality, I think it makes more sense to focus on the crypto/x509 functionality, with an eye to how it can be extended more generally to a TLS server (vs. a verifying client).
Ideally this should probably be mostly implemented at the crypto/tls level, with the only real change for autocert being adding a bool to Manager
which sets something in the tls.Config
that enables fetching/stapling.
I don't see the overlap with #40017, as I read that it's primarily about client verification of OCSP and OCSP staples. This proposal is approaching it from the other side and would require a client to fetch an OCSP response but not verify it in the way a client would. I see some overlap in the solution space for #22274 and this issue though as both will require an OCSP client.
How would you feel about tackling these problems independently? In stage one autocert.Manager
can learn a StapleOCSP
flag which will do the stapling within autocert. This will necessitate the building of an OCSP client that can at least decode a certificate, fetch a response from the CA, and validate that the response is good before stapling to the cert. It would likely also require a background goroutine to watch the manager state and refresh staples as they're coming close to expiration (doing this out-of-band would have the least impact on end-users of a server using autocert in the case of a CA being slow or suffering an outage).
In stage two we can look into a solution for #22274 which would allow us to request the Must-Staple extension from the CA when we request certificates and then build out OCSP fetching within crypto/tls (using the client we built in stage one as the foundation for that fetching logic). In the conclusion of this phase we can drop our local implementation in favor of the crypto/tls implementation.
I expect phase two will take longer than phase one so this gets the functionality into the hands of autocert users quickly and gives us a test-case for OCSP stapling in the standard library that can inform our decisions about crypto/tls changes. Aside from the flag on autocert.Manager
the implementation should be entirely invisible to users so we don't have to make any breaking API changes when we later migrate to a platform-level solution in crypto/tls.
What do you think?
Since https://github.com/golang/go/issues/40017 and https://github.com/golang/go/issues/22274 are on hold, and since @rolandshoemaker said we should wait to address those first, it sounds like we should put this on hold too. Does anyone object?
I'm working on a plan to implement incrementally OCSP stapling, which will need to happen before there is a specific API change for x/crypto/acme/autocert, but I think in general the API proposal here is simply adding a bool to Manager
which would enable the behavior, which I'm not opposed to.
@mcrute Looking at the path to getting stapling functionality everywhere, I think this is a reasonable place to start. In general I think we just need a tls.Config.GetCertificate
function, which the manager can create, which spawns a goroutine that keeps the response up to date.
Since we're going to want to provide this functionality more broadly down the road though I think this should be written with an eye to being relatively generic, such that it can later be moved into the standard library (in particular the OCSP client logic should probably be separate from the stapling management logic, so that it could feasibly be reused for fetching responses for other purposes).
@rolandshoemaker based on my previous comment would it be worthwhile for me to create a PR that we can discuss more concretely for the manager work?
Agreed on the OCSP client point. Was thinking about introducing a Client
struct to x/crypto/ocsp
with a long term goal of that making it into the standard library to support your vision of stapling throughout the library. I've already written code to this end in my local workspace and could post that for review if you think it's worthwhile.
@rolandshoemaker It sounds like you have an idea about what API should be added to package autocert. What is that API precisely? Thanks.
@rsc I think the following API diff for autocert would be reasonable
type Manager struct {
...
// EnableOCSPStapling enables OCSP stapling for each certificate it obtains, returning
// a fresh OCSP response from the GetCertificate method.
EnableOCSPStapling bool
}
@mcrute that seems like a reasonable approach, if you'd like to send a CL tag me as a reviewer and we can work through what it should look like.
I don't see the overlap with #40017, as I read that it's primarily about client verification of OCSP and OCSP staples. This proposal is approaching it from the other side and would require a client to fetch an OCSP response but not verify it in the way a client would.
Unfortunately, OCSP servers operated by publicly-trusted CAs exhibit a variety of pathological behaviors and are not guaranteed to return valid responses. Therefore, TLS servers need to verify OCSP responses before stapling them, to avoid causing verification errors in clients. https://gist.github.com/sleevi/5efe9ef98961ecfb4da8 is a good writeup that describes what a robust OCSP stapling implementation needs to do.
@rolandshoemaker I'll start cleaning up my local patches and get them into a CL for discussion. Will probably take about a week because of some other obligations I need to attend to this week.
@AGWA thanks for that link, I've excerpted it below because I think there are bits worth talking about in considering this design.
- Support for keeping a long-lived (disk) cache of OCSP responses.
In the case of autocert.Manager the staples would be stored in application memory but they could be cached to disk. I'm not sure I see the value of caching to disk for long running applications, though I have no specific opposition to this since Manager already supports a disk cache by default.
- Validate the server responses to make sure it is something the client will accept.
The correct way to validate this is laid out in RFC 6960 3.2, which is how my local patches currently work.
- Refreshes the response, in the background, with sufficient time before expiration.
My proposal, at least in autocert.Manager, is to handle this as a sub-function of the renewal timer system.
- That said, even with background refreshing, such a system should observe the Lightweight OCSP Profile of RFC 5019.
Not something I had considered but I will work this into my client.
- As with any system doing background requests on a remote server, don't be a jerk and hammer the server when things are bad.
Yes, exponential back-off and retry with jitter is what I'd consider the best practice here. There's already prior art for this in autocert.Manager so just adapting that to work for our OCSP case should be pretty straightforward.
- Distributed or proxiable fetching
A valid point but I think this is an exercise for the user in the case of autocert.Manager integration. Building a distributed cache is rather application specific.
- The ability to serve old responses while fetching new responses.
That is, it shouldn't be mutually exclusive - it's not that there is the 'ONE TRUE RESPONSE' - some flexibility for multiple responses is needed.
What is the correct behavior here? RFC 6960 3.2 is not clear on what expiration means and what is the appropriate behavior in the case of expiration and makes weak recommendations such as "sufficiently recent" (where "sufficiently recent" is not specified).
Firefox defines "sufficiently recent" as within 24 hours of the time in the response.
I guess clients could use an expired OCSP response as a hint that the cert is valid but that strikes me as client-by-client behavior. I don't see any reason to staple a response more than 24-hours expired.
@rolandshoemaker WDYT?
- Some idea of what to do when "things go bad".
Knowing that something has gone wrong in stapling is the first step to handling the case "when things go bad", which is going to be pretty system and operator specific. Generally the way I handle this in systems is for the Manager equivalent to accept a chan error
that is read by a error reporting goroutine which generates metrics for a monitoring system. This isn't a common pattern within the standard library though so I'm interested in what a good solution would be here. My local patch set just discards errors.
- Configurable OCSP responder per-certificate-being-checked.
This may be necessary for a general purpose OCSP stapling solution, but I don't think it's all that useful for autocert.Manager integration since there's only one CA and it does embed the responder URL in the certificates.
- Staple by default.
This will be configurable in the autocert.Manager API.
Will write up more thoughts tomorrow, but see https://github.com/rolandshoemaker/stapled/blob/master/ocsp/ocsp.go for a partial implementation of a fetcher/verifier that takes into account the post written by Sleevi.
How important is OCSP stapling really, given the considerations in https://blog.apnic.net/2022/03/22/whats-going-on-with-certificate-revocation/ ?
The value of OCSP in general is debatable, but since both Firefox, and (basically) every browser on macOS and iOS, still do OCSP checking, stapling allows clients to avoid making at least one extra request, which in some cases may be considered valuable. Additionally there is a privacy win (although in my opinion somewhat minor) from stapling, since you no longer need to query (in plaintext) an OCSP server revealing the serial number of the certificate you are verifying.
OK, it sounds like people think we should do this, and it sounds like the API is just one extra bool in the Manager struct. Above, Roland wrote:
type Manager struct {
...
// EnableOCSPStapling enables OCSP stapling for each certificate it obtains, returning
// a fresh OCSP response from the GetCertificate method.
EnableOCSPStapling bool
}
I might suggest using 'StapleOCSP' instead, to match the title of this proposal and generally be more direct.
Otherwise, does anyone object to adding this?
Let's make the new field StapleOCSP bool. Sounds like people are in favor.
Sounds good to me.
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
OSCP stapling is considered a good practice to reduce load on OCSP servers and also to eliminate a round-trip to the CA when a user-agent verifies the certificate.
autocert.Manager
does not currently staple OCSP responses and it should. I propose thatautocert.Manager
be updated to staple the OCSP response when the certificate is generated and then treat OCSP expiration as a case of renewal in thedomainRenewal
logic.Stapling would likely need to happen in two different places:
autocert.Manager.cert
which is the cache fetch path forautocert.Manager.GetCertificate
. The code should load the certificate, check the expiration of the OCSP staple and in the absence of a staple or if the staple is expired it should attempt to fetch an OCSP response and staple it. On this path there should be a rather short deadline (perhaps 10 seconds) that, upon expiration, returns the certificate without an OCSP staple to prevent OCSP outages from unduly impacting users. The background renewal logic will eventually fix missing staples.autocert.Manager.authorizeCert
which is in the cache-miss path forautocert.Manager.GetCertificate
. This code should continue to fetch a certificate from the CA and upon success also make an OCSP request and staple that response to the certificate. This is also used byautocert.domainRenewal.do
which would then inherit the OCSP stapling logic.WIthin the domain renewal logic the follow updates would be needed:
autocert.domainRenewal.renew
performs the renewal of a certificate in the background of anautocert.Manager
. This would need to instead check the both the OCSP staple expiration and the certificate expiration and renew whichever one was about to expire.autocert.domainRenewal.next
performs the schedule calculation for the renewal loop that runs as a goroutine for each certificate. This would need to be updated to consider upcoming OCSP staple expiration as a reason to trigger a renewal run.It would also be possible to do OCSP stapling as a separate set of goroutines within
autocert.Manager
but that seems unnecessary given that the existing renewal logic seems easy to adapt to also handling this use-case.Additionally, if this proposal is accepted I will add support for fetching OCSP responses and stapling them to a certificate to x/crypto/ocsp so that the functionality can be shared outside of autocert.
If there is consensus that this feature would be accepted and the approach is valid then I will implement this and create a CL.