golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.1k stars 17.68k forks source link

x/crypto/acme/autocert: add Manager.StapleOCSP #51064

Open mcrute opened 2 years ago

mcrute commented 2 years ago

OSCP stapling is considered a good practice to reduce load on OCSP servers and also to eliminate a round-trip to the CA when a user-agent verifies the certificate. autocert.Manager does not currently staple OCSP responses and it should. I propose that autocert.Manager be updated to staple the OCSP response when the certificate is generated and then treat OCSP expiration as a case of renewal in the domainRenewal logic.

Stapling would likely need to happen in two different places:

WIthin the domain renewal logic the follow updates would be needed:

It would also be possible to do OCSP stapling as a separate set of goroutines within autocert.Manager but that seems unnecessary given that the existing renewal logic seems easy to adapt to also handling this use-case.

Additionally, if this proposal is accepted I will add support for fetching OCSP responses and stapling them to a certificate to x/crypto/ocsp so that the functionality can be shared outside of autocert.

If there is consensus that this feature would be accepted and the approach is valid then I will implement this and create a CL.

seankhliao commented 2 years ago

cc @golang/security

rsc commented 2 years ago

cc @rolandshoemaker

rsc commented 2 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rolandshoemaker commented 2 years ago

This has significant cross-over with #40017 (and #22274), which is on-hold and still doesn't have a fully scoped proposal. Since both implementations are likely to share a significant amount of functionality, I think it makes more sense to focus on the crypto/x509 functionality, with an eye to how it can be extended more generally to a TLS server (vs. a verifying client).

Ideally this should probably be mostly implemented at the crypto/tls level, with the only real change for autocert being adding a bool to Manager which sets something in the tls.Config that enables fetching/stapling.

mcrute commented 2 years ago

I don't see the overlap with #40017, as I read that it's primarily about client verification of OCSP and OCSP staples. This proposal is approaching it from the other side and would require a client to fetch an OCSP response but not verify it in the way a client would. I see some overlap in the solution space for #22274 and this issue though as both will require an OCSP client.

How would you feel about tackling these problems independently? In stage one autocert.Manager can learn a StapleOCSP flag which will do the stapling within autocert. This will necessitate the building of an OCSP client that can at least decode a certificate, fetch a response from the CA, and validate that the response is good before stapling to the cert. It would likely also require a background goroutine to watch the manager state and refresh staples as they're coming close to expiration (doing this out-of-band would have the least impact on end-users of a server using autocert in the case of a CA being slow or suffering an outage).

In stage two we can look into a solution for #22274 which would allow us to request the Must-Staple extension from the CA when we request certificates and then build out OCSP fetching within crypto/tls (using the client we built in stage one as the foundation for that fetching logic). In the conclusion of this phase we can drop our local implementation in favor of the crypto/tls implementation.

I expect phase two will take longer than phase one so this gets the functionality into the hands of autocert users quickly and gives us a test-case for OCSP stapling in the standard library that can inform our decisions about crypto/tls changes. Aside from the flag on autocert.Manager the implementation should be entirely invisible to users so we don't have to make any breaking API changes when we later migrate to a platform-level solution in crypto/tls.

What do you think?

rsc commented 2 years ago

Since https://github.com/golang/go/issues/40017 and https://github.com/golang/go/issues/22274 are on hold, and since @rolandshoemaker said we should wait to address those first, it sounds like we should put this on hold too. Does anyone object?

rolandshoemaker commented 2 years ago

I'm working on a plan to implement incrementally OCSP stapling, which will need to happen before there is a specific API change for x/crypto/acme/autocert, but I think in general the API proposal here is simply adding a bool to Manager which would enable the behavior, which I'm not opposed to.

rolandshoemaker commented 2 years ago

@mcrute Looking at the path to getting stapling functionality everywhere, I think this is a reasonable place to start. In general I think we just need a tls.Config.GetCertificate function, which the manager can create, which spawns a goroutine that keeps the response up to date.

Since we're going to want to provide this functionality more broadly down the road though I think this should be written with an eye to being relatively generic, such that it can later be moved into the standard library (in particular the OCSP client logic should probably be separate from the stapling management logic, so that it could feasibly be reused for fetching responses for other purposes).

mcrute commented 2 years ago

@rolandshoemaker based on my previous comment would it be worthwhile for me to create a PR that we can discuss more concretely for the manager work?

Agreed on the OCSP client point. Was thinking about introducing a Client struct to x/crypto/ocsp with a long term goal of that making it into the standard library to support your vision of stapling throughout the library. I've already written code to this end in my local workspace and could post that for review if you think it's worthwhile.

rsc commented 2 years ago

@rolandshoemaker It sounds like you have an idea about what API should be added to package autocert. What is that API precisely? Thanks.

rolandshoemaker commented 2 years ago

@rsc I think the following API diff for autocert would be reasonable

type Manager struct {
    ...

    // EnableOCSPStapling enables OCSP stapling for each certificate it obtains, returning
    // a fresh OCSP response from the GetCertificate method.
    EnableOCSPStapling bool
}

@mcrute that seems like a reasonable approach, if you'd like to send a CL tag me as a reviewer and we can work through what it should look like.

AGWA commented 2 years ago

I don't see the overlap with #40017, as I read that it's primarily about client verification of OCSP and OCSP staples. This proposal is approaching it from the other side and would require a client to fetch an OCSP response but not verify it in the way a client would.

Unfortunately, OCSP servers operated by publicly-trusted CAs exhibit a variety of pathological behaviors and are not guaranteed to return valid responses. Therefore, TLS servers need to verify OCSP responses before stapling them, to avoid causing verification errors in clients. https://gist.github.com/sleevi/5efe9ef98961ecfb4da8 is a good writeup that describes what a robust OCSP stapling implementation needs to do.

mcrute commented 2 years ago

@rolandshoemaker I'll start cleaning up my local patches and get them into a CL for discussion. Will probably take about a week because of some other obligations I need to attend to this week.

mcrute commented 2 years ago

@AGWA thanks for that link, I've excerpted it below because I think there are bits worth talking about in considering this design.

  1. Support for keeping a long-lived (disk) cache of OCSP responses.

In the case of autocert.Manager the staples would be stored in application memory but they could be cached to disk. I'm not sure I see the value of caching to disk for long running applications, though I have no specific opposition to this since Manager already supports a disk cache by default.

  1. Validate the server responses to make sure it is something the client will accept.

The correct way to validate this is laid out in RFC 6960 3.2, which is how my local patches currently work.

  1. Refreshes the response, in the background, with sufficient time before expiration.

My proposal, at least in autocert.Manager, is to handle this as a sub-function of the renewal timer system.

  1. That said, even with background refreshing, such a system should observe the Lightweight OCSP Profile of RFC 5019.

Not something I had considered but I will work this into my client.

  1. As with any system doing background requests on a remote server, don't be a jerk and hammer the server when things are bad.

Yes, exponential back-off and retry with jitter is what I'd consider the best practice here. There's already prior art for this in autocert.Manager so just adapting that to work for our OCSP case should be pretty straightforward.

  1. Distributed or proxiable fetching

A valid point but I think this is an exercise for the user in the case of autocert.Manager integration. Building a distributed cache is rather application specific.

  1. The ability to serve old responses while fetching new responses.

That is, it shouldn't be mutually exclusive - it's not that there is the 'ONE TRUE RESPONSE' - some flexibility for multiple responses is needed.

What is the correct behavior here? RFC 6960 3.2 is not clear on what expiration means and what is the appropriate behavior in the case of expiration and makes weak recommendations such as "sufficiently recent" (where "sufficiently recent" is not specified).

Firefox defines "sufficiently recent" as within 24 hours of the time in the response.

I guess clients could use an expired OCSP response as a hint that the cert is valid but that strikes me as client-by-client behavior. I don't see any reason to staple a response more than 24-hours expired.

@rolandshoemaker WDYT?

  1. Some idea of what to do when "things go bad".

Knowing that something has gone wrong in stapling is the first step to handling the case "when things go bad", which is going to be pretty system and operator specific. Generally the way I handle this in systems is for the Manager equivalent to accept a chan error that is read by a error reporting goroutine which generates metrics for a monitoring system. This isn't a common pattern within the standard library though so I'm interested in what a good solution would be here. My local patch set just discards errors.

  1. Configurable OCSP responder per-certificate-being-checked.

This may be necessary for a general purpose OCSP stapling solution, but I don't think it's all that useful for autocert.Manager integration since there's only one CA and it does embed the responder URL in the certificates.

  1. Staple by default.

This will be configurable in the autocert.Manager API.

rolandshoemaker commented 2 years ago

Will write up more thoughts tomorrow, but see https://github.com/rolandshoemaker/stapled/blob/master/ocsp/ocsp.go for a partial implementation of a fetcher/verifier that takes into account the post written by Sleevi.

rsc commented 2 years ago

How important is OCSP stapling really, given the considerations in https://blog.apnic.net/2022/03/22/whats-going-on-with-certificate-revocation/ ?

rolandshoemaker commented 2 years ago

The value of OCSP in general is debatable, but since both Firefox, and (basically) every browser on macOS and iOS, still do OCSP checking, stapling allows clients to avoid making at least one extra request, which in some cases may be considered valuable. Additionally there is a privacy win (although in my opinion somewhat minor) from stapling, since you no longer need to query (in plaintext) an OCSP server revealing the serial number of the certificate you are verifying.

rsc commented 2 years ago

OK, it sounds like people think we should do this, and it sounds like the API is just one extra bool in the Manager struct. Above, Roland wrote:

type Manager struct {
    ...

    // EnableOCSPStapling enables OCSP stapling for each certificate it obtains, returning
    // a fresh OCSP response from the GetCertificate method.
    EnableOCSPStapling bool
}

I might suggest using 'StapleOCSP' instead, to match the title of this proposal and generally be more direct.

Otherwise, does anyone object to adding this?

rsc commented 2 years ago

Let's make the new field StapleOCSP bool. Sounds like people are in favor.

rolandshoemaker commented 2 years ago

Sounds good to me.

rsc commented 2 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rsc commented 2 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group