apple / swift-nio-ssl

TLS Support for SwiftNIO, based on BoringSSL.
https://swiftpackageindex.com/apple/swift-nio-ssl/main/documentation/niossl
Apache License 2.0

OCSP support #242

Open kmahar opened 3 years ago

kmahar commented 3 years ago

Hi SwiftNIO folks!

One thing I have been thinking about as we plan out rewriting mongo-swift-driver's internals in Swift based on NIO is TLS and specifically OCSP support.

As of MongoDB 4.4, the database server enables OCSP by default and supports OCSP stapling as well as the OCSP must-staple extension. Therefore, TLS library permitting, our drivers should now enable OCSP by default (specification).

It's not critical for us to support this yet, but as described here MongoDB Atlas (our DBaaS) is moving to use LetsEncrypt for its certs, which only supports OCSP.

It doesn't look to me like this capability is exposed via NIOSSL, although (I think? based on a quick search through the vendored source) it looks like BoringSSL supports it. Is this something you'd consider exposing?

Lukasa commented 3 years ago

Howdy @kmahar! Thanks for the feature request.

I want to draw some delineations between some terms because I think we should clarify what exactly we’re talking about. That will let us guarantee we’re talking about the same things! I’ll add a few extra definitions for interested outsiders too. I think we want definitions for:

With that set of definitions, let me explain my position on most of these things. Firstly, I think there is no value in adding support for online revocation checking. Online revocation checking suffers from the question of what to do in the face of outages in the OCSP responder. As these outages are reasonably frequent, and as users hate for their services to fail because the CA’s systems are broken, clients tend to have to “fail open”. This isn’t great because it makes hiding a revocation straightforward: just prevent the OCSP response from getting to the client.

OCSP stapling is a different beast. Because it requires that the server fetch the OCSP response, and the server may cache it (usually for a number of days), outages are less critical. However, OCSP stapling without Must-Staple is of pretty minimal utility because, again, there is an easy workaround if the cert is revoked: just stop serving the OCSP response.

This means the minimum feature set we’d need to implement would be: parsing and validating OCSP responses, OCSP stapling support, and OCSP Must-Staple support, on the client. We could definitely do these things. BoringSSL has most of the crypto functionality we need, so we’d just have to glue it together and produce appropriate APIs.
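As a rough illustration of the Must-Staple part of that feature set: Must-Staple is signalled by the TLS Feature certificate extension (RFC 7633, OID 1.3.6.1.5.5.7.1.24), whose value is a DER SEQUENCE of INTEGERs that includes 5 (`status_request`). The sketch below is not NIOSSL or BoringSSL code. The `extensions` mapping (OID string to raw extension value bytes) and the helper names are assumptions made for the example.

```python
# Sketch only: detect OCSP Must-Staple from the DER value of the TLS Feature
# extension (RFC 7633). The `extensions` dict is a hypothetical stand-in for
# whatever the certificate parser exposes.
TLS_FEATURE_OID = "1.3.6.1.5.5.7.1.24"
STATUS_REQUEST = 5  # TLS extension number for status_request (OCSP stapling)


def read_tlv(data, offset=0):
    """Read one DER element and return (tag, value, next_offset)."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        num = length & 0x7F
        length = int.from_bytes(data[offset:offset + num], "big")
        offset += num
    end = offset + length
    if end > len(data):
        raise ValueError("truncated DER element")
    return tag, data[offset:end], end


def tls_features(ext_value):
    """Return the feature numbers listed in a TLS Feature extension value."""
    tag, body, _ = read_tlv(ext_value)
    if tag != 0x30:
        raise ValueError("expected SEQUENCE")
    features = []
    pos = 0
    while pos < len(body):
        tag, value, pos = read_tlv(body, pos)
        if tag != 0x02:
            raise ValueError("expected INTEGER")
        features.append(int.from_bytes(value, "big", signed=True))
    return features


def is_must_staple(extensions):
    """True if the certificate's extensions request a stapled OCSP response."""
    value = extensions.get(TLS_FEATURE_OID)
    return value is not None and STATUS_REQUEST in tls_features(value)
```

For example, the extension value `30 03 02 01 05` decodes to the single feature 5, so a certificate carrying it would be treated as Must-Staple.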

A subsequent extension to the work would be to build an OCSP responder whose primary purpose is to support the server side of OCSP stapling. This, combined with APIs to set the stapled OCSP response, would allow NIO servers to support Must-Staple TLS certificates as well. This is also an acceptable thing to do.

As a note, OCSP stapling is also falling out of favour due to the many infrastructural problems associated with it. The industry is trending towards just standardising on short-lived certificates with good infrastructure for rolling them. For that reason I’m wary of us spending too much time doing OCSP work if we can avoid it.

Nonetheless, I think supporting the client side of OCSP stapling and must-staple is probably a good idea.

kmahar commented 3 years ago

Thanks for the speedy and thorough response, @Lukasa!

I'm only just catching up on what exactly OCSP is this week, so the clear definitions are very helpful.

Some questions -

  1. Regarding online certificate verification: I definitely see how assuming success in the face of an OCSP responder outage is problematic. However, I'm not sure if I follow why not implementing online certificate revocation checking at all is better than implementing it in a more strict manner where a lack of response is treated the same as an invalid cert. Would this be bad because it's inconsistent with what other libraries do? Or bad because users might then choose to use this unreliable mechanism, and get annoyed with how often it fails?

  2. Say a client with support for what you describe above receives a certificate that is not marked must-staple, but does have a valid stapled response. This would be accepted by the client, right? (I don't think there's a reason for a client to prefer a must-staple certificate vs a non-must-staple certificate so long as a valid response is stapled to it, but I may be wrong.)

Lukasa commented 3 years ago

> Regarding online certificate verification: I definitely see how assuming success in the face of an OCSP responder outage is problematic. However, I'm not sure if I follow why not implementing online certificate revocation checking at all is better than implementing it in a more strict manner where a lack of response is treated the same as an invalid cert. Would this be bad because it's inconsistent with what other libraries do? Or bad because users might then choose to use this unreliable mechanism, and get annoyed with how often it fails?

The reason not to do it is that it's a lot of work for a strategy that doesn't really successfully defend the user. Users have to decide how they will configure this: will they allow an OCSP responder failure to prevent the TLS connection from completing? If they do, then the liveness of their system is now limited both by their own code and by the OCSP responder for every certificate in the server chain. If any of those OCSP servers is misbehaving, the handshake will fail. If they do not allow OCSP responder failure to prevent the connection, then they have no more security than they had before, but they have a slower TLS handshake.
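The trade-off described here can be sketched in a few lines. This is a toy model, not an implementation of online revocation checking: the `Lookup` outcomes, the function names, and the availability figures are all illustrative assumptions, and the availability estimate assumes responder failures are independent.

```python
import math
from enum import Enum


class Lookup(Enum):
    """Hypothetical outcome of one online OCSP query for one certificate."""
    GOOD = "good"
    REVOKED = "revoked"
    UNREACHABLE = "unreachable"


def handshake_allowed(chain_lookups, fail_open):
    """Decide whether the handshake proceeds, given one lookup per chain cert."""
    for result in chain_lookups:
        if result is Lookup.REVOKED:
            return False
        if result is Lookup.UNREACHABLE and not fail_open:
            return False  # fail closed: a responder outage breaks the handshake
    return True


def fail_closed_availability(own_availability, responder_availabilities):
    """Upper bound on availability when every responder must answer."""
    return own_availability * math.prod(responder_availabilities)


# An attacker who blocks the responder turns REVOKED into UNREACHABLE.
blocked = [Lookup.GOOD, Lookup.UNREACHABLE]
print(handshake_allowed(blocked, fail_open=True))   # revocation is hidden
print(handshake_allowed(blocked, fail_open=False))  # handshake fails instead
```

With three responders that are each assumed to be up 99% of the time, `fail_closed_availability(1.0, [0.99, 0.99, 0.99])` comes out at roughly 0.97, even before counting any failures in the user's own code.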

Given that OCSP responder outages are reasonably common, this is a non-theoretical question. Note that if the OCSP responder is misbehaving, then no number of retries will fix the problem: the system is completely unavailable until the OCSP responder starts answering again.

OCSP works a bit better if you have system-wide caches which can hold on to older OCSP responses, including those originally fetched on behalf of other processes. We don't have much access to that kind of functionality on Linux, so adding on-by-default online OCSP validation just makes our systems fail more and take longer to perform TLS handshakes.
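To make the caching idea concrete, here is a minimal sketch of a cache that keeps an OCSP response until its validity window ends. It is an in-process cache only, so it does not give the cross-process sharing described above. The class name, the opaque `cert_id` key, and the use of the response's `nextUpdate` time as the expiry are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


class OCSPResponseCache:
    """Hypothetical cache: cert_id -> (raw response bytes, next_update time)."""

    def __init__(self):
        self._entries = {}

    def store(self, cert_id, response, next_update):
        """Remember a response until the responder's stated nextUpdate."""
        self._entries[cert_id] = (response, next_update)

    def lookup(self, cert_id, now):
        """Return a still-valid cached response, or None if absent or stale."""
        entry = self._entries.get(cert_id)
        if entry is None:
            return None
        response, next_update = entry
        if now >= next_update:
            del self._entries[cert_id]  # stale: force a fresh fetch
            return None
        return response


# Usage: a response valid for three days survives a responder outage on day 1.
t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
cache = OCSPResponseCache()
cache.store("leaf-cert", b"raw-ocsp-response", t0 + timedelta(days=3))
print(cache.lookup("leaf-cert", t0 + timedelta(days=1)) is not None)
```

A cached entry only papers over outages shorter than the remaining validity window, which is part of why sharing the cache system-wide matters.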

> Say a client with support for what you describe above receives a certificate that is not marked must-staple, but does have a valid stapled response. This would be accepted by the client, right? (I don't think there's a reason for a client to prefer a must-staple certificate vs a non-must-staple certificate so long as a valid response is stapled to it, but I may be wrong.)

Sure, a stapled response would be accepted by the client even if the cert didn't have must-staple enabled. It's just not a high-value signal of validity because if the cert had been revoked a malicious server could simply choose not to send the OCSP response. That's why you need stapling and Must-Staple to get real value out of the system.
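The client-side behaviour discussed in this exchange can be summarised as a small decision function. This is a sketch of the policy, not NIOSSL's API: the `Staple` states and `accept_certificate` are hypothetical names, and rejecting a stapled response that fails validation even without Must-Staple is an assumption of this sketch rather than something the thread settles.

```python
from enum import Enum


class Staple(Enum):
    """Hypothetical state of the stapled OCSP response seen in the handshake."""
    ABSENT = "absent"     # server sent no stapled response
    GOOD = "good"         # valid response saying the cert is not revoked
    REVOKED = "revoked"   # valid response saying the cert is revoked
    INVALID = "invalid"   # response failed validation (signature, expiry, ...)


def accept_certificate(must_staple, staple):
    """Return True if the client should continue the handshake."""
    if staple is Staple.REVOKED:
        return False
    if staple is Staple.INVALID:
        return False  # assumption: treat an unverifiable staple as a failure
    if staple is Staple.ABSENT:
        # Without Must-Staple a missing staple proves nothing either way,
        # so it is accepted; with Must-Staple it is a hard failure.
        return not must_staple
    return True
```

The only row where Must-Staple changes the answer is `Staple.ABSENT`, which is exactly the case a malicious server with a revoked certificate would try to produce.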

kmahar commented 3 years ago

Thanks very much for those clarifications! All you've said makes a lot of sense. I believe for our purposes in the driver what you propose supporting on the client side would be sufficient.

Lukasa commented 3 years ago

Ok, with further digging the shape of this has come into view. The TL;DR is that most of this is easy, and some of it is annoying.

The easy bits are finding the URL to request the stapled OCSP response from, requesting stapled OCSP responses from the server, attaching stapled OCSP responses as the server, and getting the stapled response. That's all easy-peasy.

Unfortunately, the hard part is doing anything useful with the OCSP responses. BoringSSL has removed the OCSP_REQUEST/OCSP_RESPONSE structures as well as their associated ASN.1 parsing code, so we'll have to bring them back in some form. We probably don't want their full complexity, of course, but it's a bit sad to do that. For expediency reasons we'll probably do this by just calling the BoringSSL ASN.1 code, though we could potentially investigate the limited Swift ASN.1 code in Swift Crypto for completeness.
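To give a sense of the parsing involved, the outermost layer of an OCSP response (RFC 6960) is `OCSPResponse ::= SEQUENCE { responseStatus ENUMERATED, responseBytes [0] EXPLICIT ... OPTIONAL }`. The sketch below reads only `responseStatus` with a hand-rolled DER reader. It uses neither the BoringSSL nor the Swift Crypto ASN.1 code mentioned above, the function names are made up for the example, and a real implementation would also have to parse and verify the signed `responseBytes`.

```python
# responseStatus values defined in RFC 6960 (4 is unused).
OCSP_RESPONSE_STATUS = {
    0: "successful",
    1: "malformedRequest",
    2: "internalError",
    3: "tryLater",
    5: "sigRequired",
    6: "unauthorized",
}


def read_tlv(data, offset=0):
    """Read one DER element and return (tag, value, next_offset)."""
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        num = length & 0x7F
        length = int.from_bytes(data[offset:offset + num], "big")
        offset += num
    end = offset + length
    if end > len(data):
        raise ValueError("truncated DER element")
    return tag, data[offset:end], end


def ocsp_response_status(der):
    """Return the responseStatus name from a DER-encoded OCSPResponse."""
    tag, body, end = read_tlv(der)
    if tag != 0x30 or end != len(der):
        raise ValueError("expected a single outer SEQUENCE")
    tag, value, _ = read_tlv(body)
    if tag != 0x0A or len(value) != 1:
        raise ValueError("expected a one-byte ENUMERATED responseStatus")
    code = value[0]
    if code not in OCSP_RESPONSE_STATUS:
        raise ValueError("unknown responseStatus")
    return OCSP_RESPONSE_STATUS[code]
```

For example, the five bytes `30 03 0A 01 03` form a complete response whose status is `tryLater`. A `successful` response carries the much larger `responseBytes` structure after the status, which is where most of the missing ASN.1 work lies.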

Lukasa commented 3 years ago

Note that we can keep our Darwin-based evaluator working with stapled OCSP responses using SecTrustSetOCSPResponse.