eclipse-hono / hono

Eclipse Hono™ Project
https://eclipse.dev/hono
Eclipse Public License 2.0

Enable client certificate revocation check with OCSP #3588

Closed kyberpunk closed 7 months ago

kyberpunk commented 10 months ago

Hi all, enterprise-grade device PKIs should provide certificate revocation as part of the device certificate lifecycle (e.g. for incidents, device decommissioning, or rekeying). All services which trust CAs from such a PKI must implement a certificate revocation check with OCSP or CRL. The ability to disable such a device via the enabled flag is not sufficient, because you cannot properly offload device lifecycle management to the PKI and must involve a manual step in the process, which can lead to mistakes. If you are not the owner of such a PKI, it is not easy to get actively notified about revoked devices anyway.

The easiest way is to implement the OCSP protocol, since you don't need to distribute CRLs. I've tried to implement a revocation check in this PoC: https://github.com/kyberpunk/hono/pull/1. It uses the underlying Java Security functionality to handle all the checks.

To make it work, I did the following:

I would like to create PR for revocation support, but would like to know your opinion first. Following questions come to my mind:

I would rather start with simple scope and iteratively improve it. Thank you for your feedback.

sophokles73 commented 10 months ago

Thank you for your interest in Hono :-) Security, of course, is always a concern and we appreciate any improvements in this area.

The ability to disable such a device via the enabled flag is not sufficient, because you cannot properly offload device lifecycle management to the PKI and must involve a manual step in the process, which can lead to mistakes. If you are not the owner of such a PKI, it is not easy to get actively notified about revoked devices anyway.

IMHO it will depend on the use case at hand whether this is sufficient or not. However, being able to leverage revocation information provided by certificate authorities seems to be valuable in general. In deciding how we could do that, we should also consider the concerns that are usually raised when it comes to OCSP.

FMPOV we need to make sure that

I just took a quick glance at the OCSP functionality of the JRE but I wonder if all of the above can be done using the standard functionality?

kyberpunk commented 10 months ago

Hi @sophokles73 , thank you for such fast response. I've analyzed more JDK revocation check implementation:

  • we perform revocation checks on the server side, i.e. we do not require devices to do such checks.

Agreed, my intention was to implement a server-side revocation check of the device certificate in the Hono adapters. OCSP stapling on the device side would not make it more efficient. On the other hand, I would leave checking the server certificate's revocation status on the device side entirely to the client implementation. I think there are plenty of options, depending on the TLS client that is used.

  • we do not (need to) make (extra) remote API calls to the OCSP provider on each device certificate check.

Unfortunately JDK implementation doesn't implicitly implement caching of OCSP responses and to implement it in the code could be complicated. I think this is not problem for long living connections like MQTT or HTTP with keepalive.

For frequent connections this may be issue, but for such cases the complementary CRL approach should solve it. I would let user choose appropriate check mechanism based on his use case. Regarding CRL it would depend if user should handle updating of CRL by himself or not.

  • we keep certificate revocation information sufficiently up-to-date.

This should be ensured by OCSP protocol by design, if the request is sent on each handshake. Of course, nonce extension should be enabled to avoid replay attacks.

  • we make all of this configurable and backwards compatible.

Sure, the new settings should be optional and disabled by default. It should also be possible to extend and improve them in the future.

I'm thinking how to continue with this. Would be good to agree on most important configuration options and corresponding API. I may come with some proposal.

sophokles73 commented 10 months ago

OCSP stapling on the device side would not make it more efficient. On the other hand, I would leave checking the server certificate's revocation status on the device side entirely to the client implementation. I think there are plenty of options, depending on the TLS client that is used.

I am pretty sure that almost no device in the field will support OCSP stapling. So FMPOV we should not bother about it ...

sophokles73 commented 10 months ago

Unfortunately JDK implementation doesn't implicitly implement caching of OCSP responses and to implement it in the code could be complicated. I think this is not problem for long living connections like MQTT or HTTP with keepalive.

This will be true for AMQP 1.0 and MQTT, but not for HTTP where we perform the device cert check on each invocation (for reasons related to the way vert.x/Netty uses the event loop). For CoAP the situation might also be different because we are not using the JRE's DTLS implementation but Eclipse Californium's. I am not 100% sure if Scandium (Cf's DTLS stack) uses the JRE's functionality for validating certificates. We'll need to check.

sophokles73 commented 10 months ago

For frequent connections this may be issue, but for such cases the complementary CRL approach should solve it. I would let user choose appropriate check mechanism based on his use case. Regarding CRL it would depend if user should handle updating of CRL by himself or not.

I do not think that we should support CRL for now. If at all, we would need to do the CRL refresh (asynchronously) in the background based on user supplied URL(s).

sophokles73 commented 10 months ago

I'm thinking how to continue with this. Would be good to agree on most important configuration options and corresponding API. I may come with some proposal.

Is there a particular use case that you need to support? In what context are you using Hono? Maybe it makes sense to start with an (experimental) implementation for the adapter that you require first and then see how we fare.

kyberpunk commented 10 months ago

I am not 100% sure if Scandium (Cf's DTLS stack) uses the JRE's functionality for validating certificates. We'll need to check.

It seems it is configured to use same verification implementation as other adapters: Application.java#L73 DeviceRegistryBasedCertificateVerifier.java#L100

I do not think that we should support CRL for now. If at all, we would need to do the CRL refresh (asynchronously) in the background based on user supplied URL(s).

Sure, implementing the fetching of CRLs could be tricky. We may also consider a simple variant in which the user must supply the CRL file explicitly while operating the Hono adapters (downloaded e.g. by a cron job). This is how common software with TLS support like nginx, mosquitto or openvpn works.

Is there a particular use case that you need to support?

We are connecting devices to Hono via the MQTT adapter. We are currently using a single tenant with a single CA. Each device enrolls its authentication certificates from the enterprise PKI. We would like to set up OCSP verification so we can revoke certificates in one place. The OCSP URL is provided in the certificate's AIA extension. Checking just the leaf certificates for revocation is enough for us.

So described PoC PR basically does what we need now. However, once we add another tenant or trusted CA in future, we may need to configure different OCSP properties per tenant or trusted CA (e.g. OCSP endpoint URL). Java security properties force the same configuration for all TLS connections, so at least this I could improve.
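For illustration, the JVM-wide nature of the JDK's OCSP security properties mentioned above can be sketched as follows. This is not taken from the PoC; the class and method names are hypothetical, and only the `ocsp.enable` and `ocsp.responderURL` security properties are standard JDK settings. They only take effect for PKIX validations that have revocation checking enabled.

```java
import java.security.Security;

/** Hypothetical helper showing that the JDK's OCSP security properties are global. */
class GlobalOcspSettings {

    /**
     * Turns on OCSP for every PKIX validation in this JVM that has
     * revocation checking enabled.
     */
    static void enableOcspGlobally(final String responderUrl) {
        Security.setProperty("ocsp.enable", "true");
        if (responderUrl != null) {
            // Overrides the responder location from the certificate's AIA
            // extension, for all tenants and all trust anchors alike.
            Security.setProperty("ocsp.responderURL", responderUrl);
        }
    }
}
```

Because these are process-wide settings, two tenants whose CAs use different responders cannot be served this way, which is the limitation described above.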

kyberpunk commented 9 months ago

Hello @sophokles73 and happy New Year. Did you please check the PR?

sophokles73 commented 8 months ago

Hi @kyberpunk, I have given this some more thought and I have to admit that the more I think about it, the less I like the idea of implementing the OCSP checks within Hono.

The most prominent way to use client certificate based authentication seems to be based on intermediate certificates derived from a root certificate that has been either self signed manually or has been created via PKI that is under the control of the Hono operator's organization. This means that the revocation of an intermediate certificate will most likely be triggered by the PKI owner. Based on this assumption it seems much easier to me to simply use Hono's Registry Management API to disable the trust anchor in such a case instead of implementing complicated OCSP checks which suffer from all the known disadvantages.

Alternatively, I can also imagine implementing functionality which periodically performs validity checks on the trust anchors contained in the registry using an existing OCSP responder configured by the Hono operator.

Adding the OCSP check to the connection establishment process feels inefficient at best and might have serious impact on connection rates when many devices are connecting at once, e.g. after a pod crashes and all devices want to re-establish their failed connections.

Does this make any sense to you?

kyberpunk commented 8 months ago

Hi @sophokles73, in our case an enterprise PKI managed by the security team issues and manages all device certificates for the multiple remote systems that a device connects to. If there is any incident, all device certificates are revoked at once. Only then can you ensure that the device cannot connect to any system by mistake. This is a common approach in big companies; you can think of it as similar to using AD for managing user access.

I was thinking about the periodic job before, but the registry currently does not provide the certificate serial number (which we could implement, of course). It would be similar to using Defender with AWS IoT Core. I see a bigger issue with auto-provisioning: you cannot prevent a new, already revoked device from connecting.

What would you think then about using CRL instead of OCSP? My proposal would be to have just option to enable revocation check in tenant CA and have configuration property to set path to CRL store. Or we can store it in Mongo, but it could be quite big. Then it is up to Hono operator how he updates the CRL files, can be just simple CRON job. This will mitigate the issue with connecting all devices at once. Would you be ok with such approach?

sophokles73 commented 8 months ago

I am not sure if I understand the intention correctly. When something suspicious happens,

  1. you revoke the intermediate certificate that had been used as a trust anchor. The intention here is that no device with a client cert that has been derived from the trust anchor can connect anymore using their certificate. Note that this would also disable auto-provisioning based on client certificates in scope of the trust anchor.
  2. you revoke the client certificates of affected devices individually. The intention here is that only those devices can no longer connect whose client certificate had been revoked.

In the first case, it would be very easy to just simply disable the trust anchor. In the second case, the x509 credentials of all affected devices would need to be disabled individually.

Which of these options are you using?

kyberpunk commented 8 months ago

I mean only revoking individual certificates issued by trust anchor. Devices are installed in untrusted area connecting to multiple systems in parallel (Hono is one of them). By suspicion I mean incident management (physical or remote intrusion detection) or simple decommissioning. This can be done by entities without access to Hono. There is hard requirement for enterprise PKI that all remote systems must check certificate revocation.

I don't use GlobalSign specifically, but this is describing the mechanism I mean for main concept https://www.globalsign.com/en/internet-of-things/iot-chip-cloud-integration-blueprint and revocation : https://www.globalsign.com/en/internet-of-things/iot-certificate-revocation.

kyberpunk commented 8 months ago

So are you ok with the proposed way of using CRLs instead of OCSP? Otherwise we would need an extra service which monitors the validity of certificates and disables devices when needed. But there would still be a gap during auto-provisioning. However, CRL and OCSP are standardized mechanisms which most mTLS-based services use, and it may be the operator's responsibility to decide which works well for their use case. In our case, either OCSP or CRLs will work.

sophokles73 commented 8 months ago

My proposal would be to have just option to enable revocation check in tenant CA and have configuration property to set path to CRL store. Or we can store it in Mongo, but it could be quite big. Then it is up to Hono operator how he updates the CRL files, can be just simple CRON job. This will mitigate the issue with connecting all devices at once.

I am not quite sure if I understand the approach. Can you elaborate a little more on which component would perform which check/action?

kyberpunk commented 8 months ago

Revocation will be performed by DeviceCertificateValidator, similarly to the original OCSP PR. This means that it will be checked by the TLS adapter components (MQTT, HTTP, CoAP). There will just be an option "crl-revocation-enabled" in the CA settings of the tenant in the device registry to enable the revocation check (or we may just enable it for all tenants for simplicity). CRL files will be supplied to the adapter components via the filesystem, and the file path will be configured by application properties (or an env var). The Hono operator then has to update the file regularly, e.g. by a cron job (which we can also do easily in K8s).
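A minimal sketch of what such a file-based CRL check could look like with the standard JDK classes. This is an assumption about one possible wiring, not the code of any PR; `CrlFileRevocation`, `loadCrls` and `crlOnlyParams` are hypothetical names, and the path to the CRL file would come from the adapter configuration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.cert.CRL;
import java.security.cert.CertPathValidator;
import java.security.cert.CertStore;
import java.security.cert.CertificateFactory;
import java.security.cert.CollectionCertStoreParameters;
import java.security.cert.PKIXParameters;
import java.security.cert.PKIXRevocationChecker;
import java.security.cert.PKIXRevocationChecker.Option;
import java.security.cert.TrustAnchor;
import java.util.Collection;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper: revocation checking based on operator-supplied CRL files. */
class CrlFileRevocation {

    /** Parses all CRLs (DER or PEM) contained in the given file. */
    static Collection<? extends CRL> loadCrls(final Path crlFile) throws Exception {
        try (InputStream in = Files.newInputStream(crlFile)) {
            return CertificateFactory.getInstance("X.509").generateCRLs(in);
        }
    }

    /** Builds validation parameters that consult only the supplied CRLs. */
    static PKIXParameters crlOnlyParams(
            final Set<TrustAnchor> anchors,
            final Collection<? extends CRL> crls) throws GeneralSecurityException {

        final PKIXParameters params = new PKIXParameters(anchors);
        // Make the already parsed, in-memory CRLs available to the validator.
        params.addCertStore(CertStore.getInstance(
                "Collection", new CollectionCertStoreParameters(crls)));

        final PKIXRevocationChecker checker = (PKIXRevocationChecker)
                CertPathValidator.getInstance("PKIX").getRevocationChecker();
        // CRLs only (no OCSP fallback), and only for the device certificate.
        checker.setOptions(EnumSet.of(
                Option.PREFER_CRLS, Option.NO_FALLBACK, Option.ONLY_END_ENTITY));
        params.addCertPathChecker(checker);
        return params;
    }
}
```

In this sketch the CRL is parsed once and kept in memory, so a connection attempt does not re-read the file; reloading after the operator replaces the file is left out.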

kyberpunk commented 8 months ago

Hi @sophokles73 is this alternative approach acceptable instead of OCSP?

sophokles73 commented 8 months ago

I guess the efficiency of the CRL based validation would heavily depend on the way that the CRL check is implemented (in the JDK). Scanning the same CRL file line-by-line, over and over again for each device connecting, would probably not be much better than doing the OCSP check calling out to a responder. This is particularly true if it is done in a blocking way.

In a perfect world, I would like the revocation check to work for all adapters under all usage scenarios supported by Hono. IMHO this would ideally mean that the revocation check results are being cached and are not performed for each and every connection establishment.
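The caching idea could be sketched roughly like this. It is only an illustration under assumptions: `RevocationResultCache` and its methods are hypothetical names, the fixed time-to-live is a simplification (a real implementation would more likely honour the responder's nextUpdate value), and the remote check is passed in as a plain supplier.

```java
import java.math.BigInteger;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical cache of revocation check results, keyed by issuer DN and serial number. */
class RevocationResultCache {

    enum Status { GOOD, REVOKED }

    private static final class Entry {
        final Status status;
        final Instant expiresAt;

        Entry(final Status status, final Instant expiresAt) {
            this.status = status;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    RevocationResultCache(final Duration ttl) {
        this.ttl = ttl;
    }

    /**
     * Returns the cached status if it is still fresh at the given instant,
     * otherwise runs the (expensive) remote check and caches its result.
     */
    Status lookup(
            final String issuerDn,
            final BigInteger serialNumber,
            final Instant now,
            final Supplier<Status> remoteCheck) {

        final String key = issuerDn + "#" + serialNumber.toString(16);
        final Entry cached = entries.get(key);
        if (cached != null && now.isBefore(cached.expiresAt)) {
            return cached.status;
        }
        final Status fresh = remoteCheck.get();
        entries.put(key, new Entry(fresh, now.plus(ttl)));
        return fresh;
    }
}
```

With such a cache, a burst of reconnects from the same devices would trigger at most one remote check per certificate within the time-to-live, at the price of revocations becoming visible with a delay of up to that duration.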

My understanding of your use case is that you need to be able to prevent a particular device to connect to Hono. Based on that we should probably limit the initial scope to performing the revocation check for the device's client (end) certificate only, instead of performing checks on the whole certificate chain. In particular we would not check the trust anchor itself.

Would that work for you? If so, we should be able to accept your OCSP based code with the corresponding simplifications. In particular, I guess that we would not need to store additional information with the trust anchors in the Registry, or would we? All of the required information for performing the check should be available from the client certificate presented by the device, or am I mistaken?

kyberpunk commented 8 months ago

@sophokles73 I'm ok with checking the end entity only, I can remove that option. For simplicity I would also omit the caching feature from the first increment, since it is not supported by the JDK implementation out of the box and would require additional business logic around it.
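Restricting the check to the end-entity certificate could look roughly like this with the JDK's `PKIXRevocationChecker`. This is a sketch under assumptions, not the PR's code: the class and method names are hypothetical, and the trust anchors are assumed to come from the tenant configuration.

```java
import java.security.GeneralSecurityException;
import java.security.cert.CertPath;
import java.security.cert.CertPathValidator;
import java.security.cert.CertificateFactory;
import java.security.cert.PKIXParameters;
import java.security.cert.PKIXRevocationChecker;
import java.security.cert.PKIXRevocationChecker.Option;
import java.security.cert.TrustAnchor;
import java.security.cert.X509Certificate;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical helper: OCSP check for the device (end-entity) certificate only. */
class EndEntityOcspCheck {

    /** Builds parameters whose revocation checker looks at the leaf certificate only. */
    static PKIXParameters paramsFor(final Set<TrustAnchor> anchors)
            throws GeneralSecurityException {

        final PKIXRevocationChecker checker = (PKIXRevocationChecker)
                CertPathValidator.getInstance("PKIX").getRevocationChecker();
        // OCSP is the checker's default mechanism; NO_FALLBACK prevents it
        // from falling back to CRLs when the responder cannot be used.
        checker.setOptions(EnumSet.of(Option.ONLY_END_ENTITY, Option.NO_FALLBACK));

        final PKIXParameters params = new PKIXParameters(anchors);
        // An explicitly added checker is used regardless of the
        // revocationEnabled flag of the parameters.
        params.addCertPathChecker(checker);
        return params;
    }

    /** Validates the client certificate; throws if it is invalid or revoked. */
    static void validate(final X509Certificate clientCert, final Set<TrustAnchor> anchors)
            throws GeneralSecurityException {

        final CertPath path = CertificateFactory.getInstance("X.509")
                .generateCertPath(Collections.singletonList(clientCert));
        CertPathValidator.getInstance("PKIX").validate(path, paramsFor(anchors));
    }
}
```

Note that `validate` blocks while the responder is contacted, so on an event-loop based adapter it would presumably have to run on a worker thread.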

Unfortunately client certificate doesn't hold all information - OCSP requires issuer subject DN and issuer public key for the request. Both information are taken by Java implementation internally from trust anchor. What I need to store additionally to device registry is only issuer subject DN in original ASN.1 encoding (link). Because it is not possible to deterministically reconstruct it from subject as string (as explained in PR comments).

Are you ok with other OCSP configuration options?

sophokles73 commented 8 months ago

Unfortunately client certificate doesn't hold all information - OCSP requires issuer subject DN and issuer public key for the request. What I need to store additionally to device registry is only issuer subject DN in original ASN.1 encoding (link). Because it is not possible to deterministically reconstruct it from subject as string (as explained in PR comments).

Well, the adapter uses the client cert's Issuer DN (retrieved using X509Certificate.getIssuerX500Principal()) to look up the tenant that the device belongs to. Why don't you simply use X500Principal.getEncoded() to get its ASN.1 encoding? You seem to be using the same approach to retrieve the Subject DN from the CA cert in the registry ...

sophokles73 commented 8 months ago

Are you ok with other OCSP configuration options?

Do we need them? So far, my understanding is that we only need ocsp-revocation-enabled for your requirements, or am I mistaken?

kyberpunk commented 8 months ago

Well, the adapter uses the client cert's Issuer DN (retrieved using X509Certificate.getIssuerX500Principal()) to look up the tenant that the device belongs to. Why don't you simply use X500Principal.getEncoded() to get its ASN.1 encoding? You seem to be using the same approach to retrieve the Subject DN from the CA cert in the registry ...

Because originally the TrustAnchor and the related X500Principal are constructed from the subject DN string persisted in the device registry. In that case it is always interpreted as a UTF8 ASN.1 string. But certificates are usually generated (e.g. by OpenSSL) using a different data type like PrintableString, which is also valid. The OCSP request contains a hash of the ASN.1 encoded form, which then doesn't match the hash of the real certificate subject because of the different data type. The OCSP responder then didn't accept the request. With the data from the device registry it wasn't possible to determine the original data type of the subject DN, because it is stored as a serialized string value.
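The mismatch can be reproduced in isolation. The sketch below hand-encodes the name CN=A twice, once with the value typed as UTF8String and once as PrintableString, and hashes both encodings the way the issuerNameHash of an OCSP request is computed (a digest over the DER encoding of the issuer name). The class name and constants are made up for this illustration, and SHA-1 is used as an assumption about the digest algorithm.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.security.auth.x500.X500Principal;

/** Illustration: same DN, different ASN.1 string type, different name hash. */
class DnEncodingMismatch {

    // SEQUENCE { SET { SEQUENCE { OID 2.5.4.3 (CN), UTF8String "A" } } }
    static final byte[] CN_A_UTF8 = {
        0x30, 0x0C, 0x31, 0x0A, 0x30, 0x08,
        0x06, 0x03, 0x55, 0x04, 0x03,
        0x0C, 0x01, 0x41 };

    // Identical, except that the value is a PrintableString (tag 0x13).
    static final byte[] CN_A_PRINTABLE = {
        0x30, 0x0C, 0x31, 0x0A, 0x30, 0x08,
        0x06, 0x03, 0x55, 0x04, 0x03,
        0x13, 0x01, 0x41 };

    /** Digest over the DER encoding of the issuer name. */
    static byte[] issuerNameHash(final X500Principal issuer) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(issuer.getEncoded());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if both principals are equal as names but hash differently. */
    static boolean sameNameDifferentHash() {
        final X500Principal utf8 = new X500Principal(CN_A_UTF8);
        final X500Principal printable = new X500Principal(CN_A_PRINTABLE);
        return utf8.equals(printable)
                && !Arrays.equals(issuerNameHash(utf8), issuerNameHash(printable));
    }
}
```

Both principals print as CN=A and compare as equal, which is why the string stored in the registry cannot tell the two encodings apart.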

kyberpunk commented 8 months ago

Are you ok with other OCSP configuration options?

Do we need them? So far, my understanding is that we only need ocsp-revocation-enabled for your requirements, or am I mistaken?

Basically yes. ocsp-nonce-enabled can be enforced by default, since it is essential to avoid replay attacks. But ocsp-responder-uri may be useful if the URL changes for any reason or the certificates don't contain the AIA extension. And ocsp-responder-cert is needed because the Java implementation requires a full X509Certificate for OCSP response verification; it is not possible to use just the subject DN and public key in the OpenJDK implementation.
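As a rough sketch, the two optional settings could map onto the JDK's per-validation checker like this, which also avoids the JVM-wide security properties. The factory class is hypothetical; how ocsp-responder-uri and ocsp-responder-cert are read from the tenant's trust anchor configuration is an assumption and not shown, and the nonce handling is left out.

```java
import java.net.URI;
import java.security.NoSuchAlgorithmException;
import java.security.cert.CertPathValidator;
import java.security.cert.PKIXRevocationChecker;
import java.security.cert.X509Certificate;

/** Hypothetical factory creating one revocation checker per trust anchor configuration. */
class TenantOcspCheckerFactory {

    /**
     * @param responderUri  value of ocsp-responder-uri, or null to use the
     *                      location from the certificate's AIA extension.
     * @param responderCert value of ocsp-responder-cert, or null if the
     *                      response is signed by the issuing CA itself.
     */
    static PKIXRevocationChecker create(
            final URI responderUri,
            final X509Certificate responderCert) {

        final PKIXRevocationChecker checker;
        try {
            checker = (PKIXRevocationChecker)
                    CertPathValidator.getInstance("PKIX").getRevocationChecker();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("PKIX validator not available", e);
        }
        if (responderUri != null) {
            checker.setOcspResponder(responderUri);
        }
        if (responderCert != null) {
            checker.setOcspResponderCert(responderCert);
        }
        return checker;
    }
}
```

Each call returns an independent checker instance, so different tenants or trust anchors can use different responders within the same adapter process.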

sophokles73 commented 8 months ago

I was talking about determining the trust anchor's ASN.1 Subject DN by means of extracting it from the client certificate:

final X509Certificate clientCert = ...;
final byte[] asn1IssuerDN = clientCert.getIssuerX500Principal().getEncoded();

Shouldn't that be exactly what you need to create the hash and use it for asking the responder?

kyberpunk commented 8 months ago

I was talking about determining the trust anchor's ASN.1 Subject DN by means of extracting it from the client certificate:

final X509Certificate clientCert = ...;
final byte[] asn1IssuerDN = clientCert.getIssuerX500Principal().getEncoded();

Shouldn't that be exactly what you need to create the hash and use it for asking the responder?

In general you are right, but internal JDK implementation takes it always from trust anchor which is configured for whole CertPathValidator instance. There is no way how to configure it for OCSP functionality separately. So we would need to construct TrustAnchor using the issuer subject DN from client certificate and replace the TrustAnchor matching the client certificate on some other place than in TenantObject. E.g. in DeviceCertificateValidator itself. Would it be good idea? It doesn't sound much more clean to me. Or different OCSP implementation would be needed and called after the JDK verification.
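For reference, the alternative discussed here (building the TrustAnchor from the issuer DN of the client certificate rather than from the string stored in the registry) might look like the following. This is only a sketch with hypothetical names; where the CA's public key comes from and where such a replacement would live in Hono are left open.

```java
import java.security.PublicKey;
import java.security.cert.TrustAnchor;
import java.security.cert.X509Certificate;
import javax.security.auth.x500.X500Principal;

/** Hypothetical helper: trust anchors that keep the issuer DN's original encoding. */
class IssuerTrustAnchors {

    /** Uses the issuer DN exactly as encoded in the client certificate. */
    static TrustAnchor fromClientCert(
            final X509Certificate clientCert,
            final PublicKey caPublicKey) {
        return fromIssuer(clientCert.getIssuerX500Principal(), caPublicKey);
    }

    /** Creates an anchor from an issuer principal and the CA's public key. */
    static TrustAnchor fromIssuer(
            final X500Principal issuer,
            final PublicKey caPublicKey) {
        // The third argument carries optional name constraints; none are used here.
        return new TrustAnchor(issuer, caPublicKey, null);
    }
}
```

Since the principal is taken from the certificate's own bytes, its encoding matches what the CA issued, independent of how the DN string in the registry would be re-encoded.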

sophokles73 commented 8 months ago

Ok, I see your point. I guess we should then stick with your current implementation that stores the encoded CA's Subject DN in the registry.