Remove the RSA key mutex

gilles-peskine-arm commented 3 years ago

RSA key objects include a mutex. mbedtls_rsa_private locks the mutex because it caches some auxiliary values used for blinding in the key object. (mbedtls_rsa_public also locks the mutex but it seems pointless.) This allows applications to create a key (this must be done in a single-threaded way), then use that key concurrently.

Unfortunately, the presence of the mutex complicates the lifecycle of RSA contexts. For example, part of the library assumes that mbedtls_rsa_free can be called twice on the same object, but as I write, this is not the case when MBEDTLS_THREADING_C is enabled and the platform does not allow calling mbedtls_mutex_free on an already-freed mutex. (There's a fix in https://github.com/ARMmbed/mbedtls/pull/4104, but it's a kludge and might not work in all sensible cases.)

From a high-level architectural perspective, the RSA module is a low-level part of the code dedicated to peforming calculations. Managing concurrency is outside its scope. A better place to protect against concurrent access would be in the PSA dispatch code, where it is mandated by the PSA specification (as is concurrent key management) but not yet implemented in Mbed TLS.

The API is not well-documented, leading to unmet expectations.

ECC contexts do not have a mutex, even though they would need one: some values are cached in the mbedtls_ecp_group object to accelerate multiplication. So a multithreaded application that works with RSA keys can't easily be changed to ECC keys: without extra precautions, it would be prone to race conditions.

Since mutexes don't really belong, complicate maintenance, and their presence is potentially misleading, I propose to remove mutexes from RSA keys in Mbed TLS 3.0.

Mailing list discussion: https://lists.trustedfirmware.org/pipermail/mbed-tls/2021-February/000294.html

mpg commented 3 years ago

ECC contexts do not have a mutex, even though they would need one: some values are cached in the mbedtls_ecp_group object to accelerate multiplication. So a multithreaded application that works with RSA keys can't easily be changed to ECC keys: without extra precautions, it would be prone to race conditions.

Note: the last sentence is correct if the application is not using the PK layer, but if it is, then switching to ECC keys should be safe, as the PK wrapper includes a hack that makes things thread-safe (at the cost of throwing away the cache, making MBEDTLS_ECP_FIXED_POINT_OPTIM not a time-memory trade-off but a pure waste of memory in that usage pattern - see #4128 and #4127).

Anyway, I'm in favour of removing this mutex but I have a few questions:

why only RSA? Are we happy with the mutexes in the two DRBG modules and the entropy module? (I checked and that seems to be the only other mutexes on crypto stuff.) In terms of design and API consistency, it might be easier to have a simple rule that "mbedtls_xxx() crypto APIs are not thread-safe, PSA crypto APIs are".
how hard would it be to implement thread-safety for RSA crypto operations (encrypt/decrypt, sign/verify, as opposed to key management) in our PSA Crypto implementation? I feel like it would be a bit nicer on users to provide a thread-safe alternative at the same time as we remove this one, but OTOH we don't want to overload 3.0 (and we could do in in 3.1 too).

gilles-peskine-arm commented 3 years ago

Thanks, I wasn't aware of the pk wrapper hack. (It could use some documentation!)

I'm happy with keeping the mutexes in the DRBG and entropy modules because they have a reasonably clear usage (single-threaded set up then thread-safe random generation) and they address a very clear need: an application with a single RNG instance that's accessed from all threads.

Our bug tracking record shows that we got complaints about the RNG mutexes when they broke. We got complaints about RSA/ECP concurrent usage because what was there was inconsistent.

I think Making PSA crypto thread-safe is a few weeks' work. Not doable in the 3.0 time frame. And there's no point in starting optimistically, because it's useless until it's finished and likely to bitrot quickly.

If we want to keep thread safety for key usage in the classic API, I think we should remove the mutex from RSA and put one in PK contexts.

mpg commented 3 years ago

Our bug tracking record shows that we got complaints about the RNG mutexes when they broke. We got complaints about RSA/ECP concurrent usage because what was there was inconsistent.

Good point, then I think it makes sense indeed to remove the RSA mutex but keep the RNG ones.

I think Making PSA crypto thread-safe is a few weeks' work. Not doable in the 3.0 time frame. And there's no point in starting optimistically, because it's useless until it's finished and likely to bitrot quickly.

OK, and agreed that there's no point in starting if we can't complete in time.

If we want to keep thread safety for key usage in the classic API, I think we should remove the mutex from RSA and put one in PK contexts.

Perhaps we should wait to see what the reactions are on the mailing-list: this could be a compromise if it turns out people miss the mutex. Otherwise, let's not do it (and we could always add it later if we find out later that there's interest.)

mpg commented 3 years ago

Thinking a bit more about this: this would break programs such as our own ssl_pthread_server when using RSA keys. What's worse, it would break them in a pretty bad way:

concurrency issues tend to a hard to debug and are easy to miss during testing;
since this affects a counter-measure, it seems rather hard to guarantee that affected code would not suddenly become more vulnerable to side channels;
turning working code into code that fails cleanly and loudly is one thing that is acceptable when bumping the major version number, but silently turning working and secure code into code what may either fail randomly or even not appear to fail, but fail to be secure, is another think entirely.

On another front, what's the migration path? What alternative are people supposed to use when upgrading? PSA is not thread-safe either, so people need to code up their own protection, and I think their only option to then integrate that into our SSL/TLS layer is to use the RSA_ALT type in the PK layer. Doable, but not extremely friendly.

gilles-peskine-arm commented 3 years ago

what's the migration path?

Ok, that's a good point: our only migration path is “deal with it”, and the fact that it's hard to deal with it if you're using the key for a TLS connection shows that this is not a reasonable path.

If we had the time, I'd say to move the mutex to pk. But I fear that we don't. I propose to leave this in the “3.0 optional” category for now.

gilles-peskine-arm commented 5 months ago

In Mbed TLS 4.0 or more precisely TF-PSA-Crypto 1.0, rsa.h will become private, so removing the mutex will not be an API break anymore. Thus I'm moving this issue to the tech debt backlog. We'll do it for 4.0 if it's convenient to remove the mutex, and otherwise we can do it later.

yanesca commented 5 months ago

There is now threading support in PSA Crypto and once https://github.com/Mbed-TLS/mbedtls/issues/8147 is done these mutexes will become truly redundant and no migration path will be needed.

Mbed-TLS / mbedtls

Remove the RSA key mutex #4124