Mbed-TLS / mbedtls

An open source, portable, easy to use, readable and flexible TLS library, and reference implementation of the PSA Cryptography API. Releases are on a varying cadence, typically around 3 - 6 months between releases.
https://www.trustedfirmware.org/projects/mbed-tls/
Other
5.52k stars 2.6k forks source link

Improve documentation and behavior of x_size in mbedtls_dhm_make_params #4277

Open gilles-peskine-arm opened 3 years ago

gilles-peskine-arm commented 3 years ago

Problem statement

The DHM API allows callers to request a maximum size x_size for the Diffie-Hellman secret, expressed in bytes. This is misleading and dangerous. This was not tested until https://github.com/ARMmbed/mbedtls/pull/4276.

Goal

Improve the documentation of x_size and possibly reject values that are definitely insecure. See discussion below, especially https://github.com/Mbed-TLS/mbedtls/issues/4277#issuecomment-814015512.

Note that this may or may not be relevant to implement TEE_ATTR_DH_X_BITS in the GlobalPlatform TEE API, see https://github.com/Mbed-TLS/mbedtls/issues/3839.

Original goal (API break)

Concretely, mbedtls_dhm_make_params and mbedtls_dhm_make_public should not have an x_size parameter, and should always behave as if x_size was mbedtls_mpi_size(&ctx->P).

Migration path: to get N bytes from a Diffie-Hellman shared secret, pass the shared secret as input to a key derivation function, and request N bytes from that KDF. You should do this anyway to mitigate the fact that the shared secret is biased.

mpg commented 3 years ago

I think the goal of the x_size argument is to improve performance. To make things concrete, when using a 3072 you mostly get (depending on who you ask) 128-bit security. It might seem a bit wasteful to generate 3072 bits of secret key and more importantly do a modular exponentiation with a 3072-bit exponent if the same security level could be achieved with a shorter key. For example, in this table is looks like NIST says a 256-bit secret exponent is enough for DH with a 3072-bit prime. Since the complexity of modular exponentiation is linear in the bitsize of the exponent, that's a performance improvement by a factor 12.

Now full disclosure: I'm not sure if that only applies to DH parameters that were chosen so that G generate a sub-group with an order matching the desired size (eg, a 3072-bit prime such that G generates a sub-group of about 2^256 elements) or if it works the same for all primes (in particular, for the "safe primes" (that is, (p-1)/2 is also prime) that we use when generating DH params). Perhaps @yanesca can shed some light on this.

Now I fully agree that making this configurable is dangerous, as there's an obvious risk of people selecting a size that's too small. (For example, thinking "I want a 128-bit security level so I'll just use 16 bytes here, while actually they need to double that (due to the rho method I think).)

But I still wanted to point out that this also brought benefits. So perhaps we could find a way to keep the benefits while removing the danger? Would it make sense to select it automatically in our code?

gilles-peskine-arm commented 3 years ago

Would it make sense to select it automatically in our code?

I don't understand what you mean. What would be selected where? The only place where library code generates DHM keys is in TLS, and that uses the full key space.

gilles-peskine-arm commented 3 years ago

I'm dubious about the benefits: nowadays, if you want to improve performance, use elliptic curves. But if we're going to keep the feature, I think this should be through a separate function mbedtls_dhm_gen_short_key, keeping the TLS-focused API using the full size only.

mpg commented 3 years ago

Would it make sense to select it automatically in our code?

I don't understand what you mean. What would be selected where?

The value of x_size would not be passed as a parameter, but deduced from the size of P and some tables. For example, using the NIST table linked above:

p_bits = mbedtls_mpi_bitsize( &ctx->P );
x_bits = p_bits;
if( p_bits <= 15360 )
    x_bits = 512;
if( p_bits <= 7680 )
    x_bits = 384;
if( p_bits <= 3072 )
    x_bits = 256;
if( p_bits <= 2048 )
    x_bits = 224;
if( p_bits <= 1024 )
    x_bits = 160;
x_size = ( x_bits + 7 ) / 8;

(In a similar way that ecp_mul_comb() doesn't take a w (window size) parameter but deduces the optimal value based on the size of the curve.)

I'm dubious about the benefits: nowadays, if you want to improve performance, use elliptic curves.

If you control both sides, yes, obviously. If you're a server wanting to support a variety of clients, you might be stuck with FFDH and might still want to limit the impact on your performance.

But I agree that as far as 3.0 is concerned, we can just remove that parameter. And then consider later if we want to introduce a transparent optimization, perhaps based on user feedback: if someone comes saying "I upgraded my server to 3.0 and the load went though the roof", we'll know there's interest, otherwise we can just not worry about this.

yanesca commented 3 years ago

Now full disclosure: I'm not sure if that only applies to DH parameters that were chosen so that G generate a sub-group with an order matching the desired size (eg, a 3072-bit prime such that G generates a sub-group of about 2^256 elements) or if it works the same for all primes (in particular, for the "safe primes" (that is, (p-1)/2 is also prime) that we use when generating DH params). Perhaps @yanesca can shed some light on this.

The standard assumption is that the key is uniformly random over a cryptographically strong cyclic group. To achieve this, the key size needs to match the bits of the group size. This is also what NIST prescribes.

I can't recall any obvious attacks either, but I wouldn't want to advise our users to go against such a standard assumption.

Considering the above, x_size is technically a domain parameter: it is the bitlength (N in FIPS.186-4) of the subgroup size (q in N in FIPS.186-4). Also, depending on how we want to achieve the uniform distribution over the group, it might be useful to know q as well.

mpg commented 3 years ago

Thanks for weighing in, @yanesca ! You're making an excellent point that if we want to have something like this, it should be part of the domain parameters: the size of the subgroup generated by G. So, I withdraw my previous proposal of selecting x_size automatically: since it's part of the domain parameters, we can't guess at it.

In light of this, I think the current x_size parameter to mbedtls_dhm_make_params() and mbedtls_dhm_make_public() is wrong: instead of being a size in bytes, it should probably be the value of q, and instead of being passed to these functions it should be passed to mbedtls_dhm_set_group().

I mean, if we wanted to support that performance optimization (using smaller secret exponents when G generates a smaller subgroup), we'd extend mbedtls_dhm_set_group() with a q parameter, or add mbedtls_dhm_set_subgroup_order() or something similar (also, we'd probably want to add a check for subgroup membership in dhm_read_public()). But I think Gilles has a point that we probably don't want to spend too much effort on FFDH performance in 2021, so I think currently my vote is:

gilles-peskine-arm commented 3 years ago

Looking back, there's nothing intrinsically wrong with picking a smaller x_size if that's the size of the subgroup generated by G: going beyond the size of G cannot possibly improve security. For common groups, such as RFC 7919, the order of G is just one bit shorter than P. But for e.g. DSA parameters, the order of G is significantly smaller. I think it's ok to allow this use of the API, especially since NIST allows it. But we should document it better.

I would like to completely overhaul the DHM API to make it more traditional (generate private key, export public key, generate shared secret) instead of the current. We don't have time for that in 3.0. (We could do it later, by extending it to look like ECDH.) But given that the current API is somewhat deprecated in my eyes, I prefer to spend a minimum amount of effort on it in 3.0. The reason I raised this issue was because I thought the API was unsafe, but I now think that the unsafe use (picking a very small x_size) can be documented away (and perhaps enforced with a minimum x_size?).

So I've changed my mind. I don't want to change the function prototype anymore. I propose to improve the documentation and make the following behavior changes:

mpg commented 3 years ago

@gilles-peskine-arm

given that the current API is somewhat deprecated in my eyes, I prefer to spend a minimum amount of effort on it in 3.0.

I agree with that, but I'm not sure I agree about your proposal. IMO the main problem is that mbedtls_dhm_make_params() and mbedtls_dhm_make_public() are entirely the wrong place to select that size (and a secondary problem that specifying a number of bytes is a pretty wrong way to specify it, as it means we're likely not going to draw uniformly from the range of interest), so keeping that parameter here is likely to cause trouble if and when we come around to improving the API.

(Btw what are the plans regarding FFDH in PSA Crypto?)

gilles-peskine-arm commented 3 years ago

we're likely not going to draw uniformly from the range of interest

The skew is negligible if x_size is a few bytes larger than the subgroup size. So maybe recommend using subgroup size plus 8 rather than subgroup size?

(Btw what are the plans regarding FFDH in PSA Crypto?)

Currently PSA crypto only supports predefined groups, so key generation has all the information. The specification doesn't say anything about the range of values that key generation should cover, but key generation generically requires a parameter that is the key size in bits. For custom groups, the plan is that in addition to a key type, the key attributes can contain domain parameters whose content depends on the key type. When we get around to defining domain parameters for FFDH custom groups, we'll need to think about how to convey the subgroup order.

mpg commented 3 years ago

The skew is negligible if x_size is a few bytes larger than the subgroup size. So maybe recommend using subgroup size plus 8 rather than subgroup size?

Yes, that would take care of my objection about using a number of bytes rather than the value of q. However my point about those functions being the wrong place for this remains.

Thanks for the info regarding PSA Crypto API. The reason I was asking is mainly because I was wondering whether we could skip re-designing the mbedtls_ API and just (tell people to) switch to the PSA Crypto API. But unfortunately that can't be done until it's extended to support custom groups. Still, perhaps extending it would be a better use of our time than re-designing the mbedtls_dhm_ API.

yanesca commented 3 years ago

I have the same problem with the current API as @mpg , and would like to see it fixed. Of course on the long term we would like to move away from this API and I don't want to spend effort on this in vain either.

Still, if we follow our preliminary plans and will make lower level modules internal, we will need to redesign the DH API in the medium term.

Therefore I would suggest that for now we spend as little time on it as possible and redesign it when we are making the lower level APIs internal.

gilles-peskine-arm commented 2 years ago

Given that this has missed the boat for 3.0, and we don't want to spend much effort on a low-level crypto API which should be subsumed by PSA in 4.0, and that we don't want to spend much effort on finite-field Diffie-Hellman these days, I've rescoped this issue to be a backward-compatible improvement only.