WebAssembly / wasi-crypto

WASI Cryptography API Proposal

Compressed Points #55

Closed npmccallum closed 2 years ago

npmccallum commented 2 years ago

The specification as currently stated supports three "compressed" variants:

These seem redundant with their uncompressed variants. All three are expressed using the SEC-1 compressed format. Further, the SEC-1 format is always introspectable: you can always tell whether a point is compressed from the first byte. I don't see any reason for these to exist.
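The introspection described above is just a check on the SEC-1 tag byte. A minimal sketch (a hypothetical helper, not part of the wasi-crypto API):

```rust
// Classify a SEC-1 encoded point by its leading tag byte.
// Hypothetical helper for illustration only.
fn sec1_is_compressed(encoded: &[u8]) -> Option<bool> {
    match *encoded.first()? {
        0x02 | 0x03 => Some(true),  // compressed: sign-of-y tag || x
        0x04 => Some(false),        // uncompressed: 0x04 || x || y
        _ => None,                  // identity (0x00) and hybrid forms omitted
    }
}
```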

(There was some historical concern about patents. But these patents have since expired.)

tarcieri commented 2 years ago

If the suggestion is to eliminate uncompressed points in favor of always using compressed points, then +1.

Notably, uncompressed points have a long history of deficient implementations being vulnerable to invalid curve attacks, and they exist only because of point compression patents that expired over a decade ago (and were silly to begin with).

This is a sharp edge it would be nice to eliminate in new standards.

jedisct1 commented 2 years ago

Hi!

When deserializing, whether the coordinates are compressed or not is something that indeed doesn't need to be explicit.

However, these constants are also used for serializing existing points. Imposing an arbitrary choice to applications would not be ideal. This is not an opinionated crypto library. Compressed coordinates are smaller, and generally a good choice now that patents are not a concern any more. But PGP, TLS 1.3 and CPACE are examples requiring full coordinates.

Now, for deserialization, these should probably be equivalent. This is what the implementations currently do, but the specification should explicitly mention this.

npmccallum commented 2 years ago

I agree with @tarcieri that always defaulting to compressed points would be very desirable. However, it is also a practical matter that a number of protocols do require the uncompressed format. Further, although both SEC-1 encodings are fully isomorphic, converting from uncompressed to compressed is trivial (it can be performed with simple byte manipulation) while the inverse is not. Therefore, since WASI does not have the clout to change existing cryptographic standards, I think the most practical option is to always encode uncompressed.
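The byte manipulation in question can be sketched as follows. This is a hypothetical helper, assuming an already-validated input point and a fixed field-element length; real code would use a vetted SEC-1 library:

```rust
// Convert an uncompressed SEC-1 point (0x04 || x || y) to compressed form
// (0x02/0x03 || x), where `flen` is the curve's field-element length in bytes
// (e.g. 32 for P-256). No curve arithmetic is needed: the tag byte just
// records the parity ("sign") of y.
fn sec1_compress(uncompressed: &[u8], flen: usize) -> Option<Vec<u8>> {
    if flen == 0 || uncompressed.len() != 1 + 2 * flen || uncompressed[0] != 0x04 {
        return None;
    }
    let x = &uncompressed[1..1 + flen];
    let y = &uncompressed[1 + flen..];
    let tag = if y[flen - 1] & 1 == 0 { 0x02 } else { 0x03 };
    let mut out = Vec::with_capacity(1 + flen);
    out.push(tag);
    out.extend_from_slice(x);
    Some(out)
}
```

The inverse direction is the hard one: recovering y requires a modular square root, which is why decompression belongs in the crypto library rather than in byte-level glue code.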

I propose the following:

  1. Remove the compressed variants from keypair_encoding.
  2. Remove the compressed variants from publickey_encoding.
  3. Detect point compression and validate group membership on keypair_import() and publickey_import().
  4. Always encode uncompressed on keypair_export() and publickey_export().
tarcieri commented 2 years ago

Notably also, if implementations of legacy protocols that do need to deal with uncompressed points handle them at the library level, rather than outsourcing it to the backend implementation, they can implement the following scheme which is always guaranteed to avoid invalid curve attacks:

  1. Convert uncompressed point to compressed point
  2. Decompress the compressed point
  3. Check that the uncompressed y-coordinate matches what was originally provided

Even if the third step is skipped and the provided y-coordinate is incorrect/invalid, this will still avoid invalid curve attacks.
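The three steps above can be sketched on a toy curve. This is illustrative only: it uses y² = x³ + x + 1 over GF(23), chosen so that p ≡ 3 (mod 4) makes the square root a single exponentiation; real code would use a vetted library rather than hand-rolled field arithmetic:

```rust
// Toy demonstration of the compress/decompress/round-trip check on the
// tiny curve y^2 = x^3 + x + 1 over GF(23). Illustrative only.
const P: u64 = 23;

fn modpow(mut b: u64, mut e: u64) -> u64 {
    let mut r = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 {
            r = r * b % P;
        }
        b = b * b % P;
        e >>= 1;
    }
    r
}

// Step 2: decompress (x, parity-of-y) back to a full point. Since P ≡ 3
// (mod 4), a square root of z (when one exists) is z^((P+1)/4) mod P.
fn decompress(x: u64, y_is_odd: bool) -> Option<u64> {
    let rhs = (modpow(x, 3) + x + 1) % P;
    let mut y = modpow(rhs, (P + 1) / 4);
    if y * y % P != rhs {
        return None; // x is not the x-coordinate of any curve point
    }
    if (y & 1 == 1) != y_is_odd {
        y = P - y; // pick the root with the requested parity
    }
    Some(y)
}

// Steps 1–3 combined: compress (keep x and the parity of y), decompress,
// and confirm the original y-coordinate round-trips. Any input that is
// not a valid curve point is rejected, which is what rules out invalid
// curve attacks.
fn roundtrip_check(x: u64, y: u64) -> bool {
    decompress(x, y & 1 == 1) == Some(y)
}
```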

jedisct1 commented 2 years ago

That would indeed simplify things from an API perspective.

The downside is that it requires bindings/library authors to manually split the uncompressed representation and encode the sign in order to get the compressed representation. Not a lot of effort, but it doesn't feel like the best abstraction layer to perform this kind of machinery.

jedisct1 commented 2 years ago

How about:

?

Do some PQ schemes also support compressed/decompressed representations for keys?

jedisct1 commented 2 years ago

The curve equation should be checked when using full coordinates, but that would be done by the underlying crypto libraries, not at the WASI API level.

We can document it, but I feel like that would add confusion, or force people implementing this API on top of existing libraries to add redundant and potentially dangerous code.

What we really need is a test suite, in order to ensure compatibility between different implementations of the API. For that particular case, it can include checks against the Project Wycheproof test vectors.

npmccallum commented 2 years ago

@jedisct1 I don't think conversion from uncompressed to compressed is that big of a wart. I think extra parameters that are tied to specific serializations is, comparatively, a much bigger wart.

This conversion from uncompressed to compressed only needs to take place in two circumstances:

  1. A protocol requires compressed format. I'm unaware of any. And given the patent history, it is unlikely that there are any.

  2. A particular application has significant network performance concerns and therefore needs to save a few bytes. As above, I'm unaware of any. Further, it is only 32 bytes for P-256 (the most common curve). In a certificate with an average size of 2K, we're talking about 1.5% savings.

In all other cases, the uncompressed point format will work without any modifications.

npmccallum commented 2 years ago

My point above is that conversion from uncompressed to compressed format is extremely rare but also easy to do if needed.

tarcieri commented 2 years ago

that would be done by the underlying crypto libraries

...which needs to be implemented in n different backends, some of which will inevitably screw it up.

Based on experiences with libraries implementing various standards within the JOSE ecosystem like JWE, this inevitably happens even with modern cryptographic implementations, even if you warn implementers not to screw it up.

Give implementers a sharp edge and they will inevitably cut themselves.

We can document it, but I feel like that would add confusion, or force people implementing this API on top of existing libraries to add redundant and potentially dangerous code

I think the only true danger here is invalid curve attacks which can only exist in deficient implementations mishandling and failing to validate whether the coordinates are actually a valid solution to the curve equation.

Can you point to a vulnerability you foresee in a compressed-only API which is remotely as severe as an invalid curve attack?

The only one I can think of is failure to validate the y-coordinate is correct, which only means an invalid point which should've been rejected was accepted, but this is just a minor unexploitable vulnerability AFAICT.

npmccallum commented 2 years ago

@tarcieri It isn't clear to me what you're proposing.

This API provides two functions that take public keys as inputs and two that provide public keys as outputs. A compressed-only approach only has benefits on inputs, not outputs. We could specify that inputs always take compressed form and outputs always produce uncompressed form. This would protect the two functions that take public keys as inputs.

However, it would also impose a pretty high bar for implementations to parse all incoming public keys and convert them to compressed format. That doesn't strike me as a practical choice.

tarcieri commented 2 years ago

Now I'm confused, because I thought we were proposing the same thing... well, aside from this:

We could specify that inputs always take compressed form and outputs always produce uncompressed form.

I would suggest inputs are always compressed, and outputs are compressed-by-default with the option to produce an uncompressed output.

Legacy use cases that want to deal with uncompressed inputs can convert the uncompressed point to a compressed point, decompress it, and ensure it round-trips back to the original uncompressed input.

npmccallum commented 2 years ago

@tarcieri I am proposing that all outputs be in uncompressed format and that inputs support both formats.

This is not because I disagree with you regarding this being a "sharp edge" for security. In fact, I agree with you.

However, I think that we have to choose between "sharp edges"; one for security (uncompressed) and one for usability (compressed). I make this trade-off based on the following assumptions:

  1. 99% of the cases will want uncompressed format. This is due almost exclusively to TLS 1.3.

  2. The ratio of runtime crypto implementors to runtime users is very low. Following the utilitarian principle: although the pain caused by crypto implementation failures is high, it is rare, whereas there are far more users of the API. Asking them to jump through hoops is less pain per person, but there are more of them and the pain is more frequent.

tarcieri commented 2 years ago

Oh, I see. I misunderstood you, then; you're advocating for uncompressed points over compressed points.

In that case, I would suggest retaining the current state of affairs and supporting both.

If you’re getting rid of one of the two, it really should be uncompressed. It’s a legacy workaround for a patent that expired over a decade ago. Making the API only support uncompressed points is only perpetuating these legacy concerns at the cost of security.

There are plenty of applications of compressed points beyond TLS. Bearer credentials that use key wrapping, age, and JWE are some examples.

jedisct1 commented 2 years ago

For deserialization, we should support both, i.e. seamlessly support existing applications and protocols.

For serialization, my guess is that code using these hostcalls will want to support both, because they already do. I expect most of these libraries and applications to be existing ones, using e.g. OpenSSL or CryptoAPI and willing to add a shim for WASI compatibility.

Right now, they don't have to care or know anything about how SEC1 is formatted, or even about what a public or private key represents. They just use an API to serialize/deserialize keys and view the output as an opaque blob of bytes. Asking these implementers to understand inner details and do additional work is not nice, and not great for adoption.

Looking beyond EC points, are there PQ schemes where uncompressed and compressed representations are possible and where the tradeoffs would be more significant than computing a square root vs checking a curve equation?

tarcieri commented 2 years ago

The distinction exists in ECC due to point compression patents. I'm not aware of any such concerns with the PQcrypto candidates, where the main patents of concern are the expired NTRU patents and the CNRS patents, for which the consensus seems to be that they're non-applicable (e.g. to Kyber/Saber).

jedisct1 commented 2 years ago

Also, if the serialized format must be the full coordinates, how do we handle X25519 keys, where only one coordinate is necessary for scalar multiplication?

We can compute the Y coordinate on serialization, and ignore it on deserialization, but that doesn't sound optimal.

npmccallum commented 2 years ago

@tarcieri If you were writing a network protocol, I'd agree 100%. The problem is that we're writing a crypto API that has to be used by legacy protocols.

npmccallum commented 2 years ago

if the serialized format must be the full coordinates, how do we handle X25519 keys?

There was never any historical usage of full coordinates for these curves. So no change to the current behavior.

jedisct1 commented 2 years ago

if the serialized format must be the full coordinates, how do we handle X25519 keys?

There was never any historical usage of full coordinates for these curves. So no change to the current behavior.

There is: GPG.

npmccallum commented 2 years ago

@jedisct1 Then I would still fall back on always writing both x and y on output, since the conversion in the opposite direction is trivial.

jedisct1 commented 2 years ago

Right. For X25519/X448, we can do what the excellent Cryptography library does. Output both coordinates, as you said, and if people want the common, compressed representation, this is what the Raw serialization type is for.

jedisct1 commented 2 years ago

Here's a proposal:

What do you think?

jedisct1 commented 2 years ago

/cc @sonder-joker

tarcieri commented 2 years ago

For serialization, for each key type, we document what the output format is. The entire column can say Uncompressed for now, but that gives us some headroom to change this for future algorithms in a backward-compatible way

This approach seems good to me. For what it's worth the elliptic-curve crate has a PointCompression trait which signals a curve-specific "preference" for whether points should be compressed or uncompressed.

This is used by the From<PublicKey> impl for sec1::EncodedPoint to encode points according to curve-specific conventions.

For now those preferences are uncompressed points for p256 and p384, and compressed points for k256 (i.e. secp256k1), based on the most common real-world usages.

However, the sec1::ToEncodedPoint trait still provides fine-grained control with an explicit compress: bool option.

jedisct1 commented 2 years ago

However, the sec1::ToEncodedPoint trait still provides fine-grained control with an explicit compress: bool option.

So, in order to add wasi support to that function, you'll have to deserialize and reserialize if the compress parameter doesn't match what the hostcall returns.

Which means that libraries have to keep including their serializers/deserializers even if the hostcalls also implement this. Just to work around a limitation of the hostcall API.

sonder-joker commented 2 years ago

Here's a proposal:

  • We get rid of the compressed variants
  • For serialization, for each key type, we document what the output format is. The entire column can say Uncompressed for now, but that gives us some headroom to change this for future algorithms in a backward-compatible way
  • We provide pseudo-code to convert an uncompressed representation to a compressed one in an appendix.

What do you think?

I think it's pretty good.