dalek-cryptography / curve25519-dalek

A pure-Rust implementation of group operations on Ristretto and Curve25519
Other
886 stars 449 forks source link

Misleading description for hash-to-Edwards point #400

Closed pornin closed 1 year ago

pornin commented 2 years ago

The function EdwardsPoint::hash_from_bytes() is described as "performing hashing to the group" and explicitly references draft-irtf-cfrg-hash-to-curve:

https://github.com/dalek-cryptography/curve25519-dalek/blob/8abb22bcafe30ac2dbad372d0444fdba9e63bc0e/src/edwards.rs#L529-L532

This is a rather misleading description because:

The first point is mostly about interoperability; while the map is not the same as in the draft, it is still safe (in particular, this code just uses the first 255 bits of a hash output as a field element, while the draft would generate 129 extra bits and perform a modular reduction; but in Curve25519, the field modulus is very close to 2^255, so the bias from using just 255 input bits is negligible). The second point is, arguably, more important for security, because the current hash_from_bytes() produces points with a non-uniform distribution that is easy to distinguish from a uniform random generation. Most protocols (and in particular the Signal stuff that this implementation follows) don't mind, but some of them require a uniform distribution, and the lack of uniformity can have consequences ranging from invalid security proofs to actual information leaks.

I recommend, at least, modifying the documentation to make it explicit that the implemented functionality is not compatible with draft-irtf-cfrg-hash-to-curve, and that it has a non-uniform output distribution, which may be troublesome for some (admittedly not many) protocols.

Looking at the existing pull requests, I see PR #377, which not only modifies the description of hash_from_bytes(), but also implements (in new, additional functions) the draft-irtf-hash-to-curve process. I have not looked in details at that PR, but the idea seems good.

rozbb commented 1 year ago

I don't like that these function is here at all, actually.

I'd like to remove these functions entirely, unless someone can find a non-broken upstream use. If removal isn't possible, I'd like to feature-gate it behind hazmat or something.

Plan for removal: What I'd like to do is deprecate this in 4.0 and remove it in 4.1. But this technically breaks semver. So maybe a better route is to feature-gate it behind deprecate-in-4.1 or something.

@tarcieri thoughts?

tarcieri commented 1 year ago

We’ve done a decent amount of work in the @RustCrypto elliptic-curve crate trying to implement curve-agnostic traits for the hash2curve APIs, although they’re still rough around the edges and I wouldn’t recommend them as anything more than an optional dependency, especially as the elliptic-curve crate is pre-1.0. But it may be good to expose inherent methods which try to align with those APIs.

My best guess is hash_from_bytes predates the RFC and is a nonstandard construction. If it doesn’t appear to have users it can probably be safely removed.

tarcieri commented 1 year ago

Also you can feature gate them behind a feature which is explicitly called out as exempt from semver, perhaps complete with a deprecation warning.

The same thing could be done with the elliptic-curve APIs too: they could be gated behind a hash2curve or hash2curve-unstable feature if we wanted to conditionally impl them.

burdges commented 1 year ago

elligator_ristretto_flavorappears to be a separate beast entirely.

rozbb commented 1 year ago

Ugh you're right, the Ristretto version also does not appear to implement the Elligator standard (it only does half the map). I think this should also be deprecated. We could pretty easily re-introduce elligator2 methods in a future major release. The RFC has lots of test vectors

burdges commented 1 year ago

There were good reasons to expose the original singleton elligator map for ed25519, no?

It's not exposed for ristretto now anyways, likely because there are no ristretto protocols you wish to resemble. I simply meant the ristretto version does not depend upon the ed25519 version.

In ristretto, the actual hash from_uniform_bytes adds two elligator calls, so just add the RFC test vectors into the code. Right now, the code uses elligator test vectors, while the RFC uses elligator2 test vectors. It should all be fine.

rozbb commented 1 year ago

I'd like to release before fixing this issue though. We can deprecate these functions but you bring up an important question: do we need a new name for the functions that will replace these? Is that a question for later?

burdges commented 1 year ago

Ignore my ed25519 question above..

Afaik, there are no problems with the Ristretto code, so afaik nothing there should be deprecated or changed now or in the future, just maybe adding the RFC test vectors eventually.

rozbb commented 1 year ago

The Ristretto code appears to only be doing half of the one-way map defined in the RFC (ie it should be running MAP twice, and summing the results). We could retroactively fix it, but this would cause subtle breakage. So what I'd rather do is deprecate this, put it behind a feature flag, and introduce the correct function later.

I'm wrong. It appears to be doing the correct thing. I should add the real test vectors to make sure.

rozbb commented 1 year ago

Fixed my point above. A separate point is that these hashing functions do no domain separation, despite the RFC mandating it. Fixing it would be breaking, but leaving it would be Not Good. Hmmm

burdges commented 1 year ago

You can remind people in comments, but clearly from_uniform_bytes cannot do domain separation, and it has a lot of users.

pornin commented 1 year ago

RistrettoPoint::from_uniform_bytes() implements the "one-way map" which is defined in the ristretto draft spec (that is probably going to be a published RFC soon). It is fine. The hash-to-curve draft (appendix B) defines hash_to_ristretto255() as the combination of an expand_message() call (which is basically a domain-separated XOF) and the one-way map from the the ristretto draft; in that sense, curve25519-dalek implements (correctly!) the second half of hash_to_ristretto255() (under the name from_uniform_bytes()).

My issue here was specifically about EdwardsPoint::hash_from_bytes(), which is completely separate from the ristretto code. My beef with hash_from_bytes() is twofold: that it wrongly documents that it follows the map described in the referenced hash-to-curve draft (it implements something close, but not that exact one), and that it uses the word "hash" despite having a non-uniform output. Normally you could fix that with only a documentation change, i.e. modifying the function description into something that explicitly states that it implements a map with a non-uniform output which is similar in principle (but not identical) to the Elligator2 map.

As for deprecating: since hash_from_bytes() does not follow any pre-existing standard (draft or not), it probably does not have many users, though I think that it has been used for some Signal-related purposes. You'd have to ask Trevor Perrin about that. Marking the function as deprecated is orthogonal to the question of fixing its documentation, though.

jrose-signal commented 1 year ago

I can get in touch with Trevor if you like, but if we did use EdwardsPoint::hash_from_bytes I'm pretty sure we no longer do; our current "hash from bytes" operations are in lizard_ristretto.rs, plus the aforementioned RistrettoPoint::from_uniform_bytes. We don't use EdwardsPoints directly except in our slightly customized implementation of Ed25519 (maintaining compatibility with the implementation in the old libsignal-protocol-java).

(It'd be neat to get those upstreamed but I do not myself have a cryptographer background and can't say whether they're generally useful operations, or too specific to what Signal / zkgroup is doing.)

rozbb commented 1 year ago

That's good to know. Hopefully this means the deprecation won't be noticed. Thank you!

rozbb commented 1 year ago

Resolved in #438