Comparison to "naive" hashToCurve

crockeea commented 3 years ago

Disclaimer: I know very little about elliptic curves.

The draft gives many long and complex algorithms for hashing arbitrary bits to a curve point. However, this much simpler algorithm seems to work for any curve as well:

Pick a public point G
Pick a public hash function H (or perhaps HKDF if you need additional bits)
Hash a byte[] x to a curve point via H(x)*G

This seems to be an efficient algorithm, and results in a uniformly random curve point. Is this a valid alternative to the algorithms proposed in this draft? It's not clear from the draft what properties the algorithms in the draft have that my proposal does not.

armfazh commented 3 years ago

This approach is basically covered in Appendix A.

   A naive but generally insecure method of mapping a string msg to a
   point on an elliptic curve E having n points is to first fix a point
   P that generates the elliptic curve group, and a hash function Hn
   from bit strings to integers less than n; then compute Hn(msg) * P,
   where the * operator represents scalar multiplication.  The reason
   this approach is insecure is that the resulting point has a known
   discrete log relationship to P.  Thus, except in cases where this
   method is specified by the protocol, it must not be used; doing so
   risks catastrophic security failures.

crockeea commented 3 years ago

I see, thanks for that pointer. Do you know of a reference that discusses "known discrete log relationships" (or this naive approach) in more depth?

burdges commented 3 years ago

It's inverted scalar multiplication by definition, H(x) is the discrete divisor aka discrete log of H(x)*G over G.

Actually "risks catastrophic security failures" is an understatement: All protocols that require hash-to-curve becomes trivially insecure when one makes this blunder.

crockeea commented 3 years ago

All protocols that require hash-to-curve

How do you know when a protocol requires hash-to-curve? I'm looking for the property (a requirement on the hash function itself, perhaps?). It's not clear to me when a protocol needs hash-to-curve, and moreover, how that requirement might be stated. There doesn't seem to be a corresponding property claimed in this draft, either.

burdges commented 3 years ago

It depends what "hash" means..

Although not a hash-to-curve, x -> H(x)*G provides a preimage and second-preimage resistant "hash" that yields a curve element. I've only seen applications like Merkle proofs inside zkSNARKs, in which we say Pedersen hash, or in backdooring random number generators, so really they map from the Z/qZ to Z/pZ. We do generate nonces this way in derandomized signatures like Ed25519 but then we care about the scalar, not just the point.

We say hash-to-curve anytime one makes stronger assumptions about a function H' : {0,1}^* -> E where E is the curve, like that H' is a random oracle or a PRF. If H'(x) = H(x)*G even with H a random oracle then H belies H' being a PRF, meaning the adversary who queries H first knows what your doing. You might instantiate H' similarly inside a security reduction of course, just not if the adversary can access H.

Anything like BLS, IBE, PAKEs, VRF, and some VDF becomes trivially insecure. You'll never "hash" this way, even the zkSNARKs uses are being replaced. Also x -> H(x)*G has about the worst performance imaginable for a hash function.

crockeea commented 3 years ago

@burdges That's very helpful, thanks!

Getting back to the draft: I think it would be useful to explain some of this in the draft. The intro only says:

Many cryptographic protocols require a procedure that encodes an arbitrary input, e.g., a password, to a point on an elliptic curve. This procedure is known as hashing to an elliptic curve.

It seems to me that the naive algorithm also achieves this goal. I think it would be good to emphasize the applications which require hash-to-curve.

bytemare commented 3 years ago

Note that scalar multiplication ( s * G ) is considered expensive, and tends to be avoided if possible. If that helps :)

chris-wood commented 3 years ago

@crockeea the draft discusses this in Section 8:

Applications instantiating cryptographic protocols whose security analysis relies on a random oracle that outputs points with a uniform distribution MUST NOT use a nonuniform encoding. Moreover, applications that use a nonuniform encoding SHOULD carefully analyze the security implications of nonuniformity. When the required encoding is not clear, applications SHOULD use a uniform encoding for security.

Perhaps this could be made more prominent in the draft. But the gist is: use hash-to-curve if you don't know what you're doing.

crockeea commented 3 years ago

@chris-wood Doesn't the naive algorithm also achieve a uniform distribution? I think I'm beginning to understand that the difference is not the distribution, rather it's the "random oracle" part: the naive algorithm cannot be used in a protocol that requires a random oracle, but hash-to-curve can (as stated in section 2.2.3). Is that right?

burdges commented 3 years ago

It's called the Pedersen hash. And one usually interprets it as mapping to the base field, not the group. A group action like x -> H(x)*G is not an algorithm for anything discussed in this repo.

I said random oracle only to exposes the issue simply, but random oracle does not characterize the problem, a hash-to-curve need not be a random oracle, other formulations exist, and nobody knows if random oracles exist.

It's actually x -> H(x)*G that describes the problem: All hash assumptions discuss the absence of structure. All "hashes" with algebraic structure like Pedersen hash, SWIFFT, etc. require you carefully check the assumptions you require against whatever properties folks proved the hash provides.

As an aside, ZCash shall deprecate Pederson hash for Sinsemilla, which provides similar security more efficiently in Plonk.

I'd close this issue honestly..

chris-wood commented 3 years ago

the naive algorithm cannot be used in a protocol that requires a random oracle, but hash-to-curve can (as stated in section 2.2.3). Is that right?

Correct! And we clarify that in the text quoted above. Would further clarifications help? If so, could you please send a PR? I'd also be happy closing this without any action.

crockeea commented 3 years ago

Would further clarifications help?

I do think further clarification would help, especially in the abstract and intro. What's missing for me is why I should use the algorithms in this RFC. The intro says

Unfortunately for implementors, the precise hash function that is suitable for a given protocol implemented using a given elliptic curve is often unclear from the protocol's description. Meanwhile, an incorrect choice of hash function can have disastrous consequences for security. This document aims to bridge this gap by providing a comprehensive set of recommended algorithms for a range of curve types.

but it does not explain what specific cryptographic property is satisfied by the algorithms in this draft that is needed by "many protocols". My understanding now is that the algorithms in the draft can be used as a hash function in a protocol which is proven in the RO model. Currently, neither the abstract nor the intro make any mention of random oracles, but they should.

Further down:

The reason this approach is insecure is that the resulting point has a known discrete log relationship to P. Thus, except in cases where this method is specified by the protocol, it must not be used; doing so risks catastrophic security failures.

I'm having to read between the lines here. The unstated implication (seems to be)

The reason this approach is insecure is that the resulting point has a known discrete log relationship to P and therefore cannot be used in protocols that require a random oracle.

Unfortunately, I do not know enough about the details, especially around the language of random oracles, to provide text for you.

kwantam commented 3 years ago

Thanks for the feedback. This is super useful.

But I want to push back slightly on the idea of describing the relevant property in terms of "random oracles" in the intro, for two reasons. First, "random oracle" doesn't seem like the right abstraction for the introductory material. Second, it's not actually the relevant property, as @burdges says. There are plenty of protocols that are secure without resorting to the random oracle model, but insecure with hash-to-exponent/Pedersen hash. BLS signatures are one example.

The object we need is called a hash function to an elliptic curve in the cryptographic literature. Hash-to-exponent is just not such a function. It may make sense to explain that fact in the intro, but it shouldn't be done with shorthand that is subtly wrong (which would be the case if we were talking about random oracles), because that shorthand might convince a literal-minded reader that they don't need a hash to curve when they actually do.

We could define it more precisely, presumably by saying something about collision resistance and infeasibility of computing discrete log of the resulting point (provided that discrete log is hard in the target curve). We might have to handwave a tiny bit to avoid having to write down a security game, but to me that's preferable to saying something that is clearly wrong.

cfrg / draft-irtf-cfrg-hash-to-curve

Comparison to "naive" hashToCurve #300