cfrg / draft-irtf-cfrg-hash-to-curve

Hashing to Elliptic Curves

Unify hash_to_field for small prime moduli. #284

armfazh closed this issue 4 years ago

armfazh commented 4 years ago

I think we only need a slight modification to hash_to_field to support small prime moduli, namely changing the number of bytes requested from the expand_message function. That way we can avoid the use of an alternate method.
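
For concreteness, here's a stripped-down Python sketch of hash_to_field as the draft currently specifies it (just a sketch: expand_message and dst stand in for the draft's parameters, and the "amount of bytes requested" is the len_in_bytes computed below):

```python
def hash_to_field(msg, count, p, m, k, expand_message, dst):
    # Bytes per base-p digit, chosen so that reducing an L-byte integer
    # mod p lands within roughly 2^-k of uniform. For a prime p > 2,
    # p.bit_length() equals ceil(log2(p)).
    L = (p.bit_length() + k + 7) // 8
    len_in_bytes = count * m * L
    uniform_bytes = expand_message(msg, dst, len_in_bytes)
    u = []
    for i in range(count):
        e = []
        for j in range(m):
            offset = L * (j + i * m)
            tv = uniform_bytes[offset : offset + L]
            e.append(int.from_bytes(tv, "big") % p)
        u.append(e)
    return u
```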

kwantam commented 4 years ago

This is a great idea, but we have to be careful for a couple reasons:

  1. Any change we make to address this issue really really shouldn't change hash_to_field unless absolutely necessary, because we'd be breaking existing implementations (of which there are a bunch at this point!) just to address a corner case.

  2. It's not totally clear to me that it's possible to address this satisfactorily by requesting fewer bits from expand_message. The rest of my message discusses this. (Some of this rehashes what I wrote in an email a few days ago.)

TL;DR: I don't see a good way to give a unified hash_to_field that conveniently handles both small- and large-characteristic extension fields.


We can regard hashing to GF(p^m), m >= 1, as selecting a random integer in the half-open range [0, p^m). When discussing uniformity, it's handy to think of this integer as comprising m digits in base p.
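
Two tiny helpers (hypothetical names, little-endian digit order) to fix this digit view in code, since the sketches below reuse them:

```python
def int_to_digits(x, p, m):
    """Decompose x in [0, p^m) into m base-p digits, least significant first."""
    digits = []
    for _ in range(m):
        x, d = divmod(x, p)
        digits.append(d)
    return digits

def digits_to_int(digits, p):
    """Inverse of int_to_digits."""
    x = 0
    for d in reversed(digits):
        x = x * p + d
    return x
```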

To sample an integer in [0, p^m) with statistical distance from uniform roughly 2^-k for arbitrary p without rejection sampling, we have a few choices:

  1. Select m values in [0, p) with statistical distance from uniform roughly 2^-k. At least for small m, this gives an integer in [0, p^m) with statistical distance roughly 2^-k. This is the approach that hash_to_field uses: reduce a bunch of (k + log p)-bit integers mod p.

  2. Select one value in [0, p^m) with statistical distance from uniform roughly 2^-k. This is the approach that hash_to_field_divrem (formerly proposed in #282) uses: reduce a (k + m log p)-bit integer mod p^m. (See the sketch just after this list.)

  3. Some interpolation between the above approaches. For example, we might select n integers in [0, p^(m/n)) with statistical distance from uniform roughly 2^-k, then regard each as comprising m/n digits in base p. This would require reducing (k + (m/n) log p)-bit integers mod p^(m/n).
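
To make option (2) concrete, here is a rough Python sketch. sample_digits_divrem is my reconstruction from the description above, not the exact algorithm text of #282, and it reuses int_to_digits from the helper sketch earlier:

```python
def bytes_for_divrem(p, m, k):
    # Option (2) needs k + m*log2(p) bits, rounded up to whole bytes.
    return (m * p.bit_length() + k + 7) // 8

def sample_digits_divrem(uniform_bytes, p, m):
    # Reduce one (k + m log p)-bit integer mod p^m, then split the result
    # into m base-p digits. The big reduction here is the DIVREM that the
    # rest of this message worries about.
    x = int.from_bytes(uniform_bytes, "big") % (p ** m)
    return int_to_digits(x, p, m)
```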

It's also worth mentioning a class of approaches that do not work. It's tempting to think, for example, that since (k + m log p) bits is enough, we can hash to m digits by reducing m integers of (k/m + log p) bits each mod p. But this doesn't work: the statistical distance of each digit is only bounded by 2^-(k/m), which is far too big!
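
To put concrete numbers on this: with p = 9767 (log p = 14), m = 19, and k = 128, the shortcut would reduce 21-bit integers mod p, and each digit would only be within about 2^-6.7 of uniform, nowhere near the 2^-128 target.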


Of the above approaches, option (2) is the most efficient from a hashing standpoint---it only requires k + m log p bits, which is k * (m - 1) fewer bits than option (1).
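
Concretely, for GF(9767^19) with k = 128 and log p = 14 (the parameters behind the DIVREM figures below), option (1) consumes 19 * (128 + 14) = 2698 bits of hash output, while option (2) consumes 128 + 19 * 14 = 394 bits, a saving of 128 * 18 = 2304 bits.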

On the other hand, when log p >= k, option (1) is far easier to implement. The reason is, multiplying two (log p)-bit integers requires reducing a (2 log p)-bit intermediate value mod p---so we can reuse the same modular reduction code when log p >= k, i.e., when 2 log p >= k + log p.
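
For example, for the 381-bit BLS12-381 base field with k = 128, hash_to_field reduces (128 + 381)-bit integers mod p, and the field multiplier already has to reduce 762-bit products mod p, so no new reduction code is needed.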

When k is much greater than log p, option (1)'s simplicity advantage vanishes, because k + log p is significantly bigger than 2 log p, which means we'd need to implement a special modular reduction routine anyway. At this point, it seems like option (2) is better. (Note that I'm making the simplifying assumption that options (1) and (2) require similarly difficult implementations, which is not strictly true. But I don't see how to make finer distinctions without getting bogged down in details.)


So it seems like, for curves over high-degree extensions of small-ish characteristic (e.g., GF(9767^19)---which is a pretty extreme example), we want a fundamentally different approach than for curves over low-degree extensions of large-ish characteristic. I don't see another way to reconcile this, because option (2) makes implementation really ugly when log p is big. For example, when hashing to GF((2^581)^8) (which one would do when hashing to G2 of BLS48-581), option (2) requires DIVREM to handle roughly 4900-bit integers (!!!), which seems like a non-starter.
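
(Spelling out that figure: assuming k = 256 for BLS48-581, option (2) needs k + m log p = 256 + 8 * 581 = 4904 bits.)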


If we write down option (2) in the appendix as I've proposed in #282, we could add a subsection discussing option (3), which is easily implemented using option (2) as a subroutine. For the case of GF(9767^19), for example, one could hash to two elements of GF(9767^10) and convert that into an element of GF(9767^19) simply by concatenating the vector representations and throwing one vector entry away.

The reason to consider this approach is that it limits the size of the input that DIVREM needs to handle, which may make implementing DIVREM easier. For hashing to an element of GF(9767^19) in one shot, DIVREM needs to handle 394-bit inputs. By instead hashing to two elements of GF(9767^10), DIVREM only needs to handle 268-bit integers.

(Note that I haven't thought carefully about how to implement DIVREM nicely, though, so maybe the difference between 268 bits and 394 bits is not worth the extra 140ish bits of hash output that it requires.)
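
To spell out the GF(9767^19) construction in code (reusing bytes_for_divrem and sample_digits_divrem from the option (2) sketch above; same caveats apply):

```python
def hash_to_gf_9767_19(msg, expand_message, dst, k=128):
    # Option (3): hash to two elements of GF(9767^10) via option (2),
    # concatenate the digit vectors, and throw one digit away.
    p, half, full = 9767, 10, 19
    n = bytes_for_divrem(p, half, k)   # 34 bytes: the ~268 bits above, rounded up
    ub = expand_message(msg, dst, 2 * n)
    digits = (sample_digits_divrem(ub[:n], p, half)
              + sample_digits_divrem(ub[n:], p, half))
    return digits[:full]               # 20 digits; drop the extra one
```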

kwantam commented 4 years ago

I opened #286 to make the proposed alternative method concrete. We can continue discussing here or discuss there as you like! @chris-wood @armfazh

armfazh commented 4 years ago

Resolved in #287