respond to comments from Thomas Pornin

kwantam commented 4 years ago

This PR includes responses to most of the comments from Thomas Pornin's review for the crypto panel.

In the next comment in this thread I will quote Thomas's email and my responses to each point. There are a few cases (marked "???") where it's either not clear to me what the comment meant or it's otherwise not obvious how to respond. Let's discuss in this thread and I'll make edits as necessary.

kwantam commented 4 years ago

A generic remark: there is not a word about intellectual property issues. I understand that this is a contentious issue, and that nobody wants to assert that any specific method is patent-free. But I would have expected a disclaimer somewhere. Otherwise, it feels weird that Icart's map is not used as a possible method, since, when p^m = 2 mod 3, it should be faster than the simplified SWU (it uses one inversion and one cube root, no extra is_square() or similar). (Also, I heard some rumours that simplified SWU is also patented, albeit with a less proactively enforced patent. The same rumours say that the original Shallue-van de Woestijne method is patent-free. These are only rumours.)

???

I think the IPR disclosures handle this. Should double check whether this is true, and what the policy is regarding IPR discussion in the draft itself. (I seem to recall that this is discouraged.)

2.1 (page 6): Maybe note that there are multiple choices for the representation of GF(p^m) as polynomials: any irreducible modulus M (of degree m) is fine (all finite fields with the same cardinal are isomorphic to each other), but no M is more canonical than any other. Choice of M is often driven by implementation performance (some M lead to faster code).

???

Not clear that this is within the scope of the document, since right now we're not talking about the modulus M at all.

2.1 (page 6): "elements are those points with coordinates (x, y) satisfying the curve equation": some curves (in particular Weierstraß and Montgomery curves) also include a special point-at-infinity which does not have coordinates (x, y).

Done.

2.2.2 (page 8): "Section 10 discusses further." -> This lacks a complement. Maybe: "See section 10 for further discussion."?

Done.

2.2.5 (page 9): Concatenation can be ambiguous, i.e. "RO10" || x == "RO1" || ("0" || x) The actual usage (in section 5.3) uses explicit length bytes, and also puts the tag after the message, not before. It might be worth mentioning here that the "concatenation" operation must be an injective encoding of the tag and input x, with a forward reference to section 5.3 for details (especially 5.3.4).

Done.

3.1 (page 12): The text fluidly and implicitly equates character strings and byte strings. Java and C#/.NET developers might take issue with that. This might be worth an explicit sentence here; otherwise, some developer somewhere will use UTF-16 (and calling it "Unicode" in pure Microsoft fashion), others will add a terminating zero, or use big-endian, or add a BOM, or any combination thereof, and much confusion and interoperability sorrow will ensue.

Done.

4 (page 12): "SHOULD be constant time" -> "SHOULD be constant-time".

"constant time" should be hyphenated only when it's used as an adjective. The hyphen in "constant-time X" is to clarify that "constant" modifies "time," not "X." I've swept the document for consistency.

Also, the traditional acception of "constant-time" covers more than independence of the execution time with regard to the input values:

It really is independence with regard to all secret values, including intermediate values and outputs, not just inputs.

Constant-time code should not leave any trace that can be leveraged with time-based measurements. For instance, in a flush+reload attack, the victim's execution time is independent of the secret values; the timing measurement is on a reload operation performed by the attacker themselves, after the victim has finished. Code which is vulnerable to a flush+reload attack may still be "constant-time" in the sense described by section 4, but would not be "constant-time" in the usual sense.

I think a more generic definition of "constant-time" is needed here. I suggest the following: "For security, implementations of these functions SHOULD be constant-time, i.e. none of the possible timing measurements that can be effected by an adversary should depend on secret values. In particular, execution time but also memory access patterns should be independent of any secret input, output or intermediate value."

Done, with a slightly trimmed-down version of the suggested text.

4 (page 13): "there exist two roots of x in the field F whenever x is square." -> except when x = 0. Zero has a unique square root in F.

Done.

4 (page 13): "To implement inv0 in constant time, compute inv0(x) := x^(q-2)" -> Depending on the field, there may be (much) faster ways, in particular in field extensions (e.g. in Curve9767, inversion cost is about 6M, while Fermat's Little Theorem would bring it to about 300M!). Even in prime fields, a constant-time extended binary GCD will typically have cost between 40M and 100M for inversion (but some specific care is required to ensure that inv0(0) = 0). However, it must be said that the exponentiation is a simple to implement in constant-time if constant-time multiplication is available (and it should, otherwise the whole thing is hopeless).

Good point. Added a note saying that this is possible, but further discussion is beyond the scope.

4 (page 13): It may be worth mentioning that I2OSP and OS2IP, as defined in PKCS#1 (aka RFC 8017), are big-endian. In particular, most of the curve25519/edwards25519 ecosystem tends to be little-endian, so that's a plausible source of mistakes.

Done.

5 (page 5): The draft uses "SHAKE-128" as the name of SHAKE with a 128-bit output, but the NIST specification (FIPS 202) defines "SHAKE128" without the dash. This impacts the use of the name in IDs (e.g. in section 8.10, page 43).

Done.

5.2 (page 17): Maybe note somewhere that implementation of "mod p" with constant-time code is not completely obvious. Using Euclidean division is not constant-time. Most constant-time implementations use Barrett reduction, although some use Montgomery instead.

???

We note that this is needed in Utility. Should we note again here?

5.2 (page 17): When mapping to a field GF(p^m) with a relatively small p (e.g. less 32 bits), then the proposed hash_to_field() is rather wasteful in hash output. For such fields, generating k+log2(p^m) bits and then repeatedly dividing that big integer by p will be faster, especially when working on embedded systems. On an ARM Cortex M0+, with field GF(9767^19), I can get a map_to_field() in about 20000 cycles, while each invocation of the SHAKE internal function costs 34000 cycles and the proposed method would require three invocations of that internal function to get enough bytes.

Added appx discussing the alternative function.

5.3.1 (page 18): I needed to verify, but "Damgaard" is indeed the correct ASCII-compatible alternative spelling of "Damgård". Nice job!

:)

5.3.1 (page 19): Definition of b_in_bytes uses ceil(), i.e. accounts for the possibility that H outputs something else than an integral number of bytes. But if that is the case, the "uniform_bytes" are not uniform! It would be better to make it an explicit requirement that the output length of H, in bits, MUST be a multiple of 8.

Done (thanks to @armfazh's clarification :+1:)

5.3.1 (page 19): r_in_bytes is said to be the "block size" of a hash function, but that notion has not been defined. Generally speaking, some hash functions do not have a well-defined "block size". You may fall back here on HMAC, since HMAC also needs a "block size"; for instance, FIPS 202 defines the block size of SHA3-256 to be 136 bytes.

Done.

5.3.4 (page 21): "prepended or appended" -> technically, "prepended" is an actual English word, but not with that meaning. It means something like "premeditate". I suggest: "an encoding of DST MUST be added either as a prefix or suffix to the input to each invocation of H (a suffix is the RECOMMENDED approach)."

Done, and swept the document for other uses of prepend.

6.6.1 (page 25): It may be worth mentioning that some of the computed values are actually constants, e.g. tv4 (step 5) or -4g(Z)/(3Z^2+4*A) (used in step 10) (this is applied in appendix D.1). In effect, the "expensive" operations in this algorithm that cannot be precomputed are 1 inv0(), 2 is_square(), and 1 sqrt(). This makes it apparent that the speed advantage of the simplified SWU map is that it saves 1 is_square() call, i.e. about 25% of the overall cost if inv0(), is_square() and sqrt() are implemented with modular exponentiations.

Done.

10 (page 45): for password hashing, discussion points to PBKDF2 and Scrypt. It might be worth mentioning Argon2, for which there is an in-progress draft: https://datatracker.ietf.org/doc/draft-irtf-cfrg-argon2/

Done.

10 (page 45): "the nature of the leakage and the appropriate defense depends on the application." -> should be "depend" here.

Done.

E.1 (page 75): It may be worth mentioning here that the random oracle mapping requires doing two map_to_curve() and adding the results together, and this is, in itself, enough to make projective coordinates useful. In particular, when using a Weierstraß curve, you must take into account the (remote) possibility that the two points may be the same, therefore requiring a constant-time generic point addition routine; there are complete formulas for Weierstraß curves in homogenous projective coordinates.

Done.

G.5 (page 96): "Other optimizations of this type are possible in other even-order extension fields" -> You can also have similar optimizations in any extension field. In general, for an odd p and an arbitrary m > 1:
  (p-1)/2 = ((p-1)/2)*(1+p+p^2+...+p^(m-1))
Thus, if we call r = 1+p+p^2+...+p^(m-1), we have:
  x^((p-1)/2) = (x^r)^((p-1)/2)
This can be computed very efficiently because:

x^r is an element of GF(p) (indeed, (x^r)^(p-1) = 1 for any x != 0, and only elements of GF(p) are roots of the polynomial X^(p-1)-1 in GF(p^m)[X]), so the final exponentiation by (p-1)/2 can be done with only values in the small field GF(p).

x^r can be computed in log(m) multiplications and log(m) applications of the Frobenius operator:

t1 = x x^p = x^(1+p) t2 = t1 t1^(p^2) = x^(1+p+p^2+p^3) t3 = t2 * t2^(p^4) = x^(1+p+p^2+p^3+p^4+p^5+p^6+p^7) ...

The Frobenius operator itself is typically as efficient, or even faster, as a multiplication, since it is a linear transform over the source polynomial (seen as a vector in a space of dimension m over GF(p)). If GF(p^m) was defined modulo M = X^m-c for some constant c, then the Frobenius operator is a simple per-coefficient multiplication, i.e. a cost close to that of an addition in GF(p^m).

The algorithm in appendix G.5 is a simple sub-case of this, when m = 2.

???

For now, just made clear that this is not restricted to even-order extensions. Is it worth documenting this trick here, should we write it down in a separate document, is there a cite we can add, or should we do something else?

kwantam commented 4 years ago

Ah, I also made an edit that closes #281 in this PR, clarifying that the reason for MUST NOT is that rejection sampling rules out (reasonable) constant-time impls. Constant-time impls aren't required, but they should always be possible.

chris-wood commented 4 years ago

I agree with all of your responses, @kwantam! Below are my thoughts on the unclear feedback items.

I think the IPR disclosures handle this. Should double check whether this is true, and what the policy is regarding IPR discussion in the draft itself. (I seem to recall that this is discouraged.)

Let's follow up with the chairs and see what they'd like us to do?

Not clear that this is within the scope of the document, since right now we're not talking about the modulus M at all.

Maybe we can just note that our choice of basis does not change as a function of the irreducible? (That is, we always use a polynomial basis, rather than, say, a normal basis, regardless of the polynomial.) I agree this is not really in scope, but maybe clarity here doesn't hurt.

For now, just made clear that this is not restricted to even-order extensions. Is it worth documenting this trick here, should we write it down in a separate document, is there a cite we can add, or should we do something else?

I'd say let's leave it out for now. Optimizations at this stage seem to be scope creep.

kwantam commented 4 years ago

Can we remove the hash-to-field change and discuss that in a separate PR, @kwantam?

Yes! I'll do this and respond to your other feedback later tonight or tomorrow. Sorry for the delay; need to take care of some other stuff for a little while.

kwantam commented 4 years ago

OK, I've rewritten this branch so that it does not include the alternative hash_to_field function. We can discuss in #284 and then pull in the commit or not as we decide.

cfrg / draft-irtf-cfrg-hash-to-curve

respond to comments from Thomas Pornin #282