ciphersuites and domain separation

kwantam commented 5 years ago

Chris and I discussed the issue of ciphersuites and domain separation in hash-to-curve. Put concisely, the question is, should hash-to-curve provide domain separation among the suites, or should domain separation be left to the upper-level protocol? We decided that the best way to proceed is to get feedback from the CFRG list by coming up with and sending two specific proposals along with a discussion of pros and cons.

From chatting with the BLS and VRF folks, my impression is that they prefer to keep ciphersuites out of hash-to-curve. Their argument is (paraphrasing), the upper-level protocol needs to ensure domain separation, so what's the point of doing it redundantly in hash-to-curve? There was also serious concern about variable-length ciphersuite strings (currently used in the poc impls but not specified in the document), because that is a potential source of confusion and bugs. Right now, the BLS sigs standard is proceeding with no ciphersuite in hash-to-curve, only in the BLS signature itself.

But there may be room to meet in the middle. For example, it would be essentially free to add a short, fixed-length ciphersuite (say, <20 bytes) in hash2base. It might be worth running this by the BLS and VRF folks before we go to the list, to hear out the likely objectors first and see if we can get buy-in.

Concretely, I propose that we consider the following two options.

option 1: no ciphersuite

This preserves the current version of hash2base as specified in the document. The important line is the one in which msg gets hashed. In the current version, that's

m' = H(msg) || I2OSP(ctr, 1)

option 2: fixed string + 4-byte ciphersuite in hash2base

m' = H("HASH-TO-CURVE" || ciphersuite || msg) || I2OSP(ctr, 1)

EDIT: or, maybe slightly preferable so that "prehash-for-free" still works:

m' = "HASH-TO-CURVE" || ciphersuite || H(msg) || I2OSP(ctr, 1)

EDIT 2: I think it's safe to assume that "ciphersuite" here would be generated following something like the procedure in this comment, below. This avoids issues with ciphersuites for curves not discussed in the spec document.

Note that the string "HASH-TO-CURVE" ensures domain separation even from other protocols that use a 4-byte ciphersuite tag. It might be nice for protocols to adopt this approach more generally.
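To make the candidate constructions easier to compare, here is a minimal Python sketch of how m' would be built in each case. It assumes H is SHA-256 and that the ciphersuite is an opaque 4-byte string; the function names are made up for illustration and do not come from the draft.

```python
import hashlib

TAG = b"HASH-TO-CURVE"  # fixed 13-byte prefix from option 2


def i2osp(x: int, length: int) -> bytes:
    """Big-endian integer-to-octet-string conversion (I2OSP)."""
    return x.to_bytes(length, "big")


def H(data: bytes) -> bytes:
    """Assumption: H is SHA-256."""
    return hashlib.sha256(data).digest()


def m_prime_option1(msg: bytes, ctr: int) -> bytes:
    # option 1: m' = H(msg) || I2OSP(ctr, 1)
    return H(msg) + i2osp(ctr, 1)


def m_prime_option2(msg: bytes, ctr: int, ciphersuite: bytes) -> bytes:
    # option 2: m' = H("HASH-TO-CURVE" || ciphersuite || msg) || I2OSP(ctr, 1)
    if len(ciphersuite) != 4:
        raise ValueError("ciphersuite must be exactly 4 bytes")
    return H(TAG + ciphersuite + msg) + i2osp(ctr, 1)


def m_prime_option2_prehash(msg: bytes, ctr: int, ciphersuite: bytes) -> bytes:
    # option 2, prehash-friendly variant:
    # m' = "HASH-TO-CURVE" || ciphersuite || H(msg) || I2OSP(ctr, 1)
    if len(ciphersuite) != 4:
        raise ValueError("ciphersuite must be exactly 4 bytes")
    return TAG + ciphersuite + H(msg) + i2osp(ctr, 1)
```

In the prehash-friendly variant the caller can supply H(msg) computed elsewhere, since the tag and ciphersuite sit outside the hash of the message.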

(Actually, I kind of like the idea of using the RFC number (e.g., "RFC1234" for RFC 1234's domain separation string), but that has the downside that the protocol's test vectors can't be finalized until the RFC number is assigned.)

Thoughts on the above two proposals? Any other issues we should consider?

chris-wood commented 5 years ago

If we assume a fixed-length value for the cipher suite, e.g., a 2 or 4 byte value that maps to a ciphersuite and is maintained by a registry, I think I'm more comfortable with option 2. While it may be redundant, it seems as though it would help prevent possible misuse or failure to add domain separation by the calling code or protocol. @armfazh @grittygrease @samscott89, please chime in!

armfazh commented 5 years ago

I will go for not including domain separation inside the definition of the suite.

Hash-to-curve functions must be as generic as possible. Note that the interface receives a string of opaque bytes. It is then the caller's job to produce a meaningful string according to their purposes. As a consequence, whether or not to provide a domain separation string must be decided by the top-level protocol.

I also think the document must provide recommendations about the usage of the suites, giving advice on the cases where domain separation is necessary.

I might not be considering all the scenarios, feedback is appreciated.

chris-wood commented 5 years ago

As a point of comparison, HPKE [1] bakes the cipher suite into the context string used inside the Seal and Open functions. I'm not sure doing something similar here, with a fixed-length suite, would negatively impact generality, especially since the ciphersuite already uniquely determines the set of algorithms used to construct hash_to_curve. In other words, including the suite provides further separation between hash_to_curve implementations based on different ciphersuites.

[1] https://tools.ietf.org/html/draft-barnes-cfrg-hpke-00#section-5.1

kwantam commented 5 years ago

I thought about this a bit more over the past few days, and I tend to think that domain separation in hash-to-curve will not do much for security in practice; I explain below. But as @chris-wood says, if it doesn't hurt performance or generality, I'd tend to err on the side of caution, with two caveats. First, if we include domain separation, we should also do as @armfazh suggests and add either language or a citation that recommends domain separation strings in upper-level protocols; otherwise, the worry is that we'd give users the incorrect impression that domain separation at the protocol level was unnecessary. Second, we need to make sure we have an interop story for curves that are not in the "official" ciphersuite table; see my comment immediately below.

(Come to think of it: are there any CFRG guidelines or informational documents that talk about domain separation? Should there be? It might be nice if there were a uniform way of doing domain separation, and if every CFRG protocol just did it that way.)

OK, why should we think it won't do much? Recall that one reason to use domain separation is to make sure that random oracle security proofs still hold when protocols are composed---in particular, that composition doesn't break the freshness of random oracle queries. Concretely, if protocol A's security proof relies on making a fresh random oracle query on input X, and an attacker can force composed protocol B to make a query on input X first, the security of the composition may be broken.

Now, let's think about a scenario where we're composing two protocols that both use hash-to-curve, and in which neither protocol uses domain separation strings. In this case, whether or not domain separation at the hash-to-curve level helps depends on whether or not the two protocols use the same hash-to-curve suite. Specifically, if the two protocols use different curves, h2c domain separation would help them. But if they use the same curve and h2c suite, it wouldn't.

So: if we think that composed protocols would end up using the same suite in almost all cases, then we should probably conclude that h2c domain separation won't do much. My inclination is to believe that implementors will prefer to use a single hash-to-curve codebase, i.e., that they'll actively try to use a common suite when composing protocols---which defeats the h2c domain separation.

But like I said, if adding separation is really free, then maybe there's no reason not to do it...

kwantam commented 5 years ago

On the question of whether h2c domain separation is really free, I have a concern: if domain separation includes an opaque ciphersuite ID from a table, there is the possibility that interop will be broken for curves/suites that aren't in the table.

I think this qualifies as a real downside, and it's something we have to think carefully about if we include suite IDs in hash2base.

Along these lines, if we're going to have a ciphersuite table in the RFC, how and when does that table get updated with new suites? Would it make sense to run an informal registry of not-yet-standardized suites, e.g., on GitHub? (I see this ties into #126)

chris-wood commented 5 years ago

Come to think of it: are there any CFRG guidelines or informational documents that talk about domain separation? Should there be? It might be nice if there were a uniform way of doing domain separation, and if every CFRG protocol just did it that way.

Interesting idea! At the very least, it seems like something worth discussing in Montreal. Shall we put down something in writing?

kwantam commented 5 years ago

Interesting idea! At the very least, it seems like something worth discussing in Montreal. Shall we put down something in writing?

Oh, good idea! Let me think a bit about this. (But: is it reasonable to assume that the in-writing bit is lower priority than getting this draft updated?)

chris-wood commented 5 years ago

is it reasonable to assume that the in-writing bit is lower priority than getting this draft updated?

Absolutely!

kwantam commented 5 years ago

After thinking about this for a few more days, I still don't see a clear path to a good interop story for curves that aren't enumerated in the document. Because of that, I'm worried that adding a ciphersuite tag will hurt the interop story for hash-to-curve without a compensating improvement in security (for reasons given in my prior comments).

So: I think my preference is tending towards adding language that encourages upper-level protocols to add domain separation, but not to add it in hash-to-curve.

I've asked the BLS and VRF standards authors for their thoughts; I'm hopeful that they'll come and add comments to this issue.

reyzin commented 5 years ago

I tend to agree with @kwantam. The security benefits of domain separation at the hash-to-curve level are unclear to me, but the logistical drawbacks are clear. If you add ciphersuites to this draft and someone wants to use hash-to-curve with a new hash function or a new curve, they would have to figure out the IETF process for adding a new ciphersuite to this standard, in addition to whatever other standardization and implementation efforts they are already undertaking. Increasing logistical barriers means some people just won't bother and will choose their own options, ignoring this process.

I think the value of hash-to-curve draft is that it covers a broad range of use cases, hash functions, base fields, and curves. Adding ciphersuites will reduce this value.

Also, adding ciphersuites will make existing implementations already deployed in the wild (such as implementations of the VRF draft) incompatible.

hoeteck commented 5 years ago

I also tend to agree with both @kwantam and @reyzin, namely that ciphersuite and domain separation should be provided/enforced by the high-level application.

Here's a slightly different take on the issue: do we expect a ciphersuite string in hash-to-curve to contain any additional information that's not already in the ciphersuite string for the high-level application (e.g. BLS signatures)? To me, it shouldn't, because the latter should completely determine which hash-to-curve algorithm will be used. If so, I don't see any advantage to having the same information appear twice; if anything, there are only disadvantages, namely aesthetics and, as @reyzin pointed out, logistical drawbacks to maintaining consistency.

Side note / clarification: I think it'd still be useful for hash-to-curve to specify a table of ciphersuite strings (which will be referred to by the higher-level applications), but the ciphersuite string should not be part of the input to hash2base.

samscott89 commented 5 years ago

The arguments put forth here seem reasonable. Trying not to overthink it, but there are 4 scenarios, given that the application author and the library author are different people, each of whom can either use domain separation or not.

1. Domain separation in both -> redundant info, and logistical drawbacks
2. App does separation, not library (as suggested here) -> all good!
3. App does not do separation, neither does library (app author is not following RFC properly) -> potential cross-protocol attacks
4. App does not do separation, but library does (app author not following RFC, but library author follows our alternative, i.e. what we previously had) -> Perhaps cannot mount some attack against swapping out curves, but still potentially can do cross-protocol against app usage elsewhere?

The point being, the difference between 3 and 4 doesn't seem particularly meaningful, but the gap between 1 and 2 does. Unless there's a convincing reason why 4 is a bad situation.

The reason I think this is an important distinction is that the number of implementations of h2c should be much smaller than the number of implementations of applications.

chris-wood commented 5 years ago

The point being, the difference between 3 and 4 doesn't seem particularly meaningful, but the gap between 1 and 2 does. Unless there's a convincing reason why 4 is a bad situation.

IMO this is the crucial bit. I don’t think we can assume applications will add domain separation, even if we say they MUST do so. That seems to imply that we’re left considering the pros and cons of 1+4 (library does it) versus 2+3 (library doesn’t do it).

What’s more troubling is that while I agree with all of the downsides, from logistical nightmares to additional complexity and less re-use, the upsides are not well understood. That is, we have some concerns about possible cross-protocol attacks if h2c doesn’t perform domain separation. (Perhaps these are silly and not well founded — I’m not an expert here.) I think some more rigor would help make the decision easier. And it’s probably time to take this issue to the list for wider discussion. :-)

reyzin commented 5 years ago

@chris-wood you are right, we should consider potential upsides. Here are the upsides as I understand them. The main value of domain separation that I know of arises when the same secret key is used in multiple different schemes. In that case, domain separation may (but won't always!) help a security proof for the joint security of these schemes to go through, because at least random oracle queries of the two schemes will not overlap, so whatever arguments required fresh randomness / programmability are more likely to still go through.

I have not seen a convincing case for domain separation besides the above scenario.

But in general using the same SK for multiple purposes requires a thorough analysis, and simply putting in domain separation is insufficient. Moreover, domain separation only at the level of hash-to-curve is even less likely to be sufficient, because it will not necessarily ensure domain separation in the upstream apps -- esp. if the apps are using the same curve.

Basically, app-level domain separation is where you would get the upsides. The chances of the upsides coming from domain separation in hash-to-curve are low, because if people are using the same SK for different schemes, they are likely using the same curve, too.

kwantam commented 5 years ago

Another issue that came up in discussing #132: if hash-to-curve injects a ciphersuite string, and assuming that (say) curve25519 and edwards25519 have different ciphersuites, then these two hash functions won't be compatible. It would probably be nicer if hash-to-curve25519 and hash-to-edwards25519 gave points that are equivalent via the birational map specified in RFC7748.

chris-wood commented 5 years ago

@reyzin I concur :-) and I don't have a convincing case to deliver. I was simply advocating for weighing the options. It seems most folks (at least here on GitHub) are in favor of pushing the burden of domain separation to hash-to-curve callers, which is probably fine. Minimally, we should add some text describing why we find this tradeoff acceptable, if we go down that route.

kwantam commented 5 years ago

I was chatting with @henrycg about this last night, and he pointed out something that's perhaps a second-order concern, but certainly worth writing down: a case where h2c domain separation would potentially be useful is when a single protocol makes queries to two distinct hash-to-curve oracles, and relies on those queries being uncorrelated.

(Note that this is a case where we expect that a protocol's ciphersuite string is not sufficient to give domain separation, which answers @hoeteck's question above in a perhaps unexpected way.)

Our discussion was in the context of a contrived example of a protocol that hashes to both Curve25519 and P-256, but I think there's a much more natural one. Consider a protocol (vaguely reminiscent of the one by Muller) that uses points on both an elliptic curve and on its quadratic twist. Suppose that this hypothetical protocol relies on hashing to both the curve and its twist, and models these two hash functions as independent random oracles.

In this case, simply following the hash-to-curve document would not yield independent random oracles. Since the curve and its twist by definition reside in the same base field, the hash_to_base function for both curves will make exactly the same calls to H (say, SHA256), and will return exactly the same value. Now it's not at all obvious that the two oracles are uncorrelated!
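A small Python sketch of the point above. It assumes a simplified hash_to_base over a prime field (m = 1, W = 2, H = SHA-256) in the spirit of the draft; the Suite type and the curve coefficients are hypothetical placeholders, not real curve parameters. The thing to notice is that nothing curve-specific enters hash_to_base, so the curve and its twist get the same field element for the same input.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    name: str
    p: int  # base field characteristic (shared by a curve and its twist)
    a: int  # placeholder curve coefficient, unused by hash_to_base
    b: int  # placeholder curve coefficient, unused by hash_to_base


def i2osp(x: int, length: int) -> bytes:
    return x.to_bytes(length, "big")


def hash_to_base(suite: Suite, msg: bytes, ctr: int, W: int = 2) -> int:
    """Simplified sketch: depends only on suite.p, W, and H (SHA-256)."""
    m_prime = hashlib.sha256(msg).digest() + i2osp(ctr, 1)
    t = b"".join(
        hashlib.sha256(m_prime + i2osp(1, 1) + i2osp(j, 1)).digest()
        for j in range(1, W + 1)
    )
    return int.from_bytes(t, "big") % suite.p


P = 2**255 - 19
curve = Suite("toy-curve", P, a=1, b=3)   # placeholder coefficients
twist = Suite("toy-twist", P, a=4, b=24)  # placeholder coefficients

u_curve = hash_to_base(curve, b"hello", 0)
u_twist = hash_to_base(twist, b"hello", 0)
print(u_curve == u_twist)  # the two "independent" oracles start from the same u
```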

To be clear, I don't know of any protocol like the above. But you could certainly imagine that a reader of the current hash-to-curve draft who is trying to implement such a protocol might incorrectly assume that hashes to two different curves can be treated as orthogonal random oracles!

There are a couple possible remedies here:

1. In the forthcoming section of the document that discusses domain separation, make clear that different hash functions cannot be treated as orthogonal random oracles, and upper-level protocols that hash to multiple curves must add domain separation between those curves.
2. Reconsider the ciphersuite question, again! @henrycg and I discussed a method for coming up with domain separation tags that might work. I'll specify it in a separate comment.

kwantam commented 5 years ago

Summarizing the practical concerns with adding a domain separation tag to hash-to-curve, the primary concerns are

1. How can someone derive a conforming ciphersuite tag for a curve that's not specified in the document, and
2. How can we give the very nice property that (for example) hash-to-curve25519 and hash-to-edwards25519 return equivalent points? More generally, we'd like it to be the case that isogenous curves have equivalent hash functions, in the sense that hashing any string s to isogenous curves E and E' returns points that are related by the isogeny map.

Handling (1), at a high level, requires us to specify some deterministic algorithm to compute a ciphersuite tag given the parameters of an elliptic curve. Handling (2) is slightly trickier, but it appears to be possible. Here's how:

It is a theorem (due to Tate) that any two curves over a field F having the same number of points are isogenous. (In fact, the implication goes both ways; the other direction is obvious by the definition of an isogeny.) Thus, the algorithm that is used to derive a ciphersuite tag will give equivalent outputs for isogenous curves in the case that the input to the algorithm is F, the field, and n, the order of the elliptic curve group. Here is a candidate such algorithm:

ciphersuite_id(F, n, W, H)

Inputs:
- F, a field (extension) parameterized by p and m, such that F = GF(p^m)
- n, the cardinality of the group of rational points on E over F
- W, the parameter from hash_to_base
- H, the hash function from hash_to_base

Output: csid, a 4-byte ciphersuite ID string

Steps:
1.    L1 = ceiling(log_base_256(p))    // i.e., # of bytes necessary to represent p
2.    L2 = ceiling(log_base_256(n))    // i.e., # of bytes necessary to represent n
3.     L = max(L1, L2)
4. L_str = I2OSP(L, 4)
5. W_str = I2OSP(W, 4)
6. m_str = I2OSP(m, 4)
7. p_str = I2OSP(p, L)
8. n_str = I2OSP(n, L)
9. c_pre = H(L_str || W_str || m_str || p_str || n_str)
10. return c_pre[0:4]                  // i.e., the first 4 bytes of c_pre

This 4-byte tag would be used as described in option 2 in the 1st message of this thread:

m' = "HASH-TO-CURVE" || csid || H(msg) || I2OSP(ctr, 1)

Note that for a given curve, csid is fixed for all time, so it can just be a hard-coded constant. There's no need to evaluate the ciphersuite_id function at runtime.
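For anyone who wants to experiment, here is a hedged Python sketch of ciphersuite_id as specified above. It assumes H is SHA-256 and takes L1 and L2 to be the number of bytes needed to represent p and n, which matches the ceiling(log_base_256(...)) steps whenever the value is not an exact power of 256. The choice W = 2 in the example is just an assumed value.

```python
import hashlib


def i2osp(x: int, length: int) -> bytes:
    """Big-endian integer-to-octet-string conversion (I2OSP)."""
    return x.to_bytes(length, "big")


def byte_len(x: int) -> int:
    """Number of bytes needed to represent x."""
    return (x.bit_length() + 7) // 8


def ciphersuite_id(p: int, m: int, n: int, W: int, H=hashlib.sha256) -> bytes:
    """Sketch of ciphersuite_id(F, n, W, H) with F = GF(p^m)."""
    L = max(byte_len(p), byte_len(n))          # steps 1-3
    c_pre = H(
        i2osp(L, 4) + i2osp(W, 4) + i2osp(m, 4)  # steps 4-6
        + i2osp(p, L) + i2osp(n, L)              # steps 7-8
    ).digest()                                   # step 9
    return c_pre[:4]                             # step 10


# Example: curve25519 and edwards25519 share the field and the group order,
# so the same call (and hence the same csid) covers both.
p_25519 = 2**255 - 19
n_25519 = 8 * (2**252 + 27742317777372353535851937790883648493)
csid = ciphersuite_id(p_25519, 1, n_25519, W=2)  # W = 2 is an assumed value
print(csid.hex())
```

Because the inputs are only (p, m, n, W), any two curves over the same field with the same number of points get the same tag, which is the isogeny-compatibility property described above.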

To be clear: I'm not sure yet whether I'm in favor of this or not. But at least this clears away some of the deployment / pragmatic concerns with using ciphersuites and lets us focus on whether or not this is desirable strictly from a security perspective.

kwantam commented 5 years ago

The above proposal, while perhaps an improvement, is by no means perfect. Here are some concerns that I can think of, off the top of my head:

1. It doesn't give domain separation between a supersingular curve and its quadratic twist, because such curves have the same number of points. This isn't an issue with the algorithm as much as with the specification that isogenous curves should have the same ciphersuite ID: a supersingular curve is isogenous to its quadratic twist, so it has the same ciphersuite ID.

   So the question is, will the lack of domain separation between a supersingular curve and its twist have any practical effects on security? Right now the answer seems to be no, but Murphy is always lurking. Maybe adding some text warning about this case would be sufficient, if we took the decision to add separation as described above.
2. Should the ciphersuite string comprehend elements of the ciphersuite beyond the curve? For example, the ciphersuite will specify whether and how to clear cofactors, and whether to use encode_to_curve (not a random oracle) or hash_to_curve (indifferentiable from random oracle). These aren't comprehended in the above proposal; is that OK?

   Unlike the case of hashing to both a curve and its twist (which may itself be implausible!), it seems really hard to imagine that one protocol would want two orthogonal hash functions to the same curve, where one hashes to the full curve and the other hashes to a prime-order subgroup. So maybe this isn't an issue. But it wouldn't be so hard to add a few more fields to c_pre if we wanted to comprehend these. Importantly, it seems possible to do this without breaking (say) curve25519/edwards25519 compatibility.
3. Related to the above: it might be a misfeature right now that hash_to_curve invokes hash_to_base with ctr = 0 and ctr = 1, rather than with ctr = 1 and ctr = 2. The reason to prefer the latter case is that encode_to_curve invokes hash_to_base with ctr = 0, so using 1 and 2 ensures that encode and hash make orthogonal random-oracle queries, totally for free.

   Actually, probably a better alternative is to change the spec for encode_to_curve to use ctr = 2. This also gives orthogonality, but doesn't break compatibility with the current BLS signatures spec, which currently specifies indifferentiable hashing using ctr = 0 and ctr = 1 and already has some implementations.
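The ctr point can be checked mechanically. The sketch below assumes m' = H(msg) || I2OSP(ctr, 1) with H = SHA-256 and simply compares the sets of m' values that hash_to_curve (ctr = 0, 1) and encode_to_curve (ctr = 0 today, ctr = 2 under the proposed change) would feed into the rest of hash_to_base. The helper names are illustrative only.

```python
import hashlib


def i2osp(x: int, length: int) -> bytes:
    return x.to_bytes(length, "big")


def m_prime(msg: bytes, ctr: int) -> bytes:
    # m' = H(msg) || I2OSP(ctr, 1), assuming H = SHA-256
    return hashlib.sha256(msg).digest() + i2osp(ctr, 1)


def queries(msg: bytes, ctrs) -> set:
    """Set of m' values produced for one message under the given counters."""
    return {m_prime(msg, c) for c in ctrs}


HASH_CTRS = (0, 1)            # hash_to_curve, as in the current BLS sigs spec
ENCODE_CTRS_CURRENT = (0,)    # encode_to_curve today
ENCODE_CTRS_PROPOSED = (2,)   # encode_to_curve after the proposed change

msg = b"example"
overlap_current = queries(msg, HASH_CTRS) & queries(msg, ENCODE_CTRS_CURRENT)
overlap_proposed = queries(msg, HASH_CTRS) & queries(msg, ENCODE_CTRS_PROPOSED)
print(len(overlap_current), len(overlap_proposed))
```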

kwantam commented 5 years ago

One more possibility related to domain separation that came up when chatting with @henrycg that I forgot to mention earlier today.

Let's assume that we're not doing any kind of ciphersuite-specific domain separation. In that case, it might still make sense to inject a hash-to-curve--specific (but not ciphersuite-specific) string into the hash_to_base function, with the aim of orthogonalizing the invocations of H in hash_to_base from other invocations of the same hash function within a given higher-level protocol. Specifically, consider a protocol that's using SHA-256 as a random oracle both in hash_to_base and in some other subroutine unrelated to hash-to-curve.

Like in the case a few comments above, protocol designers might reasonably expect that these random oracle invocations are orthogonal---and in all likelihood they are, considering the highly stylized H() invocations in hash_to_base. But adding an extra layer of protection is essentially free and might give a tiny bit of peace of mind. Concretely, I'm thinking something like this

m' = "HASH-TO-CURVE" || H(msg) || I2OSP(ctr, 1)

inside hash_to_base.

In other words, there's a third option that sits in between "no domain separation" and "per-ciphersuite domain separation," namely, adding a fixed string that's the same for all ciphersuites in hash_to_base. This ensures that invocations of H in hash-to-curve are orthogonal to other invocations of H.

I think this is another instance in which an application-level ciphersuite string isn't enough. And it seems like not enforcing separation between calls to H() inside hash_to_base and calls to H() elsewhere in an upper-level protocol may really be inviting badness.

armfazh commented 5 years ago

...uses points on both an elliptic curve and on its quadratic twist. Suppose that this hypothetical protocol relies on hashing to both the curve and its twist, and models these two hash functions as independent random oracles. In this case, simply following the hash-to-curve document would not yield independent random oracles. Since the curve and its twist by definition reside in the same base field, the hash_to_base function for both curves will make exactly the same calls to H (say, SHA256), and will return exactly the same value. Now it's not at all obvious that the two oracles are uncorrelated!

If I am not wrong, the curve and its twist have different curve equations, so how is it possible to get the same point, since the elliptic curve coefficients play a role in the mappings?

kwantam commented 5 years ago

If I am not wrong, the curve and its twist have different curve equations, so how is it possible to get the same point, since the elliptic curve coefficients play a role in the mappings?

You're right that the mapping will return a different point. I was pointing out that hash_to_base will return the same value for both curves when invoked on the same string, because hash_to_base only depends on F (which is definitely the same) and W and H (which are almost certainly the same).

If that happens, then the same u value is given as the input to two (presumably very similar) map_to_curve functions, one for the curve and one for the twist. As far as I can tell, there's no reason to believe that these two functions will give statistically uncorrelated results. Certainly they're not designed to, and as far as I know there's no analysis of such a situation in the literature.

kwantam commented 5 years ago

Folks, I'd like to submit a PR on domain separation by EoD tomorrow, so I'd appreciate any last-minute thoughts on the above.

Here's my concrete proposal:

1. Change hash_to_base in the way proposed in this comment, namely, add a fixed string to separate the use of H in hash_to_base from other uses of H in the invoking protocol.

   The justification for this change is effectively the principle of least surprise: hash-to-curve doesn't look like SHA-256, so it's kind of surprising for it to require domain separation from SHA-256. I'd guess that even with a stern warning in the document, many users will get this wrong---especially if they're just invoking a hash-to-curve implementation from a library rather than reading the document and implementing it themselves.
2. Change to ctr = 2 for encode_to_curve, as discussed in point (3) in this comment. This is paranoia, but it's totally free, so we may as well do it.

This leaves us with the question of per-curve domain separation. Status quo appears to be that we will clearly state that this document does not guarantee domain separation between encodings to different curves. If a protocol invokes two different encodings and requires the results to be orthogonal, the protocol MUST inject its own domain separation tags.

I have to admit, I don't love this solution. Adding a 4-byte, deterministically-generated domain separation tag via something like ciphersuite_id is cheap, and has none of the practical downsides we've discussed upthread. And once again appealing to the principle of least surprise, it just seems like intuitively, oracles to different curves should be uncorrelated.

kwantam commented 5 years ago

This is mostly a note-to-self: to avoid adding another compression function invocation in hash_to_base, we can have at most 23 bytes beyond H(msg) in m'. Right now we add 3, namely, ctr, i, and j. "HASH-TO-CURVE" is 13 bytes, so we could in principle have up to a 7-byte csid without spilling into another compression invocation.
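The byte budget can be reproduced with a few lines of Python. This is only arithmetic, assuming SHA-256 (64-byte blocks, 32-byte digest, and at least 9 bytes of padding: one 0x80 byte plus an 8-byte length field); the helper names are made up for the example.

```python
BLOCK_BYTES = 64           # SHA-256 block size
MIN_PADDING_BYTES = 1 + 8  # 0x80 marker byte + 64-bit message length
DIGEST_BYTES = 32          # len(H(msg)) for SHA-256
COUNTER_BYTES = 3          # ctr, i, and j, one byte each
TAG = b"HASH-TO-CURVE"     # 13 bytes


def compression_calls(input_len: int) -> int:
    """Compression-function invocations needed to hash input_len bytes."""
    return (input_len + MIN_PADDING_BYTES + BLOCK_BYTES - 1) // BLOCK_BYTES


def hash_input_len(csid_len: int) -> int:
    """Length of TAG || csid || H(msg) || ctr || i || j."""
    return len(TAG) + csid_len + DIGEST_BYTES + COUNTER_BYTES


max_extra = BLOCK_BYTES - MIN_PADDING_BYTES - DIGEST_BYTES  # bytes beyond H(msg)
max_csid = max_extra - COUNTER_BYTES - len(TAG)

print(max_extra, max_csid)
print(compression_calls(hash_input_len(7)), compression_calls(hash_input_len(8)))
```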

hoeteck commented 5 years ago

If I understand correctly, the latest issue with the curve and its twist goes away if we include the curve in the ciphersuite_id, and the high-level application calls hash_to_curve with the ciphersuite_id. Is that right?

Going back to the higher-level discussion, I don't think we should be trying to protect protocol designers from deviating from the recommendations of this draft; that incurs too much overhead and takes us down a deep rabbit hole.

More generally, we need to distinguish between implementation errors and design errors. I agree with the general principle of resilience to implementation errors (e.g. resilience to weak randomness and side channel attacks). On the other hand, I'm a lot less sympathetic to paying a price for resilience to design errors.

chris-wood commented 5 years ago

If I understand correctly, the latest issue with the curve and its twist goes away if we include the curve in the ciphersuite_id, and the high-level application calls hash_to_curve with the ciphersuite_id. Is that right?

@hoeteck more or less -- I assumed that the implementation of the specific cipher suite would just include ciphersuite_id, i.e., the caller would not pass anything beyond the message to hash.

hoeteck commented 5 years ago

How about the following as a compromise?

hash-to-curve does incorporate a ciphersuite_id with an extra 8 (or 16) bits, which are zeroes by default, but can be changed by higher-level applications.

In particular, we can use

option 2: fixed string + 4-byte ciphersuite in hash2base

m' = H(ciphersuite || msg) || I2OSP(ctr, 1)

// note I removed "HASH-TO-CURVE".

but the first 2 bytes of ciphersuite are always 0 by default (we can think of 0x00 as encoding the string "HASH-TO-CURVE"). Moreover, the hash-to-curve spec should explicitly allow higher-level applications to modify those 2 bytes.

kwantam commented 5 years ago

If I understand correctly, the latest issue with the curve and its twist goes away if we include the curve in the ciphersuite_id, and the high-level application calls hash_to_curve with the ciphersuite_id. Is that right?

Maybe---it depends what you mean by the ciphersuite id. The issue is that a single, protocol-level ciphersuite ID isn't sufficient in this case---the protocol has to use a separate ID tag for each curve it hashes to, in order to ensure that those hashes are orthogonal.

Concretely, imagine that a protocol implements two functions, hash_to_curve and hash_to_twist, and wants to be sure that they are orthogonal. Then the following is OK:

Pcurve = hash_to_curve(ciphersuite_id || "CURVE" || msg)
Ptwist = hash_to_twist(ciphersuite_id || "TWIST" || msg)

but this is not OK:

Pcurve = hash_to_curve(ciphersuite_id || msg)
Ptwist = hash_to_twist(ciphersuite_id || msg)

To me, that's a reasonably subtle distinction, and it seems like "hash_to_curve" and "hash_to_twist" should be doing that work, not their callers.
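Here is a rough Python illustration of the difference between the two call patterns. It uses a stand-in for hash_to_base (SHA-256 reduced mod p, an assumption made purely for brevity) and a hypothetical 4-byte ciphersuite_id; only the shape of the inputs matters.

```python
import hashlib

P = 2**255 - 19  # shared base field of the curve and its twist (example value)


def hash_to_base_stub(data: bytes) -> int:
    """Stand-in for hash_to_base: depends on the input bytes and p only."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % P


ciphersuite_id = b"\x00\x00\x00\x01"  # hypothetical protocol-level tag
msg = b"hello"

# Not OK: both hashes start from the same field element.
u_curve_bad = hash_to_base_stub(ciphersuite_id + msg)
u_twist_bad = hash_to_base_stub(ciphersuite_id + msg)

# OK: per-curve tags make the underlying queries distinct.
u_curve_ok = hash_to_base_stub(ciphersuite_id + b"CURVE" + msg)
u_twist_ok = hash_to_base_stub(ciphersuite_id + b"TWIST" + msg)

print(u_curve_bad == u_twist_bad, u_curve_ok == u_twist_ok)
```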

I completely agree that it's impossible to prevent people from misunderstanding or ignoring the recommendations. On the other hand, to me it makes sense to try to anticipate insidious misunderstandings and to make those misunderstandings implementation errors that can be caught in one place (with test vectors), rather than subtle bugs at individual call sites that may very well go unnoticed.

I know I sound like a broken record, but the case of library users really worries me. Library users should not need to understand detailed security recommendations from the hash-to-curve document in order to safely invoke a hash-to-curve function that someone else wrote. Or, maybe more accurately: library users just will not read this document. To whatever extent is reasonable, complying implementations should protect them anyway.

kwantam commented 5 years ago

hash-to-curve does incorporate a ciphersuite_id with an extra 8 (or 16) bits, which are zeroes by default, but can be changed by higher-level applications.

I think I'm not quite understanding your proposal:

Doesn't the current interface already do this, since the caller is already welcome to prefix msg with arbitrary tags? In other words: is this any different? How?
Why remove "HASH-TO-CURVE"?
Are you proposing to change the API to something like hash_to_curve(msg, ciphersuite) ?
Does this really improve the situation compared to the no-ciphersuite option? By default there's still no domain separation for SHA-256 inside hash_to_base, and random oracles to different curves are not orthogonal.

kwantam commented 5 years ago

On the other hand, I'm a lot less sympathetic to paying a price for resilience to design errors.

I realize that you don't mean a price in the literal sense, but to be clear: there is no computational overhead when adding a per-curve ciphersuite. hash_to_base invokes exactly the same number of rounds of SHA2 in either case.
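As a rough check of that claim, the sketch below counts SHA-256 compression-function calls for the two m' layouts from the top of the thread. It assumes that m' is itself hashed with SHA-256 inside hash_to_base and that the ciphersuite is 4 bytes; the helper name `sha256_blocks` is made up for this example.

```python
def sha256_blocks(n_bytes: int) -> int:
    """Number of 64-byte compression-function calls SHA-256 makes on an n-byte input.

    Padding appends one 0x80 byte and an 8-byte length field, then the total
    is rounded up to a whole number of 64-byte blocks.
    """
    return (n_bytes + 9 + 63) // 64

DIGEST_LEN = 32  # length of H(msg) for SHA-256
CTR_LEN = 1      # length of I2OSP(ctr, 1)

# option 1: m' = H(msg) || I2OSP(ctr, 1)
option1_len = DIGEST_LEN + CTR_LEN

# option 2 (prehash-for-free): m' = "HASH-TO-CURVE" || ciphersuite || H(msg) || I2OSP(ctr, 1)
option2_len = len(b"HASH-TO-CURVE") + 4 + DIGEST_LEN + CTR_LEN

print(sha256_blocks(option1_len), sha256_blocks(option2_len))
```

Under these assumptions both layouts fit in a single 64-byte block, and H(msg) is computed over the same bytes in both cases, so the prefix adds no compression-function calls. The prefix plus ciphersuite can grow to 22 bytes before hashing m' spills into a second block.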

hoeteck commented 5 years ago

There are a couple of issues being discussed, but here I'm focusing on option 1 vs. option 2 from the beginning of this thread, in the context of BLS signatures. In BLS signatures, we want to support additional ciphersuite information beyond what's in hash-to-curve. Let's suppose we only need a single byte (concretely, this byte would indicate the mechanism for preventing rogue-key attacks, e.g. 0x01 for proof of possession and 0x02 for message augmentation).

To answer,

... the caller is already welcome to prefix msg with arbitrary tags? In other words: is this any different? How?

Let's suppose we go with option 2 with pre-hashed for free, namely:

m' = "HASH-TO-CURVE" || ciphersuite || H(msg) || I2OSP(ctr, 1)

Let's suppose we want to sign the message "Hello" using BLS signatures with option 0x01 over the BLS12-381 curve. Looking up the current table, I'd use "H2C-0008". Now, what would m' be?

option H1. set msg in m' := 0x01 || "Hello" (what Riad referred to as prefixing msg with an arbitrary tag).

m' = "HASH-TO-CURVE" || H2C-0008 || H(0x01 || "Hello") || I2OSP(ctr, 1)

option H1b. The same thing, but with pre-hashed for free, namely:

m' = "HASH-TO-CURVE" || H2C-0008 || H(0x01 || H("Hello")) || I2OSP(ctr, 1)
option H2. allow higher-level applications to modify first bytes of ciphersuite (as I suggested above). Then, in the pre-hashed for free setting, we get:

m' = "HASH-TO-CURVE" || H2C-0108 || H("Hello") || I2OSP(ctr, 1)

I see two advantages in option H2:

(concrete efficiency) comparing option H1b and H2, we save one hash.
("aesthetics") all the ciphersuite information goes into the same place, namely H2C-0108. This makes defining ciphersuites for higher-level applications much cleaner, namely

higher-level specific options || hash-to-curve options

More generally, we can have H2C-xxyyzz with 3 bytes, with xx defaulting to 00 and reserved for high-level applications.
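The three constructions above can be sketched in Python as follows. This is only an illustration: it assumes H is SHA-256, that the "H2C-…" identifiers are encoded as ASCII, and that ctr = 0. The helper names are not taken from any implementation.

```python
import hashlib

def i2osp(x: int, length: int) -> bytes:
    """Integer-to-octet-string conversion (big-endian, fixed length), as in RFC 8017."""
    return x.to_bytes(length, "big")

def H(data: bytes) -> bytes:
    """Stand-in for the hash function H; SHA-256 is an assumption here."""
    return hashlib.sha256(data).digest()

PREFIX = b"HASH-TO-CURVE"
msg = b"Hello"
ctr = 0
bls_option = b"\x01"  # proof of possession

# option H1: the application byte is prefixed to msg before hashing.
m_h1 = PREFIX + b"H2C-0008" + H(bls_option + msg) + i2osp(ctr, 1)

# option H1b: same, but the caller supplies H(msg), so the message is hashed twice.
m_h1b = PREFIX + b"H2C-0008" + H(bls_option + H(msg)) + i2osp(ctr, 1)

# option H2: the application byte lives in the ciphersuite; H(msg) is used directly.
m_h2 = PREFIX + b"H2C-0108" + H(msg) + i2osp(ctr, 1)

for name, value in (("H1", m_h1), ("H1b", m_h1b), ("H2", m_h2)):
    print(name, len(value), value.hex())
```

All three strings have the same length; H1b differs from H2 only in where the 0x01 byte enters and in needing one extra call to H.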

Why remove "HASH-TO-CURVE"?

This is mostly aesthetic, but if we include a string "HASH-TO-CURVE" in m', then we should also include "BLS-SIGN" in m', and I don't know a clean way to do that in option 2. But let's put this aside for now.

Hope that clarifies things somewhat! :)

kwantam commented 5 years ago

option H2. allow higher-level applications to modify first bytes of ciphersuite (as I suggested above). Then, in the pre-hashed for free setting, we get: m' = "HASH-TO-CURVE" || H2C-0108 || H("Hello") || I2OSP(ctr, 1)

I agree this is nice from a performance perspective, but in my mind it doesn't provide meaningful domain separation. This is related to pairingwg/bls_standard#17. The issue is that one byte is not sufficient to ensure that different protocols make distinct calls to the random oracle. Concretely, if protocols A and B both use a one-byte ciphersuite tag, there's a really good chance that both of them will compute exactly the same value for m'---especially if they just enumerate their ciphersuites starting from 0 the way BLS is currently doing.

This is exactly the situation we're trying to avoid with domain separation: the protocols need to somehow include a globally unique (we hope) string inside m'. This is why I'm strongly in favor of "HASH-TO-CURVE", "BLS-SIGN", "RFC7748", or whatever.
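Here is a small sketch of that collision. The protocol names "PROTOCOL-A" and "PROTOCOL-B", the helper `m_prime`, and the use of SHA-256 are hypothetical and chosen only for illustration.

```python
import hashlib

def m_prime(tag: bytes, msg_digest: bytes, ctr: int) -> bytes:
    """Build tag || H(msg) || I2OSP(ctr, 1) for a given tag (illustrative layout)."""
    return tag + msg_digest + ctr.to_bytes(1, "big")

digest = hashlib.sha256(b"Hello").digest()

# Protocols A and B each use a one-byte ciphersuite tag, numbered from 0.
a_bare = m_prime(b"\x00", digest, 0)
b_bare = m_prime(b"\x00", digest, 0)

# The same protocols, each also including a (hopefully) globally unique string.
a_named = m_prime(b"PROTOCOL-A" + b"\x00", digest, 0)
b_named = m_prime(b"PROTOCOL-B" + b"\x00", digest, 0)

print(a_bare == b_bare)
print(a_named == b_named)
```

With only the one-byte tag, both protocols make an identical query to the random oracle; the protocol-specific string is what separates them.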

I'm also not in favor of weakening the abstraction / complicating the interface between hash-to-curve and upper-level protocols. In my mind, the signature of the hash-to-curve functions should be

{0, 1}^* -> E

And they should behave in a way that, to the greatest extent possible, aligns with intuition. For the purposes of this thread, from my perspective "intuitive" means that different hash functions are fully distinct from all other random oracles in a protocol. This is the best possible guarantee that hash-to-curve can give. That doesn't mean that higher-level protocols don't need to do domain separation among themselves, but it does mean that it's safe to treat conforming hash-to-curve implementations as a black box. In other words, of course hash-to-curve functions can be misused, but the simplest and most obvious way to use them is probably the right way, modulo responsibilities that only the upper-level protocol is in a position to discharge.

As far as performance goes, in my mind the calling protocol should only be passing tagged messages to any random oracle. This means that there really is no extra cost for computing

H("MY-PROTOCOL" || 0x01 || original_message)

because the upper-level protocol should never call H(original_message), to a first approximation.

EDIT: this is sort of separate from the above, but I think it's pretty clear from the discussion upthread that fixing a table of ciphersuite IDs is a non-starter. So let's assume that we'd compute the ciphersuite ID using a deterministic algorithm. I've edited the first post in the thread to that effect.

kwantam commented 5 years ago

To add another perspective:

I spoke with Dan (Boneh) about this today, and his take was that this is very application dependent, so it might be best to leave it to the applications to decide whether they want per-curve domain separation rather than arbitrarily decree that there shall be separation just between isogeny classes. So that's another vote against per-curve separation.

Dan is in favor of adding some fixed string to the H() calls inside hash_to_base, roughly as discussed here. But he suggested that it is probably better to use HKDF than to "roll our own" PRG. Let's leave that to an orthogonal discussion---I've created #137.

Since it seems like there's little enthusiasm to go all-in on per-curve domain separation, I'm fine adding text that explicitly delegates this task to the upper-level protocols, at least for now. We can revisit this decision in the future if necessary, but I'd rather get some text about domain separation before the deadline, since that's probably the most effective way to solicit feedback from the broader community.

kwantam commented 5 years ago

I created #139 capturing more or less what's here, tabling the question of per-curve domain separation for now since there seems to be little enthusiasm for it. Comments appreciated.

kwantam commented 4 years ago

We reached a consensus on domain separation, so I'm closing this issue.

cfrg / draft-irtf-cfrg-hash-to-curve

ciphersuites and domain separation #124
