cfrg / draft-irtf-cfrg-hash-to-curve

Hashing to Elliptic Curves

hash-to-base issues #202

Closed · kwantam closed this issue 4 years ago

kwantam commented 4 years ago

EDIT: Please see this comment for a summary of the proposed changes.


Sorry to reopen this can of worms, but I think there are a couple reasons to worry about the current version of hash-to-base.

  1. @BjoernMHaase points out that, for embedded systems, hashing can be much more expensive than field or curve operations. The reason is, hashing is done on the main (sometimes 8-bit!) processor, whereas field and curve operations use a specialized co-processor. The result is that, in his CPace implementation, he doesn't want to use our version of hash-to-base.

    It seems reasonable to expect this to be an issue for several classes of hash-to-curve applications. I think we should be responsive to it, and come up with a simplified version of hash-to-base that's safe in this context.

  2. @reyzin points out that in the VRF draft they were very careful to make sure that all invocations of SHA-2 are domain separated, but our use of H in hash-to-base is incompatible with this. In fact, we sat down and talked through it yesterday, and there does not appear to be a safe and cheap way of domain separating uses of H outside of hash-to-base from its uses inside. The only way to do it appears to be to select a random, 32-byte string (e.g., by hashing a domain separation tag) and prepending that value to every use of H outside of hash-to-base.

    This is really unsatisfying: first, it adds an extra invocation of the compression function, which is bad in resource-constrained contexts (per (1), above). Second, it means that domain separation requirements "escape" the hash-to-curve draft, in the sense that implementors need to understand how hash-to-curve uses H in order to safely reuse H in upper-layer protocols---an abstraction violation.

  3. We've heard from several people that they want to use SHA3, and our version of hash-to-base is overdesigned for this purpose. I don't care so much about overdesign in itself---it would be great if we could have a one size fits all version of hash-to-base---but given issues 1 and 2, we have an opportunity to revisit.

I'm working on a couple proposals to fix this, one based on KMAC (or perhaps SHAKE) and one that's designed to be safe (including in the sense of (2) above) and efficient for Merkle-Damgaard functions (but should also be safe and efficient for SHA-3). For now I'm not proposing a specific course of action, just giving early warning that this is an issue we should think carefully about.

BjoernMHaase commented 4 years ago

Hi to all,

I see a larger class of PAKE protocols as the natural "users" of the results of your work. As also pointed out in RFC 8125, one aspect to consider is side-channel resistance beyond constant-time execution, specifically if smart-card based implementations are to be recommended in hostile environments.

Last week during the breaks of the brainpool meeting, I had some discussions with people who have more expertise on secure elements than I do, specifically people from NXP, Infineon, and the BSI.

What I took away from the discussions is:

1) The major concern with hash function use might be that protected/masked smart-card implementations of hash functions are not commonly available in hardware, so their use should be avoided in protocols that require side-channel protections.

2) Their recommendation was to give preference to constructions based on SPN-based block ciphers where possible. They mentioned, e.g., the PRF operation required for expanding secrets.

3) SHA3 might be easier to protect: it is better prepared for SPA protections than Merkle-Damgaard constructions, but platforms with a protected implementation are not commonly available today either.

4) Expanding the inputs to twice the base field size might be a good idea. Some secure-element implementations seem to randomize the base field representations (by adding multiples of the prime). This could possibly provide one component for protecting the Map2Point base field arithmetic against SPA.

I have asked several smart-card experts to post an assessment or recommendation on the CFRG list, but I fear that their company policies won't allow such a contribution.

A second aspect that I noticed in the current hash2curve draft is the mandatory co-factor blinding already at the hash2curve level, even though Elligator2 returns the point in affine coordinates (see e.g. the post from Mike Hamburg from 2017: https://moderncrypto.org/mail-archive/curves/2017/000939.html).

After co-factor blinding, the point will be represented in some projective coordinate system; getting back to affine coordinates costs an additional base field inversion. Citing again Mike Hamburg from his 2017 post,

"The projective ladder requires one more register (Z0) and is slower if you have a dedicated squaring algorithm. It’s faster [than a variant with inversion and an affine ladder] if you’re using multiply as square."

Yours,

Björn

kwantam commented 4 years ago

Thanks for the comments, Björn! A few quick responses below.

I will keep thinking about this.

kwantam commented 4 years ago

Here's a skeletal proposal for a simplified approach to hash-to-field.

At a high level, this proposal splits hash-to-field into two steps:

  1. Generate a pseudorandom byte string based on the message and domain separation tag.

  2. Interpret this byte string as one or more elements of F = GF(p^m).

Note that this is a departure from the current approach: in the above, hash-to-field can return multiple field elements in one call. This means that both hash-to-curve and encode-to-curve will call hash-to-field exactly once, but will request either one (encode-) or two (hash-) field elements.

There are three reasons for this change: first, it minimizes the number of hash function invocations, which is one of the explicit goals of this redesign. Second, it simplifies drop-in use of any hash function in the SHA-2 or SHA-3 family, meaning that we do not have to specify a different hash-to-field function to use, say, SHAKE128. Third, it makes drop-in use of block-cipher-based expansion easy (though, per my prior comment, I do not necessarily advocate specifying that here).

Here's the new proposed hash-to-field function:

hash_to_field(msg, count)

Parameters:
- DST, a domain separation tag.
- F, a finite field of characteristic p and order q = p^m.
- p, the characteristic of F.
- m, the extension degree of F, m >= 1.
- L = ceil((ceil(log2(p)) + k) / 8), where k is the security parameter.
- expand_message, a function that takes a message, DST, and number of
  bytes and outputs that number of pseudorandom bytes.

Input:
- msg is the message to hash.
- count is the number of elements of F to output.

Outputs:
- (u_0, ..., u_(count - 1)), count field elements.

Steps:
1. prb_length = count * m * L
2. pseudo_random_bytes = expand_message(msg, DST, prb_length)
3. for i in (0, ..., count - 1):
4.   for j in (0, ..., m - 1):
5.     elm_offset = L * (j + i * m)
6.     tv = pseudo_random_bytes[elm_offset : elm_offset + L]
7.     e_j = OS2IP(tv) mod p
8.   u_i = (e_0, ..., e_(m - 1))
9. return (u_0, ..., u_(count - 1))

In subsequent comments I'll discuss options for the expand_message function.
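For reference, here's a minimal Python sketch of the above, specialized to a prime field (m = 1) and taking expand_message as a parameter; the helper names and the m = 1 restriction are just for illustration, not part of the proposal:

from math import ceil

def OS2IP(octets):
    # RFC 8017 octet-string-to-integer conversion (big endian)
    return int.from_bytes(octets, 'big')

def hash_to_field(msg, count, DST, p, k, expand_message):
    # L = ceil((ceil(log2(p)) + k) / 8); bit_length() equals ceil(log2(p)) for p not a power of two
    L = ceil((p.bit_length() + k) / 8)
    m = 1  # prime field, so each u_i is a single element e_0
    pseudo_random_bytes = expand_message(msg, DST, count * m * L)
    u = []
    for i in range(count):
        elm_offset = L * i * m
        tv = pseudo_random_bytes[elm_offset : elm_offset + L]
        u.append(OS2IP(tv) % p)
    return u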

kwantam commented 4 years ago

Now we just have to specify an expand_message function.

We can easily build one based on SHAKE128:

expand_message_shake128(msg, DST, len_in_bytes)

Input:
- msg, an octet string
- DST, an octet string
- len_in_bytes, length of requested output in octets

Output:
- pseudo_random_bytes, an octet string

Steps:
1. msg_prime = DST || I2OSP(len_in_bytes, 2) || msg
2. return SHAKE128(msg_prime, 8 * len_in_bytes)

(Note that the length argument to SHAKE128, per FIPS 202, is in bits.)

SHAKE256 is analogous, of course.
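For what it's worth, this is trivially expressible with Python's hashlib (sketch only; note that hashlib's digest() length argument is in octets, not bits):

import hashlib

def I2OSP(x, length):
    # RFC 8017 integer-to-octet-string conversion (big endian)
    return x.to_bytes(length, 'big')

def expand_message_shake128(msg, DST, len_in_bytes):
    msg_prime = DST + I2OSP(len_in_bytes, 2) + msg
    return hashlib.shake_128(msg_prime).digest(len_in_bytes)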

kwantam commented 4 years ago

How about SHA-2? Here we have to be careful, since SHA-2 is a Merkle-Damgaard construction and is thus not itself indifferentiable from a random oracle, even if we assume that the underlying compression function is a random oracle. (This relates to well-known length extension attacks, multi-collision attacks, etc., on M-D constructions.)

Note, however, that the value expand_message returns to hash_to_field isn't exposed to hash_to_field's caller. Instead, it's cut into (log(p)+k)-bit chunks, which are reduced mod p. We can use this fact to our advantage!

In particular, while I haven't yet written down a full proof, it looks very likely that we can build on existing analyses of chop-MD (1, 2, 3, i.e., constructions like SHA512/256) to show that reducing a (log(p)+k)-bit integer mod p suffices for indifferentiability. Intuitively, the reason is that the same k extra bits we're using to get a near-uniform element of GF(p) also suffice to prevent length extensions, etc.

~Here's the proposed function:~

EDIT: removing this version, because it's not quite strong enough. I'll post a new one below.

kwantam commented 4 years ago

EDIT: I vote not to specify this function in the hash-to-curve draft. Please regard this comment as spitballing only.


If we really wanted to specify something based on a ctr-mode cipher, we could do something like this:

expand_message_ctr(msg, DST, len_in_bytes)

Parameters:
- E, a block cipher encryption function taking kE-bit keys and bE-bit blocks.
- bE, the block size of E.
- kE, the key length of E.
- H, a hash function that outputs at least kE + bE + k bits, for security parameter k.

Input and output: same as above

Steps:
1.  ell = ceil(len_in_bytes * 8 / bE)
2.  ABORT if ell > 255
3.  msg_prime = H(DST || I2OSP(ell, 1) || msg)
4.  eKey = first kE bits of msg_prime
5.  eCtr = next bE bits of msg_prime
6.  ctr = OS2IP(eCtr)
7.  for i in (0, ..., ell - 1):
8.    ctr = ctr + 1
9.    b_i = E(eKey, I2OSP(ctr, bE / 8))
10. pseudo_random_bytes = b_0 || ... || b_(ell - 1)
11. return pseudo_random_bytes[0 : len_in_bytes]

Of course, it would be totally reasonable to replace the block cipher with a good stream cipher...

Since we're throwing away k bits of H's output, it's safe to use a Merkle-Damgaard hash function. When using SHA-3, we can relax the requirement on H's output size to just kE + bE.

For collision resistance, we require that kE + bE > 2 * k. Note that by this definition AES-256 is only appropriate for k = 192, not for k = 256 (because the AES block size is small). There are ways to handle this (e.g., XOR the output of two independently-seeded AES-CTR PRGs).
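Purely to make the above concrete (again, I'm not proposing to specify this), here is a sketch with E = AES-256 (kE = 256, bE = 128), k = 128, and H = SHA-512, assuming the third-party Python "cryptography" package for the block cipher:

import hashlib
from math import ceil
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def I2OSP(x, length):
    return x.to_bytes(length, 'big')

def OS2IP(octets):
    return int.from_bytes(octets, 'big')

def expand_message_ctr(msg, DST, len_in_bytes):
    kE, bE = 256, 128                                  # AES-256 key and block size, in bits
    ell = ceil(len_in_bytes * 8 / bE)
    if ell > 255:
        raise ValueError("requested output too long")
    # H = SHA-512 outputs 512 = kE + bE + k bits, as required for k = 128
    msg_prime = hashlib.sha512(DST + I2OSP(ell, 1) + msg).digest()
    eKey = msg_prime[: kE // 8]                        # first kE bits
    ctr = OS2IP(msg_prime[kE // 8 : (kE + bE) // 8])   # next bE bits
    blocks = []
    for _ in range(ell):
        ctr = (ctr + 1) % (1 << bE)
        # raw AES applied to a single block, i.e., E(eKey, I2OSP(ctr, bE / 8))
        enc = Cipher(algorithms.AES(eKey), modes.ECB()).encryptor()
        blocks.append(enc.update(I2OSP(ctr, bE // 8)) + enc.finalize())
    return b"".join(blocks)[:len_in_bytes]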

kwantam commented 4 years ago

OK, here's an improved version of expand_message_md along with a very brief sketch of the security argument.

For this construction, we require a hash function H whose output size in bits is at least 2 * k, for k the security parameter.

expand_message_md(msg, DST, len_in_bytes)

Parameters:
- H, a Merkle-Damgaard or Sponge-based hash function.
- obs, the output block size of H in bytes.
- k_in_bytes, ceil(k / 8) for k the security parameter, e.g.,
  for k = 128, k_in_bytes = 16.

Input:
- msg, an octet string.
- DST, an octet string.
- len_in_bytes, the length of the requested output in octets.

Output:
- pseudo_random_bytes, an octet string.

Steps:
1. ell = ceil((len_in_bytes + k_in_bytes) / obs)
2. ABORT if ell > 255
3. b_0 = H(DST || I2OSP(0, 1) || I2OSP(ell, 1) || msg)
4. for i in (1, ..., ell - 1):
5.   b_i = H(DST || I2OSP(i, 1) || b_(i - 1))
6. b_0_chopped = first (obs - k_in_bytes) bytes of b_0
7. pseudo_random_bytes = b_0_chopped || b_1 || ... || b_(ell - 1)
8. return pseudo_random_bytes[0 : len_in_bytes]

In principle one can skip chopping b_0 for functions in the SHA-3 family (or, more generally, for H with a reasonable indifferentiability proof). In practice, it's not clear to me whether the extra complexity is worthwhile. Thoughts?
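To make the above concrete, here is a short Python sketch with H = SHA-256 and k = 128 (illustrative only, not normative):

import hashlib
from math import ceil

def expand_message_md(msg, DST, len_in_bytes, H=hashlib.sha256, k_in_bytes=16):
    obs = H().digest_size                       # output block size of H in bytes (32 for SHA-256)
    ell = ceil((len_in_bytes + k_in_bytes) / obs)
    if ell > 255:
        raise ValueError("requested output too long")
    b_0 = H(DST + bytes([0]) + bytes([ell]) + msg).digest()
    blocks = [b_0[: obs - k_in_bytes]]          # b_0_chopped
    b_prev = b_0
    for i in range(1, ell):
        b_prev = H(DST + bytes([i]) + b_prev).digest()
        blocks.append(b_prev)
    return b"".join(blocks)[:len_in_bytes]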


Security argument (sketch)

We argue security by composing existing security arguments for chop-MD and NMAC.

In particular, chop-MD that cuts off k bits of a hash whose output is at least 2 * k bits is indifferentiable from a random oracle with distinguishing advantage roughly Q / 2^k, for an adversary making Q queries, paraphrasing DFL12. Thus, the output b_0_chopped is indifferentiable from a RO with the same advantage.

We can view outputs b_1 through b_(ell - 1) in two ways. Either we can regard them as a variant of the NMAC / HMAC^f instantiation due to Coron et al. [CDMP05, Theorem 3.5], or we can simply notice that the length of inputs to H for all of these outputs is fixed and all calls to H are prefix-free (since they start DST || 0, DST || 1, ...). In either case, since the output size of H is at least 2 k, we have from CDMP05 that the distinguishing advantage is roughly Q^2 / 2^(2 k).

Next, we can leverage the composability of indifferentiability to show that the concatenation of independent random oracle outputs is indifferentiable from a random oracle. The simulator is the trivial one.

Finally, we need one more indifferentiability argument, namely, that chopping the "big" RO output into (log p + k)-bit chunks and reducing each one mod p is indifferentiable. Once again the simulator here is quite simple: for each field element, there is an equivalence class of roughly 2^k bit strings; the simulator simply picks one of these. This is indifferentiable for essentially the same reason that reducing a (log p + k)-bit value mod p gives a field element with statistical difference at most 2^-k from uniform.
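Back-of-the-envelope only (not a claim from a written proof): composing the three steps above with a union bound, one would expect a total distinguishing advantage on the order of

\[ \mathrm{Adv}(Q) \;\lesssim\; \frac{Q}{2^{k}} \;+\; \frac{Q^{2}}{2^{2k}} \;+\; \frac{\mathit{count} \cdot m}{2^{k}}, \]

where the first term is the chop-MD bound, the second the NMAC-style bound, and the third the statistical loss from reducing each (log p + k)-bit chunk mod p.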

kwantam commented 4 years ago

OK, having worked through the above and thought carefully about this, I propose the following:

  1. Replace hash_to_base in the current spec with hash_to_field as defined above.
  2. Specify expand_message_shake128 / 256 and expand_message_md, but not expand_message_ctr. The reason is that we don't want to encourage people to use another primitive, and we don't want to take on the burden of proving security for the _ctr construction.
  3. Update the suite naming scheme to include a field that specifies the expand_message variant and underlying hash function.

~There's one dangling question for the _md construction: do we chop for SHA-3, or only for SHA-2? This is a question of slightly simpler spec vs. slightly more efficient implementation. My inclination is towards the simpler spec, but reasonable people can obviously disagree...~ EDIT: My vote is simplicity of specification. Always chop b0, even for SHA-3.

cc @chris-wood @armfazh @grittygrease @samscott89

BjoernMHaase commented 4 years ago

Hello Riad,

thank you for your effort and consideration. I think that the computational complexity and RAM requirements are reduced quite a bit with your suggestion. Moreover, by using the SHAKE primitives there will be (IMO) a good path for providing future SPA-secured implementations. Maybe not on all platforms available today, but maybe in the future (the SHA3 permutation is much more easily masked and protected than Merkle-Damgaard, as far as I understand).

One side-channel advantage of your construction is also that the message input (of possibly low entropy in the PAKE context) is fed into the construction only once. For the presumably important P-256 case, your expand operation would also boil down to only two SHA-256 hash function invocations (if I counted correctly) if only a single field element is needed, e.g. for encode_to_curve().
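A quick sanity check of that count with a small script, assuming your expand_message_md with SHA-256, k = 128, and the L formula from hash_to_field above:

from math import ceil

p256 = 2**256 - 2**224 + 2**192 + 2**96 - 1    # the P-256 field prime
k = 128
L = ceil((p256.bit_length() + k) / 8)          # 48 bytes per field element
len_in_bytes = 1 * 1 * L                       # one element, m = 1
obs, k_in_bytes = 32, 16                       # SHA-256 output size, ceil(k / 8)
ell = ceil((len_in_bytes + k_in_bytes) / obs)  # number of SHA-256 invocations
assert ell == 2                                # b_0 and b_1, i.e., two invocations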

Note also that with this approach, side-channel mitigations (as I have suggested, e.g., in the CPace draft with the zero-padding after the low-entropy secret input) could be implemented by just modifying the format of the message field.

What I will be doing is trying to motivate some other people to review your suggestion.

As a final remark, I'd like to raise one question. In case the hash function output block length alone is sufficiently large for generating the required len_in_bytes amount of PRF bytes: would you consider it acceptable to just drop the "chopping" operation for b0 and use all of the bytes of the b0 output? One might spare one additional invocation of the hash. Heuristically, I think that one might then want to add the len_in_bytes field to the initial hashing operation yielding b0, e.g. before the message. This way "length extension type" attacks are not feasible, and using the message with different lengths always results in different expanded messages, even if the length is changed only by a small amount that does not modify the number of required blocks.

(My question relates to the case of X25519, which is often used in conjunction with Ed25519 and SHA512, even if the security parameters do not match. An encode2curve operation for X25519 could possibly share all of the field arithmetic and the hash function implementation with Ed25519. In this case only a single hash invocation of SHA512 would be required.)

Yours,

Björn

BjoernMHaase commented 4 years ago

I forgot to add a note regarding the "should one consider dropping the chop operation when used in conjunction with SHA-3?" topic.

I'd argue that people really focused on efficiency would likely be using a SHAKE128 or SHAKE256 construction instead. So maybe there is no real justification for making the specification for sponges different from that for Merkle-Damgaard constructions.

kwantam commented 4 years ago

In case the hash function output block length alone is sufficiently large for generating the required len_in_bytes amount of PRF bytes: would you consider it acceptable to just drop the "chopping" operation for b0 and use all of the bytes of the b0 output? One might spare one additional invocation of the hash. Heuristically, I think that one might then want to add the len_in_bytes field to the initial hashing operation yielding b0, e.g. before the message. This way "length extension type" attacks are not feasible, and using the message with different lengths always results in different expanded messages, even if the length is changed only by a small amount that does not modify the number of required blocks.

Unfortunately, it is not safe to drop the chop operation for b0.

It might be safe to do this if we prepended msg with its length, e.g.:

b_0 = H(DST || I2OSP(0, 1) || I2OSP(ell, 1) || I2OSP(len(msg), 8) || msg)

(But note I am only saying it might---I have not thought about it carefully enough to be sure.)

There are two reasons I didn't want to go this way: first, it's never obvious how long a field to make this. Surely 8 bytes is enough! But probably 4 isn't, since we could imagine hashing a couple gigabytes in some weird corner case. So now we need a pretty long field.

Second, the issue with needing to prepend len(msg) to msg is that it means we have to know the length of the message before we start hashing it! This is probably true in most usages, but it's not obviously true all the time.

(My question relates to the case of X25519, which is often used in conjunction with Ed25519 and SHA512, even if the security parameters do not match. An encode2curve operation for X25519 could possibly share all of the field arithmetic and the hash function implementation with Ed25519. In this case only a single hash invocation of SHA512 would be required.)

Right! But for X25519 or Ed25519, we have log2(p) = 255, k = 128, so this scheme would use 384 bits to generate one element of GF(p), and 512 - 128 = 384, so in fact even without modification you can use a single SHA512 invocation. And in the hash-to-curve (rather than encode-to-curve) case, you would need 2 SHA-512 invocations even if you didn't chop b_0.

(Note that this isn't strictly true in the case of X25519, since we can get very close to uniform field elements by reducing 256 bits mod p since p is so close to 2^255. But I don't think we should add special cases for p of special forms, because it will complicate things considerably.)
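A quick check of that arithmetic, using the same L formula as in hash_to_field (illustrative only):

from math import ceil

p = 2**255 - 19
k = 128
L = ceil((p.bit_length() + k) / 8)   # 48 bytes = 384 bits per field element
sha512_bytes = 64
available = sha512_bytes - k // 8    # 48 bytes remain in b_0 after chopping
assert L == 48 and available == 48   # one SHA-512 invocation per field element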

BjoernMHaase commented 4 years ago

OK, I understand. It's the same topic as the notorious CBC-MAC issue. In fact, the approach that you have sketched would not generate any overhead at all in comparison with "dropping the chop".

If your suggestion becomes consensus in the Hash2Curve team, I'd be modifying the CPace and AuCPace drafts to use the hash2curve default constructions. (e.g. maybe with the tweak of using SHA512 together with Curve25519).

kwantam commented 4 years ago

If your suggestion becomes consensus in the Hash2Curve team, I'd be modifying the CPace and AuCPace drafts to use the hash2curve default constructions. (e.g. maybe with the tweak of using SHA512 together with Curve25519).

Cool!

By the way, the VRF draft also wants to use Curve25519 with SHA-512 (for the same reason as you, I think---because Ed25519 uses this combination), so I plan to propose defining a suite for this use-case.

burdges commented 4 years ago

Did you guys ever assess https://eprint.iacr.org/2019/1294.pdf by Koshelev Dmitrii?

kwantam commented 4 years ago

Did you guys ever assess https://eprint.iacr.org/2019/1294.pdf by Koshelev Dmitrii?

Unless I'm misunderstanding, the above pertains to a new mapping, not to hash-to-base. So probably it would be better to discuss in its own issue.

First impression: I doubt we'd include this map given that we've just removed other "special-purpose" maps (see #198), and this is another map that's restricted to a family of curves whose practical interest seems limited. But if this doesn't seem right---that is, if there are interesting reasons to have a map specifically for the j=1728 case---then please open a new issue about it and I'd be glad to chat more.

peteroupc commented 4 years ago

Has the following discussion been considered? I believe some of the discussion there may be relevant to the hash_to_base function. sipa/bips#195

kwantam commented 4 years ago

Has the following discussion been considered? I believe some of the discussion there may be relevant to the hash_to_base function. sipa/bips#195

Thanks for the pointer.

In the context of hash-to-curve, the answer is that there are no "public" or "private" inputs---that depends on the invoking protocol. Because of this, it looks to me like there's no way to handle this generically in hash-to-curve: this is a question for the higher-level protocols that invoke h2c.

That seems to imply, however, that it might be worthwhile to add some words in the Security Considerations section discussing this, so that invoking protocols are suitably cautious.


As a second way of answering: h2c currently tries to help readers avoid timing leaks. Power side channels are not really in scope---and, related to what I've said above, it's not clear to me that they could ever be in scope in a generic way that defends all invoking protocols against attack. This also seems to imply the need for a pointer in Security Considerations, I think.

BjoernMHaase commented 4 years ago

 

Hello to all,

actually it was exactly the analysis by Niels Samwel that drove me to propose the hash-to-base variant in https://tools.ietf.org/html/draft-haase-cpace-00.

Yours,

Björn

secunets commented 4 years ago

Great discussion. Following closely to learn as much as possible and contribute at the right time.

grittygrease commented 4 years ago

@kwantam In the spirit of code re-use, would it make sense to allow implementers who have a bigger toolbox than just cryptographic hashes and/or XOF functions to use these tools? For example, HKDF is available in many languages and libraries already. Perhaps we could allow HKDF to be used with the expand_message_xof function? It would be nice to be able to avoid implementing all of expand_message_xmd if HKDF is available.

jedisct1 commented 4 years ago

Do we really want even more options?

HKDF is indeed widely available (or, if not, trivial to implement) and could be the only option for all suites.

kwantam commented 4 years ago

Perhaps we could allow HKDF to be used with the expand_message_xof function? It would be nice to be able to avoid implementing all of expand_message_xmd if HKDF is available.

It would be pretty easy to define expand_message_hkdf() if we really want people to be able to use it. But:

HKDF is indeed widely available (or, if not, trivial to implement) and could be the only option for all suites.

Probably not, because of the issues with HKDF described in the first post.

I'm most worried about the fact that it makes strict domain separation impossible for the underlying hash. Moreover, it would make our discussion of domain separation in the document messier, because we'd have to explain that a lot of what we say doesn't apply to expand_message_hkdf. Or we'll have to explicitly assume (not at all unreasonably) that H-inside-HKDF and H-outside-HKDF are already sufficiently separated, and the fact that further separation is impractical is OK.

In any case, I suppose if we wanted to define it, it'd look something like this:

def expand_message_hkdf(msg, DST, len_in_bytes):
   prk = HKDF_Extract(DST, msg || I2OSP(0, 1))
   pseudo_random_bytes = HKDF_Expand(prk, "expand_message_hkdf", len_in_bytes)
   return pseudo_random_bytes

(The extra zero byte appended to msg makes the indifferentiability proof go through.)
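If we did go that way, a runnable sketch might look like the following, with HKDF written out per RFC 5869 over SHA-256 (a sketch only, not a proposal for the draft):

import hashlib
import hmac
from math import ceil

def hkdf_extract(salt, ikm, H=hashlib.sha256):
    return hmac.new(salt, ikm, H).digest()

def hkdf_expand(prk, info, length, H=hashlib.sha256):
    hash_len = H().digest_size
    n = ceil(length / hash_len)          # RFC 5869 requires n <= 255
    if n > 255:
        raise ValueError("requested output too long")
    okm, t = b"", b""
    for i in range(1, n + 1):
        t = hmac.new(prk, t + info + bytes([i]), H).digest()
        okm += t
    return okm[:length]

def expand_message_hkdf(msg, DST, len_in_bytes):
    prk = hkdf_extract(DST, msg + bytes([0]))   # the extra zero byte appended to msg
    return hkdf_expand(prk, b"expand_message_hkdf", len_in_bytes)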