expand_message_xmd injectivity

chris-wood commented 4 years ago

As pointed out by David Benjamin, Step 6 of expand_message_xmd is not injective! Here are two inputs that will produce the same output (b_0 = H(Z_pad + [0, 16, 0, 4, 0, 16, 0, 0])):

expand_message_xmd(msg=[], DST=[0, 16, 0, 0], len_in_bytes=16) expand_message_xmd(msg=[0, 16, 0, 4], DST=[], len_in_bytes=16)

We should probably address this, though it's not clear what is the best way.

jedisct1 commented 4 years ago

H(Z_pad || l_i_b_str || I2OSP(0, 1) || DST_prime || msg)?

davidben commented 4 years ago

(Out of curiosity, any reason not to use HKDF? Different requirements?)

Does DST_prime need to be in a matching location between b_0 and b_i computations? Otherwise the I2OSP(i, 1) iteration number doesn't line up quite right. I assumed that was why the tail end of all the hash invocations matched.

To that end, if we're lining everything up at the tail end, I think it'd work to just redefine DST_prime to DST || I2OSP(len(DST), 1).

NB: I'm only guessing as to the requirements and am probably making wrong assumptions. I've only just started looking at this draft and have no context as to why any of it looks the way that it does.

davidben commented 4 years ago

expand_message_xof seems to have the same issue in the computation of msg_prime. I think the same fix to DST_prime addresses it.

It probably doesn't matter here, but this construction also has inconsistent block alignment of msg in expand_message_xmd. It sometimes follows a block-size prefix and sometimes a hash-sized prefix. (But it is injective with the DST_prime fix to align it all at the end rather than the start.) For comparison, HMAC (and thus HKDF-Extract) and the AD constructions in AES-GCM, AES-GCM-SIV, and ChaCha20-Poly1305 are all sensitive to the block size, with the AEADs even padding AD and ciphertext up so they both consistently start at block boundaries. (Then they add both lengths to be injective.) This is handy if, e.g., you've got iovecs floating around.

~~But this probably doesn't matter here. Fiddling with unaligned or partial blocks is negligible compared to the cost of field operations and presumably you aren't hashing bulk data.~~

(Ignore this; I missed that msg was not in the b_i calculations, only b_0.)

kwantam commented 4 years ago

(Out of curiosity, any reason not to use HKDF? Different requirements?)

This is a great question. We discussed at some length in #202.

Does DST_prime need to be in a matching location between b_0 and b_i computations?

No, the alignment doesn't really matter per the CDMP05 analysis.

To that end, if we're lining everything up at the tail end, I think it'd work to just redefine DST_prime to DST || I2OSP(len(DST), 1).

This is a great suggestion! Thank you.

kwantam commented 4 years ago

H(Z_pad || l_i_b_str || I2OSP(0, 1) || DST_prime || msg)?

Unfortunately, this doesn't work. We're enforcing domain separation with a trailing DST, and this would break that.

davidben commented 4 years ago

(I clearly didn't phrase "Does DST_prime need to be in a matching location between b_0 and b_i computations?" very well. That was confirming whether DST needed to be consistently trailing, because the https://github.com/cfrg/draft-irtf-cfrg-hash-to-curve/issues/236#issuecomment-610813374 suggestion broke this properly. Sounds like the answer is yes.)

kwantam commented 4 years ago

It probably doesn't matter here, but this construction also has inconsistent block alignment of msg in expand_message_xmd

Wait, I don't get it. msg is only an input for the b_0 computation, and it's always exactly one block offset from the start, no? Do you mean block offset from the end? Or are you talking about the offset of b_0 in the b_i's? Sorry to be slow...

It occurs to me: we would have avoided the injectivity issue entirely if we'd encoded the length of the message in the hash. I suppose we might still consider doing this if we change the input for b_0 anyway (though we'd then have to decide what the longest msg we can support is... I worry that the embedded folks might complain if we add another 8 bytes to the input...)

davidben commented 4 years ago

Wait, I don't get it. msg is only an input for the b_0 computation, and it's always exactly one block offset from the start, no? Do you mean block offset from the end? Or are you talking about the offset of b_0 in the b_i's? Sorry to be slow...

Er, yes. Ignore me, I can't read and missed that b_i's don't have msg in them.

cfrg / draft-irtf-cfrg-hash-to-curve

expand_message_xmd injectivity #236