Closed chris-wood closed 4 years ago
H(Z_pad || l_i_b_str || I2OSP(0, 1) || DST_prime || msg)
?
(Out of curiosity, any reason not to use HKDF? Different requirements?)
Does DST_prime
need to be in a matching location between b_0
and b_i
computations? Otherwise the I2OSP(i, 1)
iteration number doesn't line up quite right. I assumed that was why the tail end of all the hash invocations matched.
To that end, if we're lining everything up at the tail end, I think it'd work to just redefine DST_prime
to DST || I2OSP(len(DST), 1)
.
NB: I'm only guessing as to the requirements and am probably making wrong assumptions. I've only just started looking at this draft and have no context as to why any of it looks the way that it does.
expand_message_xof
seems to have the same issue in the computation of msg_prime
. I think the same fix to DST_prime
addresses it.
It probably doesn't matter here, but this construction also has inconsistent block alignment of msg
in expand_message_xmd
. It sometimes follows a block-size prefix and sometimes a hash-sized prefix. (But it is injective with the DST_prime
fix to align it all at the end rather than the start.) For comparison, HMAC (and thus HKDF-Extract) and the AD constructions in AES-GCM, AES-GCM-SIV, and ChaCha20-Poly1305 are all sensitive to the block size, with the AEADs even padding AD and ciphertext up so they both consistently start at block boundaries. (Then they add both lengths to be injective.) This is handy if, e.g., you've got iovec
s floating around.
But this probably doesn't matter here. Fiddling with unaligned or partial blocks is negligible compared to the cost of field operations and presumably you aren't hashing bulk data.
(Ignore this; I missed that msg
was not in the b_i
calculations, only b_0
.)
(Out of curiosity, any reason not to use HKDF? Different requirements?)
This is a great question. We discussed at some length in #202.
Does
DST_prime
need to be in a matching location betweenb_0
andb_i
computations?
No, the alignment doesn't really matter per the CDMP05 analysis.
To that end, if we're lining everything up at the tail end, I think it'd work to just redefine
DST_prime
toDST || I2OSP(len(DST), 1)
.
This is a great suggestion! Thank you.
H(Z_pad || l_i_b_str || I2OSP(0, 1) || DST_prime || msg)
?
Unfortunately, this doesn't work. We're enforcing domain separation with a trailing DST, and this would break that.
(I clearly didn't phrase "Does DST_prime
need to be in a matching location between b_0
and b_i
computations?" very well. That was confirming whether DST needed to be consistently trailing, because the https://github.com/cfrg/draft-irtf-cfrg-hash-to-curve/issues/236#issuecomment-610813374 suggestion broke this properly. Sounds like the answer is yes.)
It probably doesn't matter here, but this construction also has inconsistent block alignment of
msg
inexpand_message_xmd
Wait, I don't get it. msg
is only an input for the b_0 computation, and it's always exactly one block offset from the start, no? Do you mean block offset from the end? Or are you talking about the offset of b_0 in the b_i's? Sorry to be slow...
It occurs to me: we would have avoided the injectivity issue entirely if we'd encoded the length of the message in the hash. I suppose we might still consider doing this if we change the input for b_0 anyway (though we'd then have to decide what the longest msg
we can support is... I worry that the embedded folks might complain if we add another 8 bytes to the input...)
Wait, I don't get it. msg is only an input for the b_0 computation, and it's always exactly one block offset from the start, no? Do you mean block offset from the end? Or are you talking about the offset of b_0 in the b_i's? Sorry to be slow...
Er, yes. Ignore me, I can't read and missed that b_i's don't have msg in them.
As pointed out by David Benjamin, Step 6 of expand_message_xmd is not injective! Here are two inputs that will produce the same output (b_0 = H(Z_pad + [0, 16, 0, 4, 0, 16, 0, 0])):
expand_message_xmd(msg=[], DST=[0, 16, 0, 0], len_in_bytes=16) expand_message_xmd(msg=[0, 16, 0, 4], DST=[], len_in_bytes=16)
We should probably address this, though it's not clear what is the best way.