Closed fscoto closed 4 years ago
The reason for going down to u32 was not any machine specific consideration, but C itself. Note that as far as I know, there is no implementation defined behaviour here.
The overarching goal is to provide constant time comparison of buffers of size 16, 32, and 64. It is done in 2 steps:
x16()
, x32()
, and x64()
.neq0()
.The reason I had to divide the number in half was to take advantage of an edge case. Recall the final expression:
return (1 & ((half - 1) >> 32)) - 1;
By construction, half
is below 2^32. half - 1
on the other hand, might overflow. It will be below 2^32 or it will be equal to the maximum value.
If we were working with 32-bit integers, the maximum value would still be below 2^32, and shifting that 32-bit to the right would still yield zero. With 64-bit integers however, (0 - 1)
means 2^64-1 that when shifted yield 2^32-1. To recap:
(half - 1) >> 32 == 0
if 0 < half < 2^32
(half - 1) >> 32 == 2^32-1
if half == 0
The 1 & …
part just clamp the result to 1 bit:
1 & ((half - 1) >> 32) == 0
if 0 < half < 2^32
1 & ((half - 1) >> 32) == 1
if half == 0
Subtract 1 and voilà, I get my 0 or -1 result I wanted.
This would work the same with smaller integer sizes, but there's no need, because the C standard guarantees that I'll be working with 64-bit unsigned integer the whole time. I don't need explicit conversions here because half
is in the innermost expression, so the promotion to u64 will happen as it should.
Long story short, the short-cut evaluation is not avoided by by halving the 64-bit integer, it is avoided by the subsequent arithmetic expression. Had I dealt with with a 32-bit integer, I would have just converted it to u64 and used the arithmetic magic on it. Same with smaller integers.
If this strategy is not enough, we could find other ways. For instance, we could half our way down to a single bit, and output its negation. As far as I know however, what we have now is enough. Besides, if the compiler does use short cut evaluation here, that shouldn't be such a big deal: users will usually put the result of those comparison in a genuine if
statement so they can fail if they have to. See crypto_lock_aead()
for instance.
Ah, I see. So that's why there's a split. My assumption was wrong; case closed. Thank you for the explanation.
In 9f125cb7, fixed-size buffer comparisons were added, and with it the function
neq0
. In the context of snej/monocypher-cpp#3, I noticed that it usesu32
for comparison. I assume you split it into twou32
comparisons to avoid a compiler doing short-circuit, word-wise comparison on 32-bit processors representing au64
as twou32
s.Maybe this should be split into four
u16
s or eightu8
s instead sinceint
8-bit microcontrollers exist and may exhibit the same issue. Considering just recently, an issue on 16-bit platforms was fixed, this may be worth thinking about.It's also possible that my assumption is false and the split into two
u32
s has another background, in which case I'd appreciate if I could be enlightened about the purpose ofneq0
's split of the input.