neq0: use 16-bit or 8-bit int comparison instead?

fscoto commented 4 years ago

In 9f125cb7, fixed-size buffer comparisons were added, and with it the function neq0. In the context of snej/monocypher-cpp#3, I noticed that it uses u32 for comparison. I assume you split it into two u32 comparisons to avoid a compiler doing short-circuit, word-wise comparison on 32-bit processors representing a u64 as two u32s.

Maybe this should be split into four u16s or eight u8s instead since int 8-bit microcontrollers exist and may exhibit the same issue. Considering just recently, an issue on 16-bit platforms was fixed, this may be worth thinking about.

It's also possible that my assumption is false and the split into two u32s has another background, in which case I'd appreciate if I could be enlightened about the purpose of neq0's split of the input.

LoupVaillant commented 4 years ago

The reason for going down to u32 was not any machine specific consideration, but C itself. Note that as far as I know, there is no implementation defined behaviour here.

The overarching goal is to provide constant time comparison of buffers of size 16, 32, and 64. It is done in 2 steps:

Produce a 64-bit number that will be zero if the buffers are equal, non-zero if they aren't. This is done in x16(), x32(), and x64().
Turn that 64-bit number into a boolean: if it is zero, output zero. If it is not zero, output -1. This is done in neq0().

The reason I had to divide the number in half was to take advantage of an edge case. Recall the final expression:

    return (1 & ((half - 1) >> 32)) - 1;

By construction, half is below 2^32. half - 1 on the other hand, might overflow. It will be below 2^32 or it will be equal to the maximum value.

If we were working with 32-bit integers, the maximum value would still be below 2^32, and shifting that 32-bit to the right would still yield zero. With 64-bit integers however, (0 - 1) means 2^64-1 that when shifted yield 2^32-1. To recap:

(half - 1) >> 32 == 0 if 0 < half < 2^32
(half - 1) >> 32 == 2^32-1 if half == 0

The 1 & … part just clamp the result to 1 bit:

1 & ((half - 1) >> 32) == 0 if 0 < half < 2^32
1 & ((half - 1) >> 32) == 1 if half == 0

Subtract 1 and voilà, I get my 0 or -1 result I wanted.

This would work the same with smaller integer sizes, but there's no need, because the C standard guarantees that I'll be working with 64-bit unsigned integer the whole time. I don't need explicit conversions here because half is in the innermost expression, so the promotion to u64 will happen as it should.

Long story short, the short-cut evaluation is not avoided by by halving the 64-bit integer, it is avoided by the subsequent arithmetic expression. Had I dealt with with a 32-bit integer, I would have just converted it to u64 and used the arithmetic magic on it. Same with smaller integers.

If this strategy is not enough, we could find other ways. For instance, we could half our way down to a single bit, and output its negation. As far as I know however, what we have now is enough. Besides, if the compiler does use short cut evaluation here, that shouldn't be such a big deal: users will usually put the result of those comparison in a genuine if statement so they can fail if they have to. See crypto_lock_aead() for instance.

fscoto commented 4 years ago

Ah, I see. So that's why there's a split. My assumption was wrong; case closed. Thank you for the explanation.

LoupVaillant / Monocypher

neq0: use 16-bit or 8-bit int comparison instead? #194