LoupVaillant / Monocypher

An easy to use, easy to deploy crypto library
https://monocypher.org
Other
614 stars 81 forks source link

Question about 64-bit modulo #156

Closed skoe closed 4 years ago

skoe commented 4 years ago

Hi Loup,

thanks for the great library. I'm evaluating it for a small bare metal embedded device (Cortex-M3 with about 64 KiByte for my application) w/o standard library. The first and only place in my projects which needs a 64-bit division is that one (monocypher-3.0.0):

monocypher.c:948: undefined reference to `__aeabi_uldivmod'

which is:

return (start_pos + z) % ctx->nb_blocks;

Just as a quick hack to get it running, without any checking what things it breaks I changed it to that:

return (u32)(start_pos + z) % ctx->nb_blocks;

Which makes it compile without __aeabi_uldivmod (size_t would do the same, it's also 32 bits there). The question is: Is there a way to get rid of this 64-bit division without breaking things and security and stuff? This would make the code a bit smaller and probably even a bit faster on 32 bit machines. Sorry that I didn't try to understand the code fully, I think your answer will be more reliable anyway :)

LoupVaillant commented 4 years ago

Hi, thanks for the report. 64-bit division is indeed a bit of a bummer. Avoiding it would be nice. Your change however does not work in all cases, though. We need to be a bit more clever.

See, start_pos + z can easily exceed nb_blocks, which itself can be as high as 2^32-4. We'll definitely overflow on cases where memory use is set close the maximum possible (~4 terabytes). Won't happen in real life, but I'd rather preserve correctness.

Turns out though that the modulo itself was overkill, thanks to the following invariants:

start_pos     < nb_blocks
area_size     < nb_blocks
z             < nb_blocks
start_pos + z < 2 × nb_blocks

_(start_pos represent the start of the start of the reference area. The reference area is at most as big as the whole segment, minus the current block.)_

We can replace the modulo operation by a simple test and a subtraction:

u64 z   = (area_size - 1) - y;
u64 ref = start_pos + z;
return ref < ctx->nb_blocks ? ref : ref - ctx->nb_blocks;

That's definitely not constant time, but we don't care: this is Argon2i, the reference index does not depend on any secret (which is why we could use it as index in the first place). We can therefore use conditionals to compute it.

There, done. This modulo operation bothered me, because it's basically the only one in all Monocypher. Thank you again for telling me, I would never got around to fixing it otherwise. I'll close this issue with the fix, which will be integrated in the next version of Monocypher (planned for the first half of March).

In the mean time, I'd advise you to patch version 3.0.0 manually, then run the test suite (make test) to make sure you didn't break anything. Alternatively, how about deleting Argon2i altogether? I'm all for not touching the code at all (easier that way), but if the linker doesn't eliminate dead symbols, deleting code you don't need will save you precious kilobytes.

skoe commented 4 years ago

Thanks for the quick response and the explanation.

You are right, I don't need Argon2i. I disabled LTO and function sections, to see the worst case size (who knows what may come in future :laughing: ). That's why the code was not eliminated.