Add a preprocessor macro for removing blake2

ghost commented 4 years ago

HIi,

I wanted to use monocypher as part of a signature check reference implementation, using ed25519 / sha512.

But with the latest monocypher master, I struggle in optimizing code size. In particular since the ctx/hash api works through function pointers, the linker is never unable to remove the large blake2 code (not even when including all the code into the same compilation unit)

This PR presents one possible way to remove this bloat, do you think that this is the right way to accomplish size optimization, and if something like this could be done also in master?

Thanks, Jukka

Add a pre-processor macro to remove BLAKE2 vtable. This allows linker to remove all blake2 functionality when using sha512 only, saving flash space

Signed-off-by: Jukka Laitinen jukkax@ssrc.tii.ae

LoupVaillant commented 4 years ago

That's actually a good idea, thanks. I didn't realise that exposing the vtable meant the linker couldn't get rid of Blake2b.

That said, I now wonder whether we shouldn't do this to other primitives as well. Perhaps not, if the linker can reliably remove unused functions, but since it couldn't remove the vtable…

@fscoto, thoughts?

fscoto commented 4 years ago

if the linker can reliably remove unused functions

Even if built as a separate object, linkers may remove unused functions in a file if link-time optimization (commonly -flto) is enabled. Unused vtables is a bit harder since the compiler has to be able to reason that no code path ever touches them and the functions themselves are no longer unused (because they're in the vtable). I wouldn't be surprised if compilers struggle removing the vtable.

That said, these code size issues seem to spring up on occasion. I'm not sure if I think more compile-time directives are the correct solution here. I'd like to point out that the idealized deployment model for Monocypher is still to download, drop-in and forget. It therefore seems legitimate to me that downstreams would outright rip out code they have no use for, rather than adding compile-time defines.

In particular, such compile-time defines are a slippery slope. I am not saying they're necessarily a bad thing, but I am saying that it needs to be carefully considered. At the end of this, there would be defines for ChaCha20, Argon2i, X25519, Elligator, signing as a whole… because some compiler fails to remove the artifacts due to something or other. At which point the manual becomes more complicated than it already is, too, because it needs to advertise that functions may be missing/specifically enabled.

Radical idea: I feel compelled to note that the root of the issue stems from Argon2i, really. Monocypher had to pick whether to ship multiple hash functions bloating code size in the default setting, whether to re-use BLAKE2b from Argon2i (breaking Ed25519 compatibility) or whether to re-use SHA-2 from Ed25519 (breaking Argon2i compatibility). As far as I understand the Argon2i draft, H could have been be any hash function and they just happen to specify BLAKE2b.

Perhaps this trade-off was made poorly considering how this is the single thing that keeps coming up over and over again*. I'm not going to go into details, but a cursory look suggests that like interoperability between signature systems (libsodium on server, Monocypher on embedded client) is more important than interoperability between password-based key derivation functions (I struggle to come up with an example where Monocypher would be an appropriate choice). Monocypher 3.0 seems to me, however, to be far too late into the game to swap this hash function trade-off unless there is overwhelming demand for it to be swapped.

The upside of BLAKE2b being the default function is that it works correctly when you do H(key || data) instead of passing the key parameter; SHA-2 breaks to length-extension attacks if you don't realize you need to HMAC. Of course, when Ed25519 was designed, there was no BLAKE2 and no SHA-3 yet, complicating this further with historical reasons.

In light of this, I'm somewhat critical of this notion, especially as it appears to me that it addresses the issue on the wrong layer: Shouldn't this be done further upstream as a compiler optimization shortcoming or shouldn't we swap the default hash function? I'm not trying to disclaim Monocypher's responsibility as a library to work with the things that currently exist (rather than those that could exist, like a compiler that recognizes this pattern and removes the vtable along with the functions), but I do believe it's not the optimal place to solve the problem.

LoupVaillant commented 4 years ago

One thing I've just realised, is that preprocessing flags should in principle all be tested. Right now they are. But merely disabling a section of code? There is the temptation not to automate their testing, which would set a bad precedent. I'd have to update my test suite, make it a little more complex than it already is.

Moreover, removing a primitive means breaking anything that depends on it. For instance, disabling Blake2b will break Monocypher's default EdDSA. It may work here because you don't actually use EdDSA, making the linker happy. But ideally, we would want to either disable EdDSA-Blake2b whenever Blake2b is disabled, or trigger a preprocessor error whenever Blake2b is disabled, and EdDSA-Blake2b is not. Repeat that for Chacha20/AEAD, Poly1305/AEAD, X25519/Elligator (the inverse map anyway), EdDSA/EdDSA-Blake2b… and you'll get an idea how far this rabbit hole goes.

Besides, we're looking at at least 50 lines of pre-processor code, throwing us well past the (self imposed) 2K lines of code limit.

This gets worse: disabling something on monocypher.c only feels unclean, because monocypher.h still has declarations. This prevent the compiler from catching mistakes. Instead the linker will throw an error, and those are never nice to debug. The solution would be to put #ifdef in the header as well, but then your build system must define them not just for monocypher.c, but for anything that includes monocypher.h. At which point those preprocessor definitions must be careful about not cluttering the global namespace. This means having a common prefix (likely CRYPTO_), which makes things yet a bit more unwieldy.

For all these reasons, and others outlined by @fscoto, I'd rather not go there. This pull request is reasonable, and the idea behind it has definite merits… ultimately though, this goes in a direction I'd rather not take. It's with a little pang in my heart that I close this pull request, at least for now. Sorry, and thank you.

(Note: Monocypher is basically finished. I'm afraid that as more people take interest and try to contribute, I'll end up saying "no" to a lot of reasonable requests and good ideas.)

Back to the "right" way to optimise code size: Simply delete the code.

I understand why one might be reluctant to modify the source code locally in any way. Maintaining a local branch is never fun. In the case of Monocypher however, I believe this is not such a big problem:

Monocypher requires few updates. Since version 1.0.0, there has been a grand total of 7 bugs, only one of which required an update in practice.
Monocypher is stable. Though in version 3.1.1 already, the major version bumps were triggered by very minor changes in the API, all of which easy to compensate for. The 1.x and 2.x branches are still maintained, and will be for the foreseeable future.
The internal structure of Monocypher is simple. While the algorithms themselves can be very tricky, architecturally Monocypher is almost trivial. Functions have no effect beyond writing to their output buffers. There is no initialization mechanism. The dependency structure is explicit. So if you remove a function, the very worst that can happen is a linker error.
Monocypher is versioned. It's a little thing, but one reason not to maintain a modified version is because it can make it harder to track which upstream version it comes from. Since version 2.0.6, there's a version number on top of all shipped files, so you can know at a glance which version you started from, and perform a diff if you ever need to.

@scoto's radical idea: I would have 2 reasons, even now, to chose Blake2b over SHA-512 despite the resulting incompatibility:

The length extension attacks you mentioned. Incidentally, I added HMAC-SHA-512 in the optional file because of it.
Blake2b runs over twice as fast as SHA-512.

There's no way SHA-512 can be the default. But at least we can avoid Blake2b entirely, if we're not using Argon2i. (If we are using Argon2i, its humongous memory requirements automatically means we don't care about binary code size).

If I were to do it all over again, I would be tempted to consider Blake3 instead of Blake2b, and replace Argon2i by a "cache hard" password hash. Maybe. And now that's it's done, I think I'd better wait at least 5 years before I seriously reconsider those choices. When the dust settles around the security margin of Chacha and Blake, and we have a reputable cache hard password hash, maybe I'll write the next version of Monocypher.

Maybe. It will probably take some major event, such as "OMG ECC crypto is broken", or a massive performance breakthrough.

ghost commented 4 years ago

Ok, thanks for your time and effort to explain your thoughts in detail!

LoupVaillant / Monocypher

Add a preprocessor macro for removing blake2 #198