LoupVaillant / Monocypher

An easy to use, easy to deploy crypto library
https://monocypher.org
Other
594 stars 79 forks source link

feature request: Implementations of some of the handshake patterns of the Noise protocol framework? #277

Open ethindp opened 1 week ago

ethindp commented 1 week ago

I'm not sure how complicated this would be, but if it's possible I think it would be great if this library implemented some of the handshake patterns in the Noise protocol framework (in particular, N, KK, and XX). I would do it myself but I'm uncertain about how to properly go about it (translating all the custom token strings that Noise uses has been difficult for me in the past, I'm no cryptographer :)). I know that some libraries like libhydrogen do it, but it would be nice if Monocypher provided at least some of the patterns. Would this be feasible or is it out of the scope of Monocypher?

samuel-lucas6 commented 6 days ago

I also dislike the pattern formatting. This blog post series is very good for the non-interactive patterns, although the website template has changed, which makes it less readable. Noise Explorer is also quite useful. For example, if you click on SHOW DETAILED ANALYSIS, it explains what everything means. You just have to filter out some of the noise if you're not using Noise as a protocol.

ethindp commented 6 days ago

@samuel-lucas6 Oh my God, how did I not hear of this before now (noise explorer, I mean)? This I think may (theoretically) definitely help in figuring out the framework and trying to implement any of the patterns! Though I still think that it would be nice if Monocypher implemented some of them, this will definitely be a useful resource to test any implementation I can come up with against some trusted baseline or something! (And I'll definitely look at that blog post too!) I'll leave this open in though, it would be nice to see something like this (some of the patterns) in monocypher!

LoupVaillant commented 6 days ago

Hi, I understand the desire for a handshake protocol enough that I studied it myself. Here are my conclusions for now:

ethindp commented 6 days ago

@LoupVaillant I tried monokex but I couldn't really figure it out, unfortunately. (I have very little experience with OCaml and it didn't work when I tried building it on Windows, I think.) I've looked at the Go implementation for the XX pattern from Noise explorer (I donn't know if that link will work -- it's a blob: link), and I could probably rewrite it in C/C++, but I get a bit wary around parts of code like:

func generateKeypair() keypair {
    var public_key [32]byte
    var private_key [32]byte
    _, _ = rand.Read(private_key[:])
    curve25519.ScalarBaseMult(&public_key, &private_key)
    if validatePublicKey(public_key[:]) {
        return keypair{public_key, private_key}
    }
    return generateKeypair()
}

The part that gets me wary is the curve25519.ScalarBaseMult line, because I don't know if I would replace methods like GenerateKeypair with monocyphers or what. So although I could rewrite it to use monocypher I'm hesitant just because the "don't role your own crypto" has been hammered into my mind and I just generally don't want to screw it up (lol). I can see how something like NOise would be out of the scope for monocypher, but as you said, they're insanely useful, and it would be great to have another library that I trust that implements them.

LoupVaillant commented 6 days ago

(Note: I chose OCaml because I felt it would make things easier for me, but now I’m not so sure. I may try to reimplement Monokex in C to reduce the number of dependencies.)

OK, what you really want are test vectors. Transcripts of an actual Noise handshake for the pattern you want to implement, ideally with intermediate results too, similar to what we can find for Argon2. It will at the very least validate the happy path. (Of course, you want to validate the unhappy paths too, but that’s easily done by flipping bits in a message, and verify that the rest of the handshake starts to fail as it should).

Now curve25519.ScalarBaseMult specifically, almost certainly means Monocypher’s crypto_x25519_public_key(). Because the public key is basically the scalar multiplication of the base point by the private key. (Okay, the private key is clamped first, but everyone does that by default.)

Now reading my manual, I realise my explanation of X25519 lacks detail. There’s a high-level description of what happens, but not a clear specification of the maths involved. I should probably add a section about that.

ethindp commented 6 days ago

@LoupVaillant Ah, thank you. I'm attempting (operative word...) to re-implement the go code in C++ (I'm using that because I feel a lot more comfortable in it, so I don't know if it would be acceptable in monocypher but it could perhaps provide a template that could be rewritten, assuming I do it right), I would just need help auditing it because I'm no master of crypto myself. It's really fun doing this though just because I get to really dig into monocypher and use it beyond what I've used it for (just encryption/hashing/password hashing) and I get to really understand how it works and figure out good usage patterns for it on the way.

LoupVaillant commented 6 days ago

I don't know if it would be acceptable in monocypher

Not monocypher itself, I consider higher-level construction belong to separate projects. However, I do intent to link to such other projects from Monocypher itself.

I would just need help auditing it because I'm no master of crypto myself.

I would gladly review your work.

ethindp commented 5 days ago

Okay, so I genuinely don't know if this is just noise explorer or something else but the code it gives me looks way, way too complicated, and though I've tried to rewrite it in C++ it just seems way, way over-complicated for the NN handshake pattern, which I decided to try first just because it dealt only with ephemeral keys. As an example, I've attached what I have so far, bugs/unused headers and all. I did use some headers from external libraries (in particular mimalloc for a secure memory allocator, rng_get_bytes from libtomcrypt, and a small library for decoding of hex into bytes (that's called fast-hex). Even so I'm already at line 490 and it just seems a bit much for what should, in theory, be simple. Is this just Cryptography being complicated or am I way overdoing things? (I haven't ran this through a compiler yet, but it's my attempt to translate the provided Rust implementation using monocypher; the Rust implementation is here, under "Generate Secure Protocol Implementation Code"). But I thought I would at least post my progress so far even if it isn't complete. It made sense (in my mind anyway) to derive my implementation from the Rust one so I would at least have something to follow so I don't screw something up horribly. But if this is dramatically over-complicated I would appreciate any tips or ideas for dramatically simplifying it because this seems a bit much , but this is also my first time doing this so maybe that's just me.

My implementation thus far: kx_nn.h.txt

I'll probably continue working on it either sometime tonight or tomorrow, but I hope it's at least decent thus far (though as I said I'd love to simplify it a lot).

LoupVaillant commented 5 days ago

in particular mimalloc for a secure memory allocator, rng_get_bytes from libtomcrypt, and a small library for decoding of hex into bytes (that's called fast-hex)

Though I haven’t implemented it myself I’m pretty familiar with the specs, and I’m positive you don’t need heap allocation. I understand the lure for "secure" allocation, but that’s just a side channel, and one that basically requires compromising the host computer. Just allocate everything on the stack, and when you leave a scope, wipe all buffers with crypto_wipe(). It’s not perfect, but it should mitigate most of what we care about.

As for decoding hex into bytes, my guess is that you only need that to translate test vectors. The actual protocol is supposed to talk binary directly.

Even so I'm already at line 490 and it just seems a bit much for what should, in theory, be simple.

Noise does have a couple avoidable complications — I believe to make it easier to analyse. For instance it uses HMAC even with BLAKE2B, which has a keyed mode that can do the same. It also separates the transcript hash from the key generation for better independence, but all it really does is making the inner state more complex.

In any case, rather than trying to port the code, may I suggest that you write from specs instead? Take the NN pattern, and just implement the first two messages. That should be 3 functions total:

  1. Initiator sends first message.
  2. Responder receives first message, computes the session key, and sends the second message.
  3. Initiator receives the second message and computes the second key.

What happens in those functions is not trivial, but I believe it’s simpler than Noise Explorer’s more generic approach with just one function to send and another to receive. Which I confess is also the API I wrote for Monokex, but that’s not an easy first draft.

Oh, and forget about the forbidden curve values, they do absolutely nothing for security: if a party wants to shoot themselves in the foot by using low-order point, that’s on them. And as a threat model it makes no sense: instead of using a low-order point to make the whole thing insecure, they can just leak their own keys.

fscoto commented 5 days ago

[I don’t want to derail this issue entirely, but Loup said something that I feel like needs and deserves a reply in a public forum.]

Now reading my manual, I realise my explanation of X25519 lacks detail. There’s a high-level description of what happens, but not a clear specification of the maths involved. I should probably add a section about that.

I can't properly articulate why down to a logical argument, but something about this bothers me. I feel like a man page is not the place to start detailing mathematical minutiae. A manual page is meant to be reference material providing the minimal set of information required to effectively use a function. I believe that goal is already accomplished with what we have in crypto_x25519(3monocypher).

What this does make me think is that there either does need to be an add-on library for common tasks or, perhaps, instead a long-form book like “Cryptography Cookbook with Monocypher (and libsodium)”, where you have enough room to outline the concepts and details, perhaps also including all the content on https://elligator.org/ that digests the otherwise impenetrable Elligator paper. The reason I’m discarding the notion of an add-on library is the spirit of Monocypher. Please correct me if I’m wrong, but looking at the mission statement on the front page of the website: Monocypher always felt more like something you’re allowed and meant to work with rather than just take it as an unmodifiable blob. If you need to reduce code size, just go in and delete things you have no use for. If you need to implement draft-irtf-cfrg-det-sigs-with-noise-03 directly, then just go in and do it. People are scared to do these things because of the “don’t roll your own cryptography” mantra. But to me, Monocypher is closer to a breadboard kit than a fully assembled computer.

Consequently, this means that an add-on library cannot carry on the same spirit. As you observed above, this implies that things like handshakes or alternative certificate formats both must be out of scope and have to be presented in a more à la carte fashion than Monocypher. It is this contradiction that leads me to think that perhaps actually a cookbook would be the correct choice, where you’d be shown the problem (such as a session-based interactive network key exchange and other “common” Noise patterns, signcrypting firmware, when and how to mitigate key-partitioning attacks/invisible salamanders, database security for relational and key-value databases especially in multi-tenant scenarios, padding [PADMÉ and PURBs come to mind]) and a proposed solution with a concrete implementation with the trade-offs made and relevant literature/standards cited.

I suspect that kind of educational approach to empower people to make their own trade-offs would be more in line with the Monocypher aesthetic.

LoupVaillant commented 5 days ago

I feel like a man page is not the place to start detailing mathematical minutiae.

You’re right. And by that account, the first sentence of the man page is specification enough: _"crypto_x25519() performs an X25519 key exchange between your_secret_key and their_publickey."

Thus, no need to update the manual here. Thanks for the feedback.

Please correct me if I’m wrong, but looking at the mission statement on the front page of the website: Monocypher always felt more like something you’re allowed and meant to work with rather than just take it as an unmodifiable blob. If you need to reduce code size, just go in and delete things you have no use for. If you need to implement draft-irtf-cfrg-det-sigs-with-noise-03 directly, then just go in and do it. People are scared to do these things because of the “don’t roll your own cryptography” mantra. But to me, Monocypher is closer to a breadboard kit than a fully assembled computer.

You’re correct. Sure I try to make it easier to stick to the public API, but one inevitable consequence of trying my best to keep Monocypher small and self contained is that it is unusually easy to edit — and thus, is a valid use case.

On the other hand, I would not dismiss the notion of higher-level libraries that depend on parts of Monocypher. Though ideally I’d make it so we can easily replace it by something else (typically speedsod—I mean, libsodium). One obvious contender would be Monokex, though I still haven’t got around to actually finish the damn thing (the code works and is almost certainly correct, but the protocol isn’t formally verified, and in 2024 that’s no longer good enough).


Now the cookbook is a bloody good idea. I have been musing for quite some time about writing (even filming) my own cryptography course. Focusing it around Monocypher would make it more hands-on, shorter, and it would be easier for people to pick and chose which chapter they’re specifically interested in. And most importantly, this independence between chapters means I can write them in any order.

I’ll start with one chapter. Who knows, if I write enough of them I might even have a book.

ethindp commented 4 days ago

Okay, coming back because I just completely rewrote my entire thing from scratch following the spec as precisely as I could. Yeah, I know, I could've used what I initially wrote, but hey, it seemed like a good idea to do it this way (especially when I sometimes feel like I'm just flailing around, lol).

I took your advice, @LoupVaillant, and didn't try to get too fancy. I don't even support the full spec or pre-shared keys or any of the more complex patterns, and I'm not sure if I'll add that or not (I don't know if something like a PSK would have much benefit given the use case I'm hoping to employ this in if it actually is compliant and secure). I also don't support multiple public keys in the handshake state object, since I had no idea how I would actually make that work when it came time to get down to business and do the encryption/decryption parts. Idk how well I did and I probably majorly overdid it but I also wanted to try to follow the spec as best I could, which is hard when I try to go my own way and "strip out" a bunch of things to create the most minimal impl possible.

Anyway, sorry for my rambling, here you go: noise.cpp.txt noise.h.txt

For what it's worth, I think the cookbook is a great idea. Monocypher is an incredible crypto library, especially for those who don't want to pull in huge libraries like openssl. But I do think something like a cookbook (I prefer text, not videos/images, personally), especially if it explained things like this and how one might go about implementing them, would be extremely valuable.

LoupVaillant commented 4 days ago

@ethindp, it looks like you’re in the right track. If this passes a rigorous set of test vectors this should be good enough for release.

Though I initially felt like your code should be even simpler, I can’t find any obvious simplification myself, so I’m guessing that’s Noise being more complex than it should. Setting aside my growing dislike for C++ and the lost formatting, I have but two small suggestions:

  1. Some of your if statements span almost the entire function. This suggest the possibility of an early return instead, so you can reduce the amount of indentation. Though I reckon this wasn’t a slam dunk, because…

  2. …The noise protocol specifies the entire session, handshake and regular message exchanges. Personally I don’t like this approach. I prefer to have a handshake API handle only the handshake, then hand you the session key(s) and transcript hash so you can then use AEAD to send messages. Now I know why they did it, so the user don’t botch the nonces with too much freedom, but you could imagine separating the handshake from the rest.

    That would for one thing allow you to return early (with an error) if the message_patterns is empty, but most importantly it would clearly separate the handshake part, where payloads don’t have full security, from the symmetric part, where payload do have full security. You can still support early payloads in the handshake itself, but it’s easier to document them as "not quite safe".

It’s more about personal preference than hard guidelines, do what you think is best. Formatting & tests aside, if this was a pull request for my day job I would accept it.


(I prefer text, not videos/images, personally)

I’ll go text first anyway. Videos, if any, will be an optional supplement.

ethindp commented 4 days ago

@LoupVaillant I would agree with your positive assessment if it weren't for the fact that from my tests of an actual implementation (I'm uncertain how to work with test vectors) against this one doesn't work on mine. I've done a lot of fiddling and rewriting (to the point that most of the functions that are called by the lower level objects are templatized with only a couple specializations needed, so I let the compiler do all the work for me as much as possible) and I've discovered some oddities:

  1. The cipher state I don't think is being initialized properly, even though it should be. I'm trying to replicate the example that Snow provides in it's docs, where the responder is a separate application So I could be doing something wrong there (I call read_message, then write_message - but maybe I need to do it the other way? It's a bit tricky for me to tell the order of operations even though my brain automatically goes "responder reads to get pks first".)
  2. I can't use empty() on std::arrays, I don't think. I think this always returns false for an array that has a non-zero size. For now I've removed those checks from read/write_message because I think it's reasonable to assume that they may never actually occur given the rigid way I require you to initialize the handshake state with a known, well-defined pattern.
  3. I discovered a bug in read_message which caused the message patterns to not get popped from the deque, which I thought was causing problem 1 above but... Apparently not.
  4. I rewrote the hmac_hash function to, you know, actually do an HMAC (since it doesn't look like crypto_blake2b_keyed implements it (which is of course perfectly fine), following the implementation of crypto_sha512_hmac but updated for BLAKE2B.
  5. I've switched to using the recommended configuration for ChaChaPoly instead of trying to use XChaChaPoly (as much as I like that version of the cipher), purely for interop reasons.

There are probably a few other changes I've done that I haven't listed but I'm not sure how "major" they are.

I'm a bit confused about the rule in the description of HandshakeState::Initialize in the spec itself, where it says that it:

Calls MixHash() once for each public key listed in the pre-messages from handshake_pattern, with the specified public key as input (see Section 7 for an explanation of pre-messages). If both initiator and responder have pre-messages, the initiator's public keys are hashed first. If multiple public keys are listed in either party's pre-message, the public keys are hashed in the order that they are listed.

At first I thought that I needed to MixHash() the keys in the order that each pattern defined, but that didn't feel right (but if it's right I can reintroduce it), so now I just check if each key is passed into Initialize and MixHash() the public part accordingly regardless of pattern.

I can understand your dislike of C++, but I do think it can be a beautiful language; I could simplify a lot of the code if I added a few using namespace directives and the std::ranges::/std:: prefixes would be gone. But I keep them for clarity, mostly.

The result of the templatization of a lot of the lower level routines has as a consequence made the entire project, I think, a lot easier to follow. It's quite nice when you just let the compiler figure out all the annoying bits for you. And of course I get the benefit of fewer vectors all over the place, so fewer heap allocations (I think...) and fewer intermediate/temporary buffers.

As for the formatting, yeah, I apologize for that because it completely slipped my mind and I only realized it like 4 hours later. I've definitely ensured that this latest update is formatted.

Finally, as for test vectors, I would love to actually do this (or, even better, use the many test vectors that already exist). But I looked at the test vector format and some existing ones and couldn't figure out how I would go about actually making them work without suspending requirements of the specification. Well, and the other issue that they only contained one ephemeral key, but I kind of need 2 to make a key pair...

So yeah, lots of little updates and things. I'm not really certain how to solve the bug I'm facing at this moment in time (but I feel like this will end up being one of those moments where I say that and then end up fixing it later, haha). But I'll try to dig into it and find out what's causing it.

I've put all the code on GitHub so I don't have to keep uploading files here. If you want to continue this discussion there that'd be alright.

ethindp commented 3 days ago

Okay, so I think I've solved many of the issues, but I'm running into a very weird problem.

When I try to test my code against the Rust and Python implementations that I've put in my repo, my implementation manages to complete the handshake (at least, from it's local POV) but neither of the others are able to decrypt my responses. At first I thought I was misusing std::span so I replaced it with pointer arithmetic and it still doesn't like it. I don't really have any visibility into what either of the two implementations actually expect, but I keep getting "invalid tag" errors (at least for the Python one) and just a "Decrypt" panic in the Rust one. I can't tell if I'm combining the ciphertext and MAC incorrectly, if my nonce mismatches the initiators, if the AD is wrong (which would mean the CipherState/SymmetricState objects are broken somewhere), or if this is a monocypher bug. (I highly doubt the last possibility, but anything is possible so...)

ethindp commented 3 days ago

Well, I have great news. 3/4 of it, anyway.

I did a ton of tinkering and debugging and now my Python server (which uses dissononce) correctly is able to complete a handshake. The rust one still fails the decrypt step somewhere, but given that the error is so unhelpful as to be useless, I'm not (overly) bothered and it very well could be that Snow is doing something wrong. I may do one more "test" server implementation in another language to get at least 2 implementations saying I'm correct before I consider my work to be correct (at least for this pattern, but in theory it should apply to all others, hopefully?), but I'd like to think I've made some good progress so far.

ethindp commented 2 days ago

cc @LoupVaillant

LoupVaillant commented 1 day ago

@ethindp You’re fast.

Right now there is one simplification I can think of, which is made more obvious by the fact you’re now supporting the deferred patterns as well as the fundamental ones (note: in practice I’m not sure the deferred patterns have any advantages over their fundamental counterparts. They’re cute, but they may be virtually useless).

Right now you switch over patterns twice: once to determine the actual contents of the pattern, and once more to initialise the handshake state. What I suggest you do instead, is to record which key is shared before the handshake begins instead. That way you get only one switch, and can simplify the init function a bunch.

Personally for Monokex I did away with the switch altogether. Instead I load my state machine with the contents of the pattern, and drive everything else with that… which means one small init function per pattern, which is it’s own kind of verbose I’ll admit, but it also makes it easier to delete the code you don’t use.

ethindp commented 1 day ago

@LoupVaillant That definitely is an optimization I may consider. I don't know if I'll keep the deferred patterns or not just because I'm not positive how to use them. But right now I'm trying to get any pattern other than NN to work. When I try to implement the XX pattern, or even KK, one of the two handshake states in the simulation has a nonce that's one less than that of the other, and I think that is causing my particular problem wherein Monocypher gives me a completely different MAC to that of what the other party says it should be. So as an example,, on one run of my implementation of the XX pattern, when Bob wrote his message into second_msg (after reading Alices out of first_msg), the MAC (according to Bob) was {0x79, 0x92, 0xc1, 0x74, 0xaf, 0xed, 0x3, 0xf1, 0x18, 0x6d, 0x32, 0x50, 0xc9, 0xa5, 0xcc, 0x61}, but Alice computed the MAC as {0xf6, 0x6a, 0x15, 0x18, 0x1f, 0xee, 0x3e, 0xb2, 0x1c, 0x8d, 0x99, 0x6, 0x7a, 0x40, 0x8c, 0xfb}.

I've tried both using n++ and n += 1 (with n += 1 as a separate statement) but it still breaks and I have no idea why. So any advice would be appreciated because I'm extremely confused.

ethindp commented 23 hours ago

@LoupVaillant So I've finally gotten a test runner working and written, but I'm not sure which test vectors to use. My implementation is strictly spec-compliant, so it obeys the requirement that, i.e., "[I]f k is non-empty returns ENCRYPT(k, n++, ad, plaintext). Otherwise returns plaintext." Looking at the test vectors provided by both cacophony and snow, unless I'm misinterpreting how the "messages" element is to be used (I'm doing the testing for both the handshake and the actual encryption/decryption process), the first message should have the same ciphertext as the payload, I think, since the cipherstate object hasn't yet been initialized with a non-empty key. So my encryption routine isn't encrypting anything since there's nothing to encrypt with (and I'm not going to go "Oh I'll just use a key of all zeroes" either because I feel like that'd be a very bad idea). Is there something I'm missing? It doesn't help that the test vectors page doesn't really describe the format all that well from what I can see.

LoupVaillant commented 18 hours ago

@ethindp There are only 3 ways a MAC can be different: the plaintext is different, the additional data is different, or the key is different. In your case, assuming you’re correctly parsing the message and getting the right ciphertext (and I don’t recall Noise encryption having any additional data), then I would bet the keys are out of sync.

Have you tried to print the successive keys you’re using? It may not tell you why stuff breaks, but it may tell you when.

So I've finally gotten a test runner working and written, but I'm not sure which test vectors to use.

I generally generate my test vectors from a third party implementation I trust. Official test vectors from RFCs are nice, but they’re rarely enough. For Noise, a single set of test vectors with random keys would at least confirm your happy path is working as it should. You may also want to test various possible lengths for payloads, but that kind of test has already been done in the underlying libraries so that’s less important here.

Then for the unhappy cases, I take a happy case and flip a single bit. Which actor should report an error and when depends on which message you corrupted. But the most important is that you get a clean failure before the handshake ends.

Official test vectors are always nice to test against, but I would likely put them off until I do the above. Extracting vectors from an actual implementation could also help you understand their (presumably) ill-described format.

ethindp commented 13 hours ago

There are only 3 ways a MAC can be different: the plaintext is different, the additional data is different, or the key is different. In your case, assuming you’re correctly parsing the message and getting the right ciphertext (and I don’t recall Noise encryption having any additional data), then I would bet the keys are out of sync. Have you tried to print the successive keys you’re using? It may not tell you why stuff breaks, but it may tell you when.

@LoupVaillant So I did try this just now, and yeah, the keys were out of sync (I wasn't implementing S properly). So I fixed that, and now I know the exact problem< the additional data is mismatching, but I don't know the "why".

I've added some debugging statements (present in commit eabbe93 of my repository if you would like to have a look) (I would appreciate it, I think I need a second pair of eyes). The debugging statements were just something I very quickly threw together so they aren't fast or pretty or anything. But I've tried comparing my implementation to a Python one (Dissononce), and I'm pretty positive I'm doing the same thing they are, but their example for it is working and mine is not, so color me confused. The additional data is used but only in the handshake: it's the current hash of the symmetric state at the point it's encrypted/decrypted. I can't for the life of me identify why the AD is changing when it shouldn't be.