I just saw this on /r/crypto.

Monocypher's current signature API provides a clean separation between the private and public keys:

void crypto_sign_public_key(uint8_t       public_key[32],
                            const uint8_t secret_key[32]);

void crypto_sign(uint8_t       signature[64],
                 const uint8_t secret_key[32],
                 const uint8_t public_key[32],
                 const uint8_t *message, size_t message_size);

It's simple, consistent with X25519… and prone to a subtle yet catastrophic misuse. Just as catastrophic as the ECDSA nonce reuse EdDSA was design to prevent, actually: signing the same message with the same private key and two different public keys instantly leaks the private key.

In principle users ought not do do that. In practice however some handle the storage of the private & public halves differently, such that they do not have the same security guarantees, or may simply get out of sync. And so the following may happen:

uint8_t sk[32];
uint8_t pk1[32];
uint8_t pk2[32];
uint8_t sig1[64];
uint8_t sig2[64];
size_t   message_size = get_message_size();
uint8_t *message      = get_message();

// Setting ourselves up for failure
get_secret_key(sk);   // secret keys are stored securely
get_public_key(pk1);  // public keys may get out of sync
get_public_key(pk2);  // such that pk1 != pk2

// Catastrophic API misuse
crypto_sign(sig1, sk, pk1, message, message_size);
crypto_sign(sig2, sk, pk2, message, message_size);

There are three ways to correct this.

We can simply not provide the public key at all, Monocypher will regenerate it on the fly, at the cost of making the signature twice as slow. In many cases that's not even a problem, but in platforms where signatures take forever that's a deal breaker. On the other hand, this can be implemented at the library level, where Monocypher would simply ignore the public key entirely, and always generate one.
We can provide an API similar to NaCl's, where the private key contains its own public half. That way users would naturally store both in the same place, with the same security level. On the other hand, doubling the size of private keys may in some specialised cases cause bandwidth or storage problems. Though realistically, it's not nearly as problematic as halving the speed of signatures.
We can hash the public key along with the private key and the message when we generate the nonce. That way changing the public key changes the nonce, and the catastrophic nonce-reuse vulnerability disappears. It's 100% compatible with regular signature verification, doesn't require any change in the API… and it breaks all test vectors, which rely on deterministic nonce generation (also, some systems that explicitly assume deterministic nonce generation may be confused).

I don't see any cost-free, elegant solution here. One obvious mitigation is to ignore the public key entirely, and regenerate it on the fly:

diff --git a/src/monocypher.c b/src/monocypher.c
index 7462903..fa01cdb 100644
--- a/src/monocypher.c
+++ b/src/monocypher.c
@@ -2237,5 +2237 @@ void crypto_sign_init_first_pass_custom_hash(crypto_sign_ctx_abstract *ctx,
-    if (public_key == 0) {
-        crypto_sign_public_key_custom_hash(ctx->pk, secret_key, ctx->hash);
-    } else {
-        COPY(ctx->pk, public_key, 32);
-    }
+    crypto_sign_public_key_custom_hash(ctx->pk, secret_key, ctx->hash);

Or more reasonably, provide a compilation flag so users who want it may easily enact this mitigation. But that implies updating Monocypher already, which should not be any easier than just sweeping the code base for crypto_sign() calls and replace the public key by NULL. In addition, we must decide whether we want this mitigation to be opt-in or opt-out, and deal with the non-trivial complications in the test suite. Overall not a good idea in my opinion.

The other route is to simply deprecate the current API, and switch to one where the secret and public keys are bundled into a single 64 byte blob. It could look like this:

void crypto_sign_key_pair(uint8_t public_key[32],
                          uint8_t secret_key[64]
                          uint8_t seed[64]);

void crypto_sign(uint8_t       signature[64],
                 const uint8_t secret_key[64],
                 const uint8_t *message, size_t message_size);

We could add such an API today, but removing the old one would require bumping the major version number, and without the old crypto_sign_public_key() we can't trade space for time by re-generating the public key on the fly.

In the mean time, we should update the manual: stern warnings, and most likely a dedicated paragraph in the security considerations section. I also think we need an example on how to handle a key pair safely, by storing both halves in the same 64 bytes buffer:

/* generate a key pair */
uint8_t sk[64]; /* Secret key          */
uint8_t pk[32]; /* Matching public key */
arc4random_buf(sk, 32);
crypto_sign_public_key(pk, sk);
memcmp(sk+32, pk, 32);
/* Wipe the secret key if it is no longer needed */
crypto_wipe(sk, 64);

/* sign a message */
uint8_t       sk       [64]; /* Secret key from above          */
const uint8_t message  [11] = "Lorem ipsu"; /* Message to sign */
uint8_t       signature[64];
crypto_sign(signature, sk, sk+32, message, 10);
/* Wipe the secret key if it is no longer needed */
crypto_wipe(sk, 64);

This may even replace the current example, so users would copy & paste safe code by default.

@fscoto I think I need your opinion on this.

I wish I had sensible guidance to offer. All I have is thoughts.

Impact

Apparently, this actually happened in the wild. Before seeing this, personally, I thought this would at best make for a cute CTF challenge. Infosec Twitter seems to mostly agree that this is bad. Purely from a PR standpoint alone, something ought to be done.

Fixes

You don't see any cost-free, elegant solution here: That's because there isn't. I don't see any alternatives, either.

Fix 1: sign(sig, sk[32])

The first way is the derivation of pk on the fly. You correctly note that it slows down signature generation by a factor of two.

However, putting on my embedded device glasses for a second, this compounds. Even if we ignore power analysis, avoiding glitching involves re-verifying the signature after signing. This would now compound to a slowdown times three. I'd really like to not go this way. But it is hands-down the safest and most straightforward option.

If we follow the philosophy that a pathologically slow program is broken (and I suspect you subscribe to the philosophy even more than I do), then this is a de facto hard API break for embnedded targets.

I'm aginst a compilation flag because it's already a fairly complex situation and won't work in any precompiled packages shipping libraries.

Fix 2: sign(sig, skpk[64], [pk])

The second way is storing the public key along with the secret key. This returns to the NaCl API.

There's still a subtle trap here: If you think you know what you're doing, you might be tempted to chop the key in half because you're aware of this common implementation detail. You then go ahead, with full confidence in your own skills, to fall right into this same API trap. We probably shouldn't have to guard against people who mess with internals of private keys, but we're having a discussion about how to prevent people who can't store a public key with a private key from doing the same thing.

The Zig standard library follows an approach of taking pk as additional argument, then checking memcmp(sk[32..64], pk) or erroring, though this approach fundamentally breaks Monocypher's principles of not having errors in non-verifying functions (though a precendent is in crypto_curve_to_hidden()); plus at that point, we might as well store seed||pk||H(seed||pk) as sk for a total of 96 bytes and offer a crypto_sign_public_key_regen_from_degenerate_sk() for scenarios where only 32 bytes of storage are available (e.g. expensive fuses).

Fix 3: Modify the nonce generation step

The third proposition is to change the nonce generation from H(R||A||M) to H(R||A||expandedseed||M) or similar. I'll be honest: I can't tell if this actually checks out. This feels dangerous.

Another thing to consider is that this breaks Ed25519-compat with test vectors. If FIPS 186-5 does standardize Ed25519, we might be faced with unforeseen demand there and someone might want to FIPS-certify a thing containing only the Monocypher signing code. However, part of FIPS compliance is that cryptographic modules perform a self-test, which would mean passing standard test vectors. That option would then be cut off. Recall that standards are the reason IETF ChaCha20 hadto be added by overwhelming demand.

Monocypher's API problem

Monocypher does not prevent API misuse as a general rule. As far as I recall, the main point was to avoid having interfaces that fail if invariants of the algorithm are broken. You want 128 bytes of BLAKE2b output? Here's some stack memory. Key lengths of static size are not checked at all. You need randomness for key generation? Pray your operating system vendor didn't abandon you (and that you know all OSes at compile time).

Cryptographic libraries seem to be headed in a general direction of strong misuse resistance. The audit results MON-01-002 and MON-01-005 point in a similar direction. AES-GCM-SIV got an RFC (I do wish to point out that it is exceptionally interesting that neither XChaCha nor any SIV-equivalent construction got traction in standardization effort). In that context, Monocypher is in part a victim of being part of the NaCl-ish libraries, whose "high-level" approach was later found not to be high level enough, and not being adventurous enough at the same time.

We have several things that were held back by API continuity/the issue of another major version bump:

Revisiting return codes for key exchange #102 with ever-more issues with cofactors coming to light and notions of acceptable API changing
Ristretto #176
Possibly wrong context type for the incremental EdDSA init functions #212
The goals and sense of crypto_key_exchange() #230
Post-quantum #226

A backport incidentally becomes impossible; the website needs to rectify its promises about backporting fixes to limit itself to API-compatible fixes... retroactively. That is fairly awful, but also kind of comes with the definition of SemVer.

Partly, this is an extension of Monocypher's identity crisis with embedded. As far as I've been able to tell, Monocypher's deployment model largely makes sense in embedded; its use of 64-bit arithmetic on the other hand is an issue (see also https://infoscience.epfl.ch/record/223794 for how compiler builtins might sabotage otherwise constant-time code). You generally want to optimize for code size and stack size at the expense of a lot of other things; the smaller, the more critical this gets. If we brush embedded aside, fix 1 is obviously the right thing to do.

As a more radical idea (and I'm aware that somebody has to actually do the work) for the long term: With the NIST Lightweight Cryptography competition being expected to coming to a close later this year and the post-quantum cryptography competition having selected a key encapsulation mechanism (though not definitively as patent details still need to be hashed out, possibly causing a switch in the portfolio in 2023), it might actually make sense to revisit Monocypher in three branches:

Monocypher-compat: The current version of Monocypher, no more improvements or major version bumps
PostQcypher: Monocypher 4.0 plus swapping X25519 for a PQ KEM and EdDSA for a PQ signature scheme and giving up on Elligator per se
Embedocypher: Monocypher 4.0 plus swapping the field and scalar math for Daniel Beer's C25519; PQ is a different question in these scenarios and may need revisiting and talks with embedded stakeholders, especially in light of NIST's new short signature with fast verification competition

Thanks for your thoughts, they do clarify a few things for me.

Fix1 is a deal breaker because of embedded. I also agree that we don’t want yet another compilation flag.
Fix2 is better, but it won’t be perfect, and backward compatibility will require us to keep the unsafe API for a while anyway.
Fix3 is a deal breaker because it breaks the test vectors, but I can vouch for its safety.

Fix3 is safer than you think. First, it doesn’t change the H(R||A||M) part, it only changes how r (and therefore R) is generated: instead of H(H2(seed) || message), we’d use H(H2(seed) || A || message). That way changing the public key renews the nonce exactly as if we changed the message itself. And it’s compatible with regular verification of course. It’s kind of a pity DJB et al. didn’t do that from the start, but since they had Fix2 in mind this was not needed.

You sum up Monocypher’s API problem quite well. Took me way too long to realise it, but it is indeed a low-level library, and I believe it should stay that way. With that in mind, the unsafe API is the only one that enables the two main use cases (storing sk and pk as an indivisible pair and regenerating pk to save space).

As for the radical idea of maintaining several different versions of Monocypher, I think I’ll wait at least 5 years for the dust to settle on post quantum stuff. As for avoiding misuse, I think this is a lost battle. The only really safe APIs out there are those who expose high-level use cases and handle all the system stuff we need. Here’s an encrypted socket, go read and write to it. Better leave that to higher-level libraries I will pinky promise write some time in the future.

In conclusion, I think I’ll start by updating the manual to warn users and explicit the two safe use cases.

I would recommend fix 2 given that it seems to be the most common approach. If you word things in the manual correctly, there's a very small risk of someone actually chopping the key in half. You could either make it seem like the entire private key is 64 bytes long or make a note of how the public key is included for this reason.

As for post-quantum, I don't think you need to immediately start switching over either given the competition hasn't really finished. Maybe wait until after round 4 and/or the additional call for signature schemes in case that time reveals anything important. However, I would say before 5 years.

Also, it probably doesn't make sense to stick with DJB only stuff since Classic McEliece has large keys, SPHINCS+ has larger signatures, and NTRU Prime wasn't selected because NIST wasn't convinced by the claimed ring security benefit, so there might be questions if that was chosen.

I'm interested to know when Frank will add post-quantum algorithms to libsodium; it would probably be sensible to time it similarly. In the meantime, I think a pre-shared key alongside X25519 like WireGuard supports should be recommended when possible if post-quantum security is a goal for key exchange.

I would recommend fix 2 given that it seems to be the most common approach.

It's also the only safe one that's not an instant deal breaker in many important use cases. My main problem here is that being pathologically tight in memory is also a legitimate use case, so I'd like to keep the ability to regenerate the public key from the seed. And we go back to square one: the simplest API that support both use cases is what we have now.

For now I have updated the documentation to warn users about this problem. We could go a little further and actually provide something like Fix2. The first step would be to provide the following:

void crypto_sign_key_pair(uint8_t secret_key[64],
                          uint8_t public_key[32]
                          uint8_t seed[32]);

void crypto_sign_safe(uint8_t       signature[64],
                      const uint8_t secret_key[64],
                      const uint8_t *message, size_t message_size);

That would be the new high-level API we'd advise everyone to use. As for crypto_sign_public_key() and crypto_sign(), they can be relegated to the "advanced" API, for those few users who can't afford 64 byte private keys. I hate having to use the _safe suffix, but this is the price of backward compatibility.

doc ok fscoto, though I'm not 100% sold on introducing "we" back into the manual.

I, too, hate having to use the _safe suffix, but this is indeed the price of backward compatibility.

"we" back into the manual.

Sorry, that was not on purpose. Feel free to correct it in a PR if you want, otherwise I can rephrase it out.

OK, I've written a quick (untested) implementation for the safe signing API. Here's what we need:

void crypto_sign_key_pair(u8 secret_key[64], u8 public_key[32], u8 seed[32])
{
    // Tolerate overlap between seed and the keys.
    u8 sk[32];
    COPY(sk, seed, 32);
    crypto_wipe(seed, 32);
    COPY(secret_key, sk, 32);

    crypto_sign_public_key(secret_key + 32, secret_key);
    COPY(public_key, secret_key + 32, 32);
}

void crypto_sign_safe(u8 signature [64], const u8  secret_key[64],
                      const u8 *message, size_t message_size)
{
    crypto_sign(signature, secret_key, secret_key + 32, message, message_size);
}

14 lines of code already, and we're talking about the bare minimum. If we want a safe API across the board we need to address the incremental and custom hash APIs:

void crypto_sign_init_first_pass_custom_hash_safe(crypto_sign_ctx_abstract *ctx,
                                                  const u8 secret_key[64],
                                                  const crypto_sign_vtable*hash)
{
    crypto_sign_init_first_pass_custom_hash(ctx, secret_key, secret_key + 32,
                                            hash);
}

void crypto_sign_init_first_pass_safe(crypto_sign_ctx_abstract *ctx,
                                      const u8 secret_key[64])
{
    crypto_sign_init_first_pass_custom_hash_safe(ctx, secret_key,
                                                 &crypto_blake2b_vtable);
}

13 more lines of code (for a total of 27 now), and function names that are starting to get really long. We could possibly skip that second part because the incremental and custom hash interfaces are already advanced and specialised, but this would be at the cost of orthogonality.

So for this fix we're looking at 4 additional functions, costing 27 lines of code, and I haven't counted Ed25519 in the optional file (I'd bet on 3 more functions and 16 lines of code or so). And that's on top of leaving behind the unsafe API very few people will ever use. To be honest, the following alternatives start to be more and more tempting:

Break the API and increase the major number version. Take the opportunity to revise past decisions about the rest of the API. Weep about having to maintain 4 branches.
Pretend documenting the issue was enough. This is a C library, users who don't read the documentation are asking for it. Weep about smartarses who don't read the documentation.

(Edit: it just occurred to me that we could add the first two functions as examples in the manual, so users who want it know how to make a safe API on top of the unsafe one.)

LoupVaillant / Monocypher