LoupVaillant / Monocypher

An easy to use, easy to deploy crypto library
https://monocypher.org
Other
594 stars 79 forks source link

Allow API to be put in a C++ namespace #225

Closed snej closed 2 years ago

snej commented 2 years ago

Minimal change to permit Monocypher to be compiled as C++ and wrapped in a namespace, to prevent symbol name collisions (#224). I have verified that this works, in a branch of my monocypher-cpp project.

If the preprocessor symbol MONOCYPHER_CPP_NAMESPACE is defined, the Monocypher headers will be wrapped in a C++ namespace instead of in extern "C". The name of the namespace is the value of the symbol.

Using this feature requires also compiling monocypher.c as C++ in the same namespace, for instance by using a C++ source file wrapping the C one:

// monocypher.cpp
#define MONOCYPHER_CPP_NAMESPACE monocypher
namespace MONOCYPHER_CPP_NAMESPACE {
#  include "monocypher.c"
}

(Such a source file could be added here as monocypher.cpp, but I was going for the minimal-size change.)

fscoto commented 2 years ago

So the issue this is trying to address is symbol name collisions, namely in your own code. Which is justifiable a motivation indeed, but generalization of the solution for everybody else is another matter entirely. Part of Monocypher's design goals as I understand them is to make it easy to apply exactly these kinds of hacks in the first place. I would thus like to first re-frame the issue as to take it from “minimal effort to fix your issue” to “this warrants changing upstream” since I think it is relevant all things considereed.

Because Monocypher advertises that it compiles as C99 and C++, it appears to me that a reasonable expectation of the general public would be that Monocypher conforms to a minimum acceptable standard of behavior as a citizen of C++ library space.

I am now speculating as a non-C++ developer about the nature of the C++ library ecosystem. Part thereof is being namespaced. Because a new namespace would break under semantic versioning if introduced now (causing Monocypher 4.0.0 or perhaps a rename entirely because every symbol just goes away), it's being hidden behind a conditional ifdef.

Personally, I'm not opposed to the ifdef. I don't like ifdefs per se because they're added complexity. In this case, the complexity seems to me obvious and trivial; I cannot see a way that this would require actual maintenance unless the core nature of Monocypher itself changes.

What I'm not sold on is the extra MONOCYPHER_CPP_NAMESPACE layer. If we accept C++ support to be necessary (which it is by means of semantic versioning) and proper C++ support to require namespacing (which is pure speculation on my part), then it should also be an upstream decision to actually define the namespace. In this context, I would like to note that Crypto++ hard-codes its namespace CryptoPP (with fancy macros because of Doxygen, not because of a fear they'd need to change it). Unfortunately, we must also consider that existing downstreams may already have done exactly this and wrapped Monocypher in a C++ namespace on their own. If we name ours monocypher or Monocypher, we may collide with that instead and thus break downstreams that way instead.

With all these things in mind and marvelling at @snej's foresight that I would consider all of this, 9ae7e55f850f5864cbbb498f662bb50b333d8b2b seems like the ideal way out. However, since it's a namespace, shouldn't it go in monocypher.c as well? I'm not sold on making people create a separate monocypher.cpp since monocypher.c is intended to compile as C++ as well. A C++ implementation is free to compile .c as C++ or not (and indeed Clang has deprecated treating .c input as C++). In that case, the wrapper monocypher.cpp may still be created by downstream consumers of Monocypher (or they can just rename monocypher.c to monocypher.cpp in deployments that don't plan on unpacking another Monocypher tree until there's a security vulnerability).

As @LoupVaillant correctly notes, this means deprecated/ needs a brush-up, too.

The naming of the MONOCYPHER_CPP_NAMESPACE definition is not defined by technical necessity but rather a more or less entirely free, creative naming space. This not being legal advice, I can see the free-form naming of a single preprocessor directive to be copyrightable. I kindly ask that you update the written by/copyright notices where necessary as measure of an abundance of caution.

SemVer impact: This needs a minor version bump, i.e. next version must be Monocypher 3.2.0. Please make a mental note for releasing the next version if this is adopted, @LoupVaillant.

Documentation impact: This probably goes in the README as it is not directly related to usage of the code itself but rather environment setup. It may be worth repeating in intro(3monocypher).


Unrelated to that is the incidentally created issue of vendoring the directory tree unmodified. Vendoring Monocypher as a sub-tree instead of a collection of individual side-by-side files is not the intended deployment mode either as individual file(s) as part of compilation of a project or as a compiled shared object/dynamically linked libary.

If you're in a place to vendor Monocypher as a sub-tree and update it every release, and I note once more that Monocypher outside of tagged releases must not be used in production deployments, you're probably also in a place to add more -I flags and paths for the objects created by compilation of deprecated/ or optional/.

Unlike the namespace, I currently fail to see how Monocypher is bound to change the paths as intended in commit 9817b911790e004b91f525a9f3018560efc96432. If anything, that breaks existing automated deployments of new releases by changing the path and would thus incur a Monocypher 4.0.0. Consequently, I would like to suggest not merging this commit.

LoupVaillant commented 2 years ago

Thanks a ton @fscoto.

OK, so I guess. the ../ in the include path will have to go.

However, since it's a namespace, shouldn't it go in monocypher.c as well?

Crap, I think you're correct. That means:

#ifdef MONOCYPHER_CPP_NAMESPACE
namespace MONOCYPHER_CPP_NAMESPACE {
#endif

// tons & tons of code

#ifdef MONOCYPHER_CPP_NAMESPACE
}
#endif

That costs 6 lines of code for what I believe is a very niche use case, that's no good. I really want to keep the SLOC count below 2K lines (it's an important psychological threshold), and I'm very seriously considering adding non-trivial functionality. Every line spent is a line that will be very hard to get back (because I really really don't want to bump that major version number again).


Still, how niche is that use case exactly? On the one hand I personally consider Monocypher to be primarily a C library, but ultimately that's not my call to make, it really depends on how people actually use it. @snej, I would like your opinion on this:

snej commented 2 years ago

I myself am uncomfortable about having to manually edit dependencies. When I update a dependency I prefer to be able to simply update a submodule; this makes it clearer what exact code I'm using and eliminates the possibility of mistakes.

I'm aware other people think differently. If you want to prioritize their approach and avoid uglifying the source at all, that's your choice and I'll abide by it.

Also, I don't see 2000 lines as having any real importance; what does a line mean in C anyway? Line counts are so easy to fudge by changing the brace style, removing comments and so on. Code size is more significant. But again, it's your program and your decision to make.

snej commented 2 years ago

How comfortable would you be with taking the latest release, and adding the namespace manually? Do you think it is easy to sell, or do you expect other people to criticise you for "modifying" a cryptographic library?

I'll do it if I have to. I'll just have to add a statement like "I promise this is exactly the same source code as Monocypher release x.x.x, except that I've added some namespace declarations." Anyone sufficiently paranoid can diff the files against your copies.

LoupVaillant commented 2 years ago

I myself am uncomfortable about having to manually edit dependencies.

That's understandable. Having to tell your users "pinky promise I only modified this unimportant bit" is a bit of a hassle.

When I update a dependency I prefer to be able to simply update a submodule

Wait, that raises a red flag of mine. Monocypher is originally intended to be used in two main ways:

Though it is fine to depend on a tagged commit that generated an official release, you really really don't want to depend on the latest commit from master, that's unstable, not as thoroughly tested as the official releases, and I cannot vouch for it.

That being said, you're not the first one to want to use submodules, and for that I already have an item on my TODO list: set up a monocypher-releases repository whose main branch only contains the contents of the tarball for the latest releases. That ought to make submodule updates simpler.

I don't see 2000 lines as having any real importance; what does a line mean in C anyway? Line counts are so easy to fudge by changing the brace style, removing comments and so on. Code size is more significant.

It's a self imposed, psychological and marketing threshold. It's not a hard limit, but it keeps feature creep in check. As for the way I count lines, I'm actually very strict. First, I use the sloccount program, which strips comments and empty lines (I'm very generous with my comments). Second, I have a hard 80 columns limit, so I don't cheat by having very long lines — not even for pre-computed tables. Third, I maintain a very consistent coding style. Overall, I don't think I can reduce the line count any more without cheating.

Currently, monocypher.c stands at 1887 SLOC. That means we have 113 lines left to add stuff. But, as @fscoto once wisely pointed out to me, it's not really a feature budget. It's more of a crisis budget, to use for stuff we really need. And my question is, "is that namespace worth spending 6 of those 113 lines?".

You know what, I think the answer is actually "yes".

OK you convinced me. Monocypher can compile to C++, we might as well go all the way. So:

snej commented 2 years ago

Though it is fine to depend on a tagged commit that generated an official release, you really really don't want to depend on the latest commit from master, that's unstable, not as thoroughly tested as the official releases, and I cannot vouch for it.

I understand about the master branch. A submodule just points to some commit ID in a repo, not necessarily on the master branch (or any branch). When I update my Monocypher submodule I check out the latest release tag.

I’ll get to your suggested changes ASAP, hopefully tomorrow (Monday).

snej commented 2 years ago

Done! I added myself to AUTHORS.md, but not to copyright notices since my contribution seems too small to put my name up next to yours 😔. But if you feel strongly about that, I can update those too.

LoupVaillant commented 2 years ago

Looks good to me, thanks. Merged now.

I personally don’t feel too strongly about the authorship (Edit: because as you said it’s a fairly small contribution). It’s more of a legal thing, so I’d rather allow @fscoto to overrule us on this one.