LoupVaillant / Monocypher

An easy to use, easy to deploy crypto library
https://monocypher.org

Integrated documentation generation #250

Closed: ghost closed this issue 1 year ago

ghost commented 1 year ago

Have you considered using something like Doxygen for a more integrated documentation generation?

LoupVaillant commented 1 year ago

@fscoto and I have given this quite a bit of thought, and as far as I can tell, Doxygen falls far short of our needs, which are the web site and the man pages. Doxygen could help with the first, but I believe it is not suited for the second.

Moreover, most functions are grouped into clusters, where each argument has the same meaning and purpose across the entire cluster. See the hash or encryption functions for instance, where the same argument appears several times. In this case it is better to document the arguments once for the entire cluster, instead of repeating them for each function.

In any case, the current way of documenting the API is pretty focused: we have one file per function cluster, where we document everything about it. It's not in the same file as the source, for sure, but that's hardly an issue in my opinion. And there's an advantage to it too: a perfect mapping between the source material and the target man pages.

Now, Monocypher could use other kinds of documentation: examples, tutorials, how-tos… I never got around to writing them, and they definitely don't belong in a system like Doxygen anyway. Plus, many such examples and how-tos would actually warrant fully fledged higher-level programs or libraries, to do things like authenticated key exchange, file encryption, network protocols… Because let's face it, Monocypher (and in fact every NaCl-style library, Libsodium included) remains a fairly low-level library that in many cases requires non-trivial cryptographic knowledge to use safely. Those who have that knowledge don't need such tutorials, and those who don't could really benefit from higher-level constructs. Which are coming. Pinky promise.


Now, just to make sure I didn't miss anything: what do you think Doxygen would do better than the current approach? More generally, what do you care about in documentation, and how does the current approach fall short of your ideal? You suggested a more "integrated" approach; perhaps you could be a bit more specific?

ghost commented 1 year ago

More integrated into the development workflow.

It would be easier to document, the documentation would always be up to date, and coverage would be guaranteed.

LoupVaillant commented 1 year ago

Yeah, those are problems indeed. Minor problems in my opinion, but there's probably room for improvement.

I don't like Troff. To me it's ugly and not very readable, unlike something like Markdown. It is, however, precise and easy to parse. I love the man pages that come out of it, existing HTML generation tools work well enough for me, and writing automatic checking tools for it shouldn't be too hard.

Now, while I don't like the idea of writing everything directly in the header, I remain open to the idea of referencing the documentation from the header. We could organise the header into separate sections, and each section would have its own man page. Each function in a section should then appear as-is in that page's synopsis, no other function name should be present there, and we should check that cross-references aren't dead. And while we're at it, the symbolic links could be removed from the source tree and generated at installation time instead.

That should do the trick. And now this issue is definitely staying open…

LoupVaillant commented 1 year ago

I have just finished normalising the names in the API. Every function now begins with crypto_<something>, which gives it a clear place in the library as a whole. Right now we have 10 groups:

crypto_verify*
crypto_wipe
crypto_chacha20*
crypto_poly1305*
crypto_aead*
crypto_blake2b*
crypto_argon2*
crypto_x25519*
crypto_eddsa*
crypto_elligator*

_(The conversion functions between EdDSA and X25519 don't clearly belong to either group, so I've named them such that one goes in crypto_x25519 and the other in crypto_eddsa.)_

The optional section adds 3:

crypto_sha512*
crypto_hmac_sha512*
crypto_ed25519*

This suggests we should have at most 13 man pages, plus the introduction. Fewer if we end up merging SHA-512 and HMAC, or crypto_verify*() and crypto_wipe() (though I'm not sure that is such a bright idea).

I'm bringing this up because, in addition to the lack of automation, we have too many man pages, and the separation between the "basic", "optional" and "advanced" stuff makes some functionality harder to discover. My idea here is to accept that Monocypher is too low-level to be safely used by beginners in many contexts. Authenticated encryption when we already have a key, hashing, password hashing and signatures are easy enough, but everything else (key exchange, PURB, PAKE, niche EdDSA uses…) is a loaded foot-gun that needs higher-level kevlar shoes. I mean APIs.

So how about bringing down the walls and having everything in a flat hierarchy? Just 14 pages (the introduction, then one per section). We can still push the foot-guns to the bottom of each page, but at least users will know from the synopses that they have options.

@fscoto, how would you feel about that?