API documentation - Githubissues

jgm commented 9 years ago

We should have a plan for generating API documentation. Would it make sense to use doxygen? I haven't used it myself, but it purports to generate HTML and man page documentation from comments in the source.

jgm commented 9 years ago

I've added the doxygen infrastructure to the Makefile, etc. We just need some comments in the source.

jgm commented 9 years ago

I now think doxygen is too much trouble for our simple needs. I'll probably just write a short script to extract documentation for the library and build a man page using pandoc (which also makes nicer man pages than doxygen).

nwellnhof commented 9 years ago

Note that now the build fails if pandoc is not installed:

[100%] Generating man1/cmark.1
/bin/sh: 1: pandoc: not found
man/CMakeFiles/doc.dir/build.make:51: recipe for target 'man/man1/cmark.1' failed

jgm commented 9 years ago

+++ Nick Wellnhofer [Nov 29 14 13:04 ]:

Note that now the build fails if pandoc is not installed:

Yes, this isn't great. We could go back to the old method of including the man page in the repository, but I have half a mind to hack up a little man page writer using cmark itself.

(Eventually I'd also like to use cmark to generate spec.html.)

jgm commented 9 years ago

It shouldn't be too hard to rig up a script that creates man/cmark.3.md from comments in the source of cmark.h, and that's probably a good idea, to ensure consistency.

nwellnhof commented 9 years ago

Nice. We use a very similar documentation system for Apache Clownfish. Our goal is to output Perl POD, man pages, HTML, and other language-specific formats. We're about to switch to Markdown, that's how I became interested in libcmark. For example, the documentation in this "Clownfish header file" wil be converted to HTML and POD.

jgm commented 9 years ago

Nice to know how you got interested! So, I guess we might both interested in a man renderer for libcmark. (Putting raw man macros in the cmark.h documentation isn't so nice!) Would you like to collaborate on that? It should be pretty easy, since we already have the HTML renderer to use as a framework.

nwellnhof commented 9 years ago

FWIW, here's my rudimentary implementation for Apache Clownfish (Github link with syntax highlighting). It's a simple recursive algorithm tailored to our specific needs. But the handling of nested lists and blockquotes should be sound. I found this to be the most tricky part.

A few other things aren't implemented because we won't need them for now. Among them are constructs that don't map very well to man pages like headers in lists.

jgm commented 9 years ago

I've put together a lua wrapper for the API, and I've been experimenting with a standalone executable that allows you to define your own writers using lua. Here's a sample man writer: https://github.com/jgm/cmark-lua/blob/master/man.lua Using a similar approach one can write filters in lua that transform the AST. I'm thinking this is a nice way to make the API accessible to people who don't want to deal with C.

jgm commented 9 years ago

@nwellnhof - I added cmark_walk to the API. This walks the AST, applying a function to each node and optionally updating state. Putting all the tree-walking mechanics into this function allowed me to drastically simplify the HTML renderer, which went from 357 lines to 280, and is now much easier to understand. It is also a bit faster in my benchmark. I also removed CMARK_NODE_REFERENCE_DEF, as there seemed to be no purpose to keeping these empty nodes in the AST.

nwellnhof commented 9 years ago

Nice. For the public API, I would only add #defines or an enum for the callback reason. Why not something like

typedef enum {
    CMARK_EVENT_ENTER,
    CMARK_EVENT_EXIT,
    CMARK_EVENT_DONE
} cmark_event_type;

and then

typedef int
(*cmark_event_handler)(cmark_node *node, cmark_event_type ev_type, void *state);

It's not necessary to specify parameter names in a function typedef but it is good for documentation.

I also see no reason for S_is_leaf_node. We could simply check whether node->first_child is NULL. Leaf nodes are already guaranteed to never have children.

Another nice feature would be an iterator-based API. Something like the following:

typedef struct cmark_iter cmark_iter;

cmark_iter*
cmark_iter_new(cmark_node *root);

void
cmark_iter_free(cmark_iter *iter);

cmark_event_type
cmark_iter_next(cmark_iter *iter);

cmark_node*
cmark_iter_get_node(cmark_iter *iter);

void
usage_example(cmark_node *root) {
    cmark_event_type ev_type;
    cmark_iter *iter = cmark_iter_new(root);

    while ((ev_type = cmark_iter_next(iter)) != CMARK_EVENT_DONE) {
        cmark_node *cur = cmark_iter_get_node(iter);
        // Do something with `cur` and `ev_type`
    }

    cmark_iter_destroy(iter);
}

With such an approach, there's no need for a separate callback function. Callbacks are cumbersome in C, especially when having to pass context data.

jgm commented 9 years ago

I also see no reason for S_is_leaf_node. We could simply check whether node->first_child is NULL. Leaf nodes are already guaranteed to never have children.

The problem is that container nodes may lack children, but we still want to trigger an "ENTER" and "EXIT" callback. (E.g. in an empty list item we might want the ENTER callback to emit <li> and the EXIT callback to emit </li>.) That's why I think this is necessary.

I like the iterator idea, and I agree that maybe it's better than walk. Do you think it would make sense to provide both, or would it be simpler just to provide the iterator API?

jgm commented 9 years ago

I've added the iterator interface with the API you suggested, and removed cmark_walk. Seems to work well.

jgm commented 9 years ago

I've also added a man page writer (getting back to the original topic of this thread).

commonmark / commonmark-spec

API documentation #224