DanBloomberg / leptonica

Leptonica is an open source library containing software that is broadly useful for image processing and image analysis applications. The official GitHub repository for Leptonica is: danbloomberg/leptonica. See leptonica.org for more documentation.

Documentation online #126

Closed: zdenop closed this issue 8 years ago

zdenop commented 8 years ago

Can you please publish recent documentation of Leptonica? Something like https://tpgit.github.io/UnOfficialLeptDocs/leptonica/ or https://tpgit.github.io/UnOfficialLeptDocs/leptonica/functions.html, or Doxygen-generated docs (like tesseract-ocr.github.io).

DanBloomberg commented 8 years ago

We don't have a recent update to the UnOfficial documentation.

However, documentation continues to be updated at leptonica.org.

-- Dan


pullmoll commented 8 years ago

I had already thought about trying to convert the existing documentation blocks for all the functions to the doxygen format. This would make it much easier to keep some kind of online documentation up to date. For example, the comment block for the function pixCleanBackgroundToWhite() in adaptmap.c would look like this:

/**
 * @brief Clean background to white using background normalization
 *
 * @param pixs 8 bpp grayscale or 32 bpp rgb
 * @param pixim <optional> 1 bpp 'image' mask; can be null
 * @param pixg <optional> 8 bpp grayscale version; can be null
 * @param gamma gamma correction; must be > 0.0; typically ~1.0
 * @param blackval dark value to set to black (0)
 * @param whiteval light value to set to white (255)
 * @return pixd (8 bpp or 32 bpp rgb), or null on error
 *
 * <pre>Notes:
 *    (1) This is a simplified interface for cleaning an image.
 *        For comparison, see pixAdaptThresholdToBinaryGen().
 *    (2) The suggested default values for the input parameters are:
 *          gamma:    1.0  (reduce this to increase the contrast; e.g.,
 *                          for light text)
 *          blackval   70  (a bit more than 60)
 *          whiteval  190  (a bit less than 200)
 * </pre>
 */

DanBloomberg commented 8 years ago

Jürgen,

What is the simplest change that can reasonably be made to augment the current doxygen formatting?

At present, nearly every function has a text header /*! .... */

Is there any reason to add anything?

With respect to the comments at the top of each file, these do two things: (1) list the functions in the file, and (2) [sometimes] give other general comments about usage.

These are presently not 'doxygenated'.

Would you suggest using the following to include them as well:

/*! \file .... */

Should there be any special formatting for the listing of the functions?

Perhaps you can take a file, say, boxbasic.c (which has a listing of functions and a paragraph on usage) and decorate it, as an example of what you are thinking about?

-- Dan


pullmoll commented 8 years ago

I hacked up a comment format converter which should do most of the tedious part: https://github.com/pullmoll/leptonica/tree/doxygen

The source file is conv2doxy.c and you can compile (on *nix) with cc -o conv2doxy conv2doxy.c

If you run it for e.g. src/adaptmap.c, it will create a backup src/adaptmap.c~ on the first run, or use that backup on later runs, and write src/adaptmap.c re-formatted for doxygen. To handle all files at once I did ls src/*.c | xargs ./conv2doxy.
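The backup behavior described above (create src/adaptmap.c~ on the first run, reuse it on later runs) could look roughly like the following C sketch. It is not taken from conv2doxy.c; the function name prepare_backup() and its return convention are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical sketch, not code from conv2doxy.c.
 * Builds the backup name "<path>~".  If that backup already exists (a
 * later run), it is simply reused; otherwise (first run) the original
 * file is renamed to the backup.  The caller then reads from bak and
 * writes the converted text to path.  Returns 0 on success, -1 on error. */
static int prepare_backup(const char *path, char *bak, size_t baksize)
{
    FILE *fp;
    int n = snprintf(bak, baksize, "%s~", path);

    if (n < 0 || (size_t)n >= baksize)
        return -1;                      /* backup name does not fit */

    fp = fopen(bak, "r");
    if (fp != NULL) {                   /* later run: keep existing backup */
        fclose(fp);
        return 0;
    }
    /* first run: the untouched original becomes the backup */
    return rename(path, bak) == 0 ? 0 : -1;
}
```

Because every run reads the original text from the backup and overwrites src/*.c, running the tool twice on the same file would not convert already-converted output.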

In the header files I manually added doxygen /*! and \file where appropriate, as well as /*!< ... */ for comments after struct fields and enum values.

The conversion of the file block comment into a doxygen /*! comment, and the insertion of \file before the basename, rely on the (almost) fixed format of these comment blocks.
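For illustration, a header marked up in the way just described might look like this. The file name example.h, the struct ExampleBox and the enum values are hypothetical; only the markup (a /*! comment tagged with \file, and /*!< ... */ after struct fields and enum values) follows the description above.

```c
/*!
 * \file  example.h
 * <pre>
 *   Hypothetical header used only to show the markup: a file comment
 *   tagged with \file, plus trailing comments on struct members and
 *   enum values.
 * </pre>
 */

/*! Hypothetical box type with documented members */
struct ExampleBox
{
    int  x;      /*!< left coordinate                    */
    int  y;      /*!< top coordinate                     */
    int  w;      /*!< width                              */
    int  h;      /*!< height                             */
};
typedef struct ExampleBox  EXAMPLEBOX;

/*! Hypothetical operation flags with documented values */
enum {
    EXAMPLE_ADD = 1,         /*!< add the two boxes        */
    EXAMPLE_SUBTRACT = 2     /*!< subtract the second box  */
};
```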

A TODO after running this conversion is to write meaningful @brief statements instead of the function's name (2601 functions at last count). This could perhaps be automated if there were some kind of lookup table of <functionname> <comment> entries read from an external file.
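One possible shape for such a lookup, sketched in C under the assumption that the external file holds one "<functionname> <comment>" entry per line (the comment above only suggests this format). The helper lookup_brief() is hypothetical and not part of conv2doxy.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of conv2doxy.c.
 * Assumed table format: one entry per line, "<functionname> <comment>",
 * e.g. "pixCleanBackgroundToWhite Clean background to white".
 * Copies the comment for funcname into brief and returns 1 on a match;
 * returns 0 if the name is not in the table. */
static int lookup_brief(FILE *table, const char *funcname,
                        char *brief, size_t size)
{
    char line[512];
    size_t len = strlen(funcname);

    rewind(table);                                  /* scan from the top */
    while (fgets(line, sizeof line, table) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';         /* drop the newline */
        /* the whole first token must match, followed by a blank */
        if (strncmp(line, funcname, len) == 0 &&
            (line[len] == ' ' || line[len] == '\t')) {
            const char *text = line + len;
            text += strspn(text, " \t");            /* skip separator */
            snprintf(brief, size, "%s", text);
            return 1;
        }
    }
    return 0;
}
```

A converter could call something like this while emitting each @brief line, and keep the function name as the fallback whenever the lookup returns 0.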

It seems that in many cases the actual brief description of a function is given in a block comment:

/*----------------------------------------------------------------------*
 *              Accumulator for 1, 8 and 32 bpp convolution             *
 *----------------------------------------------------------------------*/

I'll perhaps try to collect these comments and create a lookup file.
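Collecting those titles could start from a check like the one below, which recognizes the three-line separator pattern shown above and returns the trimmed text line. This is a hypothetical helper (block_title() is not part of conv2doxy.c), and it assumes the rule lines always begin with the same star-and-dash prefixes as in the example.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of s without trailing whitespace. */
static size_t trimmed_len(const char *s)
{
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;
    return n;
}

/* Hypothetical helper, not part of conv2doxy.c.
 * Returns 1 and stores the trimmed title if the three lines look like a
 * separator block (an opening rule of slash, star and dashes, a text
 * line framed by stars, and a dashed closing rule that ends the comment);
 * returns 0 otherwise. */
static int block_title(const char *open, const char *text,
                       const char *close, char *title, size_t size)
{
    size_t n;
    const char *end;

    if (strncmp(open, "/*-", 3) != 0)
        return 0;                      /* not an opening rule */
    close += strspn(close, " \t");
    n = trimmed_len(close);
    if (strncmp(close, "*-", 2) != 0 || n < 4 ||
        close[n - 2] != '*' || close[n - 1] != '/')
        return 0;                      /* not a closing rule */

    /* text line: drop the framing stars and surrounding blanks */
    text += strspn(text, " \t*");
    end = text + trimmed_len(text);
    while (end > text && (end[-1] == '*' || isspace((unsigned char)end[-1])))
        end--;
    if (end == text)
        return 0;                      /* no title text */
    snprintf(title, size, "%.*s", (int)(end - text), text);
    return 1;
}
```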

The conversion fails for some format deviations, such as:

The Doxygen output of the current state after conversion can be seen here.

The current warnings output of doxygen can be seen here. Besides logging unknown commands such as \somestring in the block comments, it also lists many mismatches between the parameter descriptions and the actual parameters of the functions.
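A rough C sketch of that kind of parameter check: it scans a comment for @param entries and reports names that do not appear in the function's real argument list. The function name and interface are hypothetical, it only handles the @param spelling (not \param), and it is not how doxygen itself is implemented.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: count (and print) "@param <name>" entries in a
 * comment whose name is not among the function's actual parameters. */
static int count_param_mismatches(const char *comment,
                                  const char *const *params, int nparams)
{
    const char *p = comment;
    int bad = 0;

    while ((p = strstr(p, "@param")) != NULL) {
        char name[64];
        int i, found = 0;

        p += strlen("@param");
        if (sscanf(p, "%63s", name) != 1)   /* next word is the name */
            break;
        for (i = 0; i < nparams; i++)
            if (strcmp(name, params[i]) == 0)
                found = 1;
        if (!found) {
            printf("documented parameter '%s' not found\n", name);
            bad++;
        }
    }
    return bad;
}
```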

DanBloomberg commented 8 years ago

Wow. That was/is a big job! I really appreciate your effort here.

I have only looked at a few files. I like to use block headers such as

/*-------------------------------------------------------------------------*

but I see that in regutils.h, it was replaced by

/*! Regression test parameter packer */

Perhaps we can do:

/*-------------------------------------------------------------------------*/
/*! Regression test parameter packer */
/*-------------------------------------------------------------------------*/

I hope you're not suggesting 2600 TODOs to write "brief" descriptions of each function. That just can't happen -- life is too short :-)
Basically, if I believe the function and/or parameters need to be further explained, I add the Notes: section. Otherwise, the one-liners following the input args suffice.

What do you suggest at this time? That we pull the entire change as is (maybe with some modifications as mentioned above)?

DanBloomberg commented 8 years ago

Jurgen, how do I pull your changes into a local copy, rather than into the master? (I'm just a novice with git/github).

-- Dan

On Sat, Apr 9, 2016 at 11:25 AM, Jürgen Buchmüller <notifications@github.com> wrote:

I hacked up a comment format converter which should do most of the tedious part: https://github.com/pullmoll/leptonica/tree/doxygen

What's missing is the detection of the file block comments and insertion of \file in order to make doxygen actually list the files and their functions in its index.


pullmoll commented 8 years ago

Dan, you can perhaps create a branch on your side with git checkout -b doxygen and then git pull git@github.com:/pullmoll/leptonica -b doxygen. If that fails, git pull --help should tell you how to do it right :)

Afterwards you can switch between branches with git checkout master and git checkout doxygen.

To avoid anything catastrophic happening, you could also work on a new clone of leptonica in another directory. There you could try to compile conv2doxy.c, run it yourself, and get a glimpse of how the src/*.c files look afterwards.

It would be easier if I first created a PR from my branch, but I won't create it yet, because I think I can find more bugs and make more improvements to the converter.

I will reset the header files to the original and modify them so that the block comments are kept.

For the TODOs, I wanted to try collecting the /*--…--* block comment that precedes each function and writing it to the lookup file. I don't know how many useful entries this might generate.

pullmoll commented 8 years ago

I rewrote the header files without removing the

/*******************************
 *      Block comments         *
 *******************************/

The tagging for doxygen with /*! seems to work, as can be seen here.

Some block comments contain, e.g., variable names in angle brackets such as <p>, which doxygen unfortunately interprets as HTML tags. I'm not sure what the best replacement for this notation would be. Perhaps [p] or {p}? ... but

I have now made conv2doxy replace < with \<, > with \> and & with \& inside the block comments that the state machine already detects as Notes: sections. This avoids their erroneous interpretation as HTML tags.
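The escaping step itself is small. A minimal C sketch, assuming it is applied one comment line at a time (escape_doxy_line() is a hypothetical name, not a function from conv2doxy.c):

```c
#include <string.h>

/* Hypothetical sketch of the escaping step: copy one comment line,
 * putting a backslash in front of '<', '>' and '&' so doxygen does not
 * read them as HTML.  Returns 0 on success, -1 if out is too small. */
static int escape_doxy_line(const char *in, char *out, size_t size)
{
    size_t j = 0;

    if (size == 0)
        return -1;
    for (; *in != '\0'; in++) {
        int special = (strchr("<>&", *in) != NULL);
        if (j + (special ? 2 : 1) >= size)
            return -1;                 /* keep room for the terminator */
        if (special)
            out[j++] = '\\';
        out[j++] = *in;
    }
    out[j] = '\0';
    return 0;
}
```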

An alternative is to wrap all the code examples and formulas in \code and \endcode, which also disables evaluation of HTML tags. This would have to be done manually, though.

In theory formulas could even be LaTeX'd and wrapped in \f[ and \f] as described here.
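As an illustration of both options, here is a hypothetical function whose doxygen comment wraps a usage example in \code ... \endcode and gives its formula with \f[ ... \f]. The function exampleLinearStretch() is invented for this sketch and is not a Leptonica function.

```c
/*!
 * \brief   Hypothetical linear stretch, used only to show doxygen markup
 *
 * \param[in]    val       input value
 * \param[in]    minval    input value mapped to 0
 * \param[in]    maxval    input value mapped to 255; must be > minval
 * \return  stretched value in [0, 255]
 *
 * The mapping for minval < val < maxval, as a displayed formula:
 * \f[ y = \frac{255\,(x - x_{min})}{x_{max} - x_{min}} \f]
 *
 * A usage example (the angle bracket below is not read as HTML):
 * \code
 *     if (exampleLinearStretch(v, 50, 150) < 128) { ... }
 * \endcode
 */
static int exampleLinearStretch(int val, int minval, int maxval)
{
    int range = maxval - minval;

    if (val <= minval) return 0;           /* clip below */
    if (val >= maxval) return 255;         /* clip above */
    /* integer division, rounded to nearest */
    return (255 * (val - minval) + range / 2) / range;
}
```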

pullmoll commented 8 years ago

I'm still working on improving the results. The current patchset (before conv2doxy is run) can be inspected here (https://github.com/DanBloomberg/leptonica/compare/master...pullmoll:doxygen?expand=1). Once I'm satisfied with the results, I'll actually open the PR.

I'm trying hard not to change too much, but just what's required to allow a smooth transition by running conv2doxy at some point.

DanBloomberg commented 8 years ago

Looks good! Agree -- less is better, unless it's too little :-)

I am thinking that once this is set up, we could run your converter once with each new version (1.73, ...). Does that seem reasonable?

Is it too much work to add /*! \file before the list of functions at the top of each .c file?

Actually, if you get this in shape there's no reason for me to integrate and edit it before pushing to the master.

-- Dan


pullmoll commented 8 years ago

I think that perhaps you did not look at the files after conv2doxy was run, as they are not online. My pull request contains only fixes to the src files (formatting, alignment, and missing or extra pieces), made before conv2doxy is run.

So, NOOO! :) You would run the converter exactly once at some point and afterwards check in the converted src/*.c files. In the future, if new functions are added, you would use the doxygen comment style yourself. That makes it much easier for contributors IMO, because doxygen is very well documented (who'da thought) and following its rules is simple.

Also, everyone with a GitHub account can then help fix or improve the documentation through pull requests, without needing to run any tool other than doxygen. Perhaps Makefile.am can be modified to add a doc target that runs doxygen.

The /*! \file ... lines are in fact inserted; that's part of the one-time conversion process.

I created the PR now, because my recent changes were all just fixes for things like mismatches between parameter names in the comment and in the function header, or copy-and-paste errors. These kinds of oddities are reported in the doxygen.log output on every run and can be fixed at any time after the conversion, by anyone who finds time to look at the issues listed there and creates a PR.

In some source or header file we should add a \mainpage comment section:

/*! \mainpage Leptonica Main Page
 *
 * \section intro Introduction
 * Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
 * tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua.
 * ...
 * 
 * \section building Building Leptonica
 * Blah blah blah...
 *
 * \subsection unix Building on Unix
 * <Paste text from somewhere?> ...
 *
 * \subsection windows Building on Windows
 * <Paste text from somewhere?> ...
 * 
 * \section contributing Contributing to Leptonica
 * Blah blah blah...
 */

Of course this is just an example. The \mainpage text would then appear on the front page index.html of the doxygen output, which is currently empty.

DanBloomberg commented 8 years ago

You are right -- I haven't looked at the files after conv2doxy. But I do want to (see below).

Question: did you insert the /*! \file labels by hand on all the files? That's not part of conv2doxy?

I haven't been able to test your doxygen branch. I did 'git checkout -b doxygen', but the 'git pull git@github.com:/pullmoll/leptonica' doesn't do anything (it doesn't include your patch), and appending the '-b doxygen' isn't legal. I also tried a git pull on the URL of your leptonica directory, but that gave the same result -- I didn't get the doxygen branch. And I couldn't figure out what to do from 'git pull --help'.

I just want to download your patch and play with it, run your conv2doxy on a few files, etc. This seems like a basic thing to do (e.g., before merging with the master on a big change). Any more suggestions on how I can recreate your doxygen branch locally?

-- Dan
