didierverna / declt

Reference manual generator for Common Lisp libraries

DECLT version 4.0: DITA ? #8

Open Symbolics opened 3 years ago

Symbolics commented 3 years ago

I didn't know quite where in the lisp community to put this. Perhaps reddit/common-lisp at some point. I'm not assuming that you would do this work, nor am I suggesting what course declt should take, but rather opened this issue here to solicit an informed opinion from someone who's done something similar.

I use declt because, of the Common Lisp documentation systems, it is the only one that compiles docstrings down to an intermediary form, which is then post-processed into the final output. This is the 'right way' to produce technical documentation. As much as I'd love for the entire tool chain to be written in Common Lisp, the reality we live in is that for high quality output, third-party tools are required.

For a while now I've been considering how this might be done with DITA. It seems ideal for API documentation, has a good tooling ecosystem, and is supported by several good editors. Output formats include web help, EPUB, PDF, JavaDoc, and others.

Could the declt code be the basis of such a docstring extraction tool? It seems like much of the machinery is already there; instead of texi output, declt would write DITA files. This not only has the advantage of better final documentation; once a DITA map is created, the API docs can also be regenerated without disturbing the surrounding text (tutorials, intro, etc.).

Ideally this tool chain could be added to some of the CI systems in use within the Lisp community so that all projects would benefit, similar to the way they do now from declt in Quickdocs.

Is this idea feasible using what's in declt now? How big an effort would it be? The documentation problem sorely needs to be solved to put the CL community on par with 'the others' (Python, Julia, etc.).

didierverna commented 3 years ago

Hello,

I can't comment on DITA, as I'm hearing about it for the first time here. I can comment on the rest however.

There is an entry in Declt's TODO list about turning Texinfo into a mere backend (so yeah, that's something I'm planning on sorting out at some point, and in a hopefully not too distant future). The obstacles right now are the following.

  1. Some of the user-provided material is assumed to be in Texinfo format already.
  2. The intermediate representation contains a mix of things already rendered as Texinfo strings, and things that have yet to be rendered.
  3. The intermediate representation closely follows the Texinfo files organization.

For (1), that is mostly the optional introduction and conclusion chapters. Assuming Texinfo format means that this material can already contain markup and even cross-references to Texinfo nodes, other documents, etc., so full flexibility. If we remove that assumption, we either:

  1. lose the ability to have markup in that material, or
  2. need to agree on something else (e.g. Markdown) for basic typesetting, but lose general cross-referencing anyway, or
  3. need to provide as many duplicate versions of the same material as there are output formats, which is not very satisfactory either.

For (2) and (3), what's needed is in fact two intermediate states instead of one. The first would be restricted to the abstract collection and extraction of all documentation items (that's what you would use to output to another format). The second would contain the re-organization of that material as an abstract Texinfo tree, to be rendered to a file later on. That distinction is in fact almost here already. There is a CONTEXT structure containing the collected data, and a NODE structure for the abstract Texinfo representation. It's just that, IIRC, the membrane between the two is somewhat porous. For example, some pieces of information from the CONTEXT are only partially extracted, and the transition to a NODE does some extraction and some rendering at the same time (it's the generic function DOCUMENT which does that). This is especially true for ASDF components.
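In code terms, the intended split might be sketched like this (a purely illustrative sketch: Declt's actual CONTEXT and NODE structures are richer, and COLLECT-DEFINITIONS is a made-up placeholder):

```lisp
;; Illustrative sketch of the two intermediate states described above.
;; Stage 1 is backend-agnostic; stage 2 is Texinfo- (or DITA-) specific.

(defstruct context
  definitions)   ; abstract, unrendered documentation items

(defstruct node
  name children) ; backend-specific document tree

(defun extract (system)
  "Stage 1: collect every documentation item of SYSTEM, unrendered.
COLLECT-DEFINITIONS is hypothetical and stands for the gathering code."
  (make-context :definitions (collect-definitions system)))

(defun organize (context render-fn)
  "Stage 2: reorganize CONTEXT into a backend tree via RENDER-FN."
  (funcall render-fn (context-definitions context)))
```

A DITA backend would then only need to supply its own RENDER-FN over the stage-1 data, leaving the collection code untouched.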

Finally, for (3), there is also the fact that the final organization of the material, in whatever intermediate representation, is not flat, but follows the way I like to see the manual organized (e.g. FOO and (SETF FOO) close to each other, etc.). My own view of how things should be organized may differ from someone else's.

To sum up, I don't see any showstopper in separating Texinfo from the rest more cleanly; it's just varying amounts of work, depending on the points detailed above. Note that supporting different markups in docstrings is also on my TODO list. I've had a student working on a Markdown parser, but haven't gotten around to finishing it yet...

snunez1 commented 3 years ago

Great, it seems like this has been considered before. I'd be happy to help prototype DITA output from the intermediate state when things reach that point. It sounds like the transformation from the first intermediate state to the Texinfo tree would be a good model for a DITA transformation. XML might be a good choice for modelling the first state: although it is wordy, the transformations, tooling (editors, etc.), Lisp support, and skills around XML are good.

A request / consideration on docstring markup: it would be good to have semantic markup here. My biggest gripe about Markdown is that it's styling markup, and those two concerns (semantics and styling) should be separated.

Symbolics commented 3 years ago

Just thought I'd revisit this and see if there's any kind of preview of the new API generation code. I assume that the current work is aimed toward Markdown. I'm in the process of building a Hugo-based documentation site and would love to test the new declt API generation as part of the reference documentation.

didierverna commented 3 years ago

Coming soon.

didierverna commented 3 years ago

Hello,

I sincerely apologize for the long delay. My personal life has been pretty shitty this past year-and-a-half, and I had little time to devote to hacking.

Anyway, I'm progressing, slowly. I've just pushed to the trunk an extensive set of changes that go a long way towards finalizing step 1 of the big architecture revamp. Declt now starts by building an intermediate data structure called an "extract" before generating documentation. The extract contains various things, in particular a list of all gathered definitions, in abstract form, stuffed with cross-references. This part of Declt now lives in its own sub-system (also called "extract"), so normally anyone can call the extract function (as opposed to the global declt one) and create their own documentation in any form they'd like, based on that.
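If the new sub-system can be driven directly, usage might look roughly like this (a hedged sketch: only the extract entry point comes from the comment above; the DEFINITIONS accessor, WRITE-DITA-TOPIC, and the :my-system name are hypothetical):

```lisp
;; Hypothetical sketch of driving the new "extract" sub-system to
;; produce non-Texinfo output.  EXTRACT is from the comment above;
;; DEFINITIONS and WRITE-DITA-TOPIC are illustrative names only.
(asdf:load-system :net.didierverna.declt)

(let ((extract (net.didierverna.declt:extract :my-system)))
  (dolist (definition (definitions extract))
    ;; Emit each gathered definition in whatever format you like,
    ;; e.g. one DITA topic per definition instead of Texinfo.
    (write-dita-topic definition)))
```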

I have yet to update the documentation on that part, and the backward-incompatible API changes, and then I'll release an official 4.0 beta 1 version. After that, I'll start working on the generation part again. In the meantime, with this part stabilized, I feel more at ease to also work on the various, more specific, issues raised here and there.

Again, thanks for your patience.

Symbolics commented 2 years ago

Is there anything I can do to help?

I'm at the point now where I'm trying to automate documentation generation, and would love to understand the 'best practice' here. I tried various forms of uiop:run-program "makeinfo" '(..., but on MS Windows (at least under MSYS2), makeinfo is a shell script, and run-program doesn't seem capable of invoking it.
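(One possible workaround, as an untested sketch: since the MSYS2 makeinfo is a shell script, invoke sh explicitly and let it run the script. This assumes sh.exe is on PATH; the file names are illustrative.)

```lisp
;; Run the makeinfo shell script through sh explicitly, since
;; run-program cannot execute a shell script as a native program.
;; Assumes MSYS2's sh.exe is on PATH; file names are illustrative.
(uiop:run-program '("sh" "-c" "makeinfo --output manual.info manual.texi")
                  :output t
                  :error-output t)
```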

A shell script that invokes SBCL with a configuration like:

(require :asdf)
(defparameter +copyright-years+ "2021")
(asdf:load-system :net.didierverna.declt)
(net.didierverna.declt:nickname-package)

(declt:declt :dfio
             :library-name "Data Frame I/O"
             :copyright-years +copyright-years+
             ;; :license :ms-pl
             :declt-notice nil
             :hyperlinks t)
(uiop:quit)

Seems to work, but moving over to a shell to complete the task isn't ideal. If we had a way to do this all from Lisp, an ASDF operation could generate the docs as part of the build.
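For what it's worth, such an ASDF operation might be sketched like this (hedged: DOC-OP is a made-up name, not an existing Declt API, and the exact declt keyword arguments may differ):

```lisp
;; Hypothetical ASDF operation that generates the reference manual
;; as part of the build.  DOC-OP and its wiring are illustrative.
(defclass doc-op (asdf:non-propagating-operation) ())

(defmethod asdf:perform ((op doc-op) (system asdf:system))
  (net.didierverna.declt:declt (asdf:component-name system)
                               :declt-notice nil))

;; Usage: (asdf:operate 'doc-op :dfio)
```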

What are your thoughts on automatic doc generation as part of the build?