actonlang / acton

The Acton Programming Language
https://www.acton-lang.org/
BSD 3-Clause "New" or "Revised" License

Add `actonc docs`: rendering of documentation from docstrings in source files #708

Open plajjan opened 2 years ago

plajjan commented 2 years ago

It would be cool if actonc could parse a source file or project and output documentation in a pretty format.

We obviously already parse all the code when reading an .act file and I assume that we don't throw away docstrings but keep them in our internal syntax tree representation. So I guess this is more or less about writing a filter for the syntax tree, filtering out most of the stuff just leaving a sort of skeleton.

I would like to use orgdown (https://gitlab.com/publicvoit/orgdown) for the docstrings, that is, inside of a docstring, we should think of the content as being in orgdown format. Orgdown is like Markdown but based on org-mode (a very popular Emacs mode). "org-mode" means both a syntax and the actual Emacs mode with all its code etc. Unlike Markdown, which might have the most unintuitive syntax on the planet, orgdown actually makes a lot more sense... in my opinion. Anyway, Haskell has the pandoc library, which I think we can use to consume org and output HTML.

I think we should work on a per-module basis: parse a .act file, then filter the content down to a skeleton that preserves the structure. Something like:

"""A module for doing hejsans
"""

def apa(name: str) -> int:
    """the docstring
    """
    return len(name)

def bepa(age:int) -> list[int]:
    """another docstring
    """
    return [age, age, age]

actor Foo(a: str):
    """docstring for an actor
    """

    def FooA(fjong: int) -> str:
        """docky stringy
        """
        return str(fjong)

is parsed into:

And from there we convert the whole thing to an orgdown document, like:

* Module =hejsan=
A module for doing hejsans

** function =apa= (=name= : =str=) -> =int=
the docstring

** function =bepa= (=age= : =int=) -> =list[int]=
another docstring

** actor =Foo= (=a= : =str=)
docstring for an actor
*** method =FooA= (=fjong= : =int=) -> =str=
docky stringy

Then pass the result to pandoc to produce HTML or some other output. I think it would be pretty cool to output man pages as well, since those can be rendered nicely in a terminal.
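To make the skeleton-to-orgdown step concrete, here is a minimal sketch in Python (chosen only because Acton's syntax is close to it). The `Func`, `Actor` and `Module` classes and `render_module` are hypothetical stand-ins for whatever actonc would keep after filtering its syntax tree; the sketch simply reproduces the orgdown layout of the hejsan example.

```python
from dataclasses import dataclass, field

# Hypothetical skeleton types: only names, signatures and docstrings survive.
@dataclass
class Func:
    name: str
    params: list[tuple[str, str]]  # (parameter name, type) pairs
    ret: str
    doc: str

@dataclass
class Actor:
    name: str
    params: list[tuple[str, str]]
    doc: str
    methods: list[Func] = field(default_factory=list)

@dataclass
class Module:
    name: str
    doc: str
    items: list  # Func and Actor entries, kept in source order

def fmt_params(params):
    return ", ".join(f"={n}= : ={t}=" for n, t in params)

def render_func(f, level, kind):
    stars = "*" * level
    return [f"{stars} {kind} ={f.name}= ({fmt_params(f.params)}) -> ={f.ret}=",
            f.doc.strip(), ""]

def render_module(m):
    lines = [f"* Module ={m.name}=", m.doc.strip(), ""]
    for item in m.items:
        if isinstance(item, Actor):
            lines.append(f"** actor ={item.name}= ({fmt_params(item.params)})")
            lines.append(item.doc.strip())
            for meth in item.methods:  # methods go one heading level deeper
                lines += render_func(meth, 3, "method")
        else:
            lines += render_func(item, 2, "function")
    return "\n".join(lines)

hejsan = Module("hejsan", "A module for doing hejsans\n", [
    Func("apa", [("name", "str")], "int", "the docstring\n"),
    Func("bepa", [("age", "int")], "list[int]", "another docstring\n"),
    Actor("Foo", [("a", "str")], "docstring for an actor\n",
          [Func("FooA", [("fjong", "int")], "str", "docky stringy\n")]),
])
print(render_module(hejsan))
```

The resulting text could then be handed to something like `pandoc -s -f org -t html` or `pandoc -s -f org -t man`.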

Alternatively, maybe it is easier to convert our skeleton doc AST to the pandoc AST, which has a JSON representation. We would then call pandoc on each docstring to produce pandoc JSON, glue these together into one larger JSON document for the whole module, and feed that back to pandoc to produce the final output.
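A sketch of the glue step, assuming each docstring has already been converted with `pandoc -f org -t json`. The document shape (`pandoc-api-version`, `meta`, `blocks`, and `Header`/`Code` nodes) follows pandoc's JSON AST, but the API version changes between pandoc releases, so the sketch copies it from the first input instead of hard-coding it. `header` and `glue` are hypothetical names.

```python
import json

def header(level, title):
    # Header block: [level, attr, inlines]; attr is [id, classes, key-value pairs].
    # The title is wrapped in an inline Code element, like =apa= in org.
    return {"t": "Header",
            "c": [level, ["", [], []], [{"t": "Code", "c": [["", [], []], title]}]]}

def glue(entries):
    """entries: (level, title, json_text) triples, where json_text is what
    `pandoc -f org -t json` produced for one docstring."""
    docs = [json.loads(text) for _, _, text in entries]
    blocks = []
    for (level, title, _), doc in zip(entries, docs):
        blocks.append(header(level, title))
        blocks.extend(doc["blocks"])
    # Reuse the API version pandoc reported; [1, 23] is only an assumed fallback.
    version = docs[0]["pandoc-api-version"] if docs else [1, 23]
    return json.dumps({"pandoc-api-version": version, "meta": {}, "blocks": blocks})
```

The combined JSON would then go back through something like `pandoc -f json -t html` (or `-t man`).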

sydow commented 2 years ago

Good idea! I could make an attempt at this, unless you want to do it yourself.

plajjan commented 2 years ago

@sydow feel free :) this is way over my head ;)

plajjan commented 2 years ago

I forgot to include type signatures in the example above. We obviously should, so I added that. Also added in an example .act file to better illustrate the mapping from actual acton source to AST to orgdown.

sydow commented 2 years ago

One possibility is to merge this with the dump command (which should be renamed to docs). The possible disadvantage is that the docstrings will be included in the .ty files, which will then be slightly longer and thus slower for the compiler to read. I don't think this is much of a problem; what do you think?

plajjan commented 2 years ago

The suggestion above keeps the output document in the same order as the source, e.g. function apa is defined first, so it comes first in the documentation. An alternative would be to group everything by kind; Golang documentation, for example, seems to impose such an ordering.
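The two orderings differ only by a sort step. A minimal sketch, assuming hypothetical `(kind, name)` entries collected in source order; `KIND_ORDER` is an assumed grouping loosely modelled on Go's constants/functions/types layout, not something decided in this thread.

```python
# Assumed kind ordering, loosely modelled on Go's grouping of declarations.
KIND_ORDER = ["constant", "function", "actor", "class"]

def grouped_by_kind(entries):
    """entries: (kind, name) pairs in source order. sorted() is stable, so
    items of the same kind keep their relative source order."""
    return sorted(entries, key=lambda entry: KIND_ORDER.index(entry[0]))

source_order = [("function", "apa"), ("actor", "Foo"), ("function", "bepa")]
print(grouped_by_kind(source_order))
```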

Haskell documentation rendered by Haddock, on the other hand, treats the source file much like prose: multiple comments spread throughout the source are all rendered. I quite like this idea, so we could do:

"""My module
This module is good for bli and bla.

* Convenience functions
The following functions expose a simpler but less flexible interface. It covers the most common use cases and it is recommended to use these when possible.
"""

def convenience():
    ...

def simple(asdf):
    ...

"""
* Raw interface
The following functions offer the full flexibility and power of this module:
"""

def something():
    ...

def otherthingy():
    ...

Notice the docstring in the middle which would be rendered just like anything else, at the location where it is, in between function definitions.

plajjan commented 2 years ago

@sydow not sure whether to combine with dump or not. At one point I suggested that .ty files should contain docstrings so that we could read out documentation even if modules were distributed as binaries... but we pretty much abandoned that idea because the plan is for module/package management to deal with source code, so we are never in a situation where we only have a .ty file and not the source .act file. Thus, I dunno what the benefit would be of using the .ty file rather than the .act file.

On the other hand, I have no real strong arguments against it, so if you see a clear benefit to adding docstrings to .ty and using the dump command to render docs, then maybe that's the way forward.

sydow commented 2 years ago

I think that I jumped onto this too quickly. There are several issues to sort out:

nordlander commented 2 years ago

I won't argue with the first two points, they seem valid to me but I have no real insights into the problem.

Point 3, though, is actually a small misunderstanding. We really do preserve parameter names during type inference, so that a function can be called with explicitly named arguments if so desired. It's just that the technique that would allow the same parameters to also be passed as positional arguments isn't fully implemented yet, so type inference takes some (confusing) shortcuts with these for the time being. But since the issue discussed has a somewhat longer timespan, I thought I should point out what's actually in the pipeline.

sydow commented 2 years ago

Great; one problem sorted out...

plajjan commented 2 years ago
  1. Isn't that the beauty of it? Sure it's enough rope to make a cluttered doc but also enough power to render very elegant things. I believe Haskell Haddock supports just such section headings.
  2. I don't understand the difference between header levels and nested lists. How they are rendered in HTML is a matter of CSS. The semantics of the document are orthogonal to how we indent or otherwise show hierarchies in HTML, so section headers and nested lists seem quite similar to me.
  3. Even if some info is missing, e.g. we only have the name but not the inferred type, or only the inferred type but not the name, we can improve that later. It is more important to get the other bits right, such as using orgdown, since then people can start writing docs in modules from day 1 and they'll look sort of nice. Still, I feel we can give ourselves some room to experiment. We can try this out and see how it feels. We're likely going to make some changes as we go along, but the fewer the better.

plajjan commented 2 years ago

Here's a direct link to how to deal with section headings in Haddock https://haskell-haddock.readthedocs.io/en/latest/markup.html#section-headings

It happens to look just like org-mode headings, i.e. using asterisks to denote heading levels 1, 2, 3 etc.

If we just have a standard way of rendering a function, then isn't this all fine? Or is it specifically rendering a function/method as a heading that you oppose? I guess it could maybe be something else?

sydow commented 2 years ago

Yes, there is Haddock markup for section headings of various levels, so the user can structure the documentation using this markup; but these headings are also the only ones that occur in the resulting documentation. I understood your proposal as imposing a structure on the document where the module gets a level 1 heading, each top-level declaration a level 2 heading, each method in a top-level class/actor a level 3 heading etc. I think that would interact strangely with possible user-provided headings.

What makes Haddock work so well is partly that the document structure is guided by the top level module's export list, which can order and group (using (sub)headings) documentation of declarations in a system regardless of the code structure. Each individual declaration produces its type signature and its "docstring", but not some document-level structure element.

I am a big fan of Haddock, but I thought we were aiming for something much simpler. Haddock has a deep understanding of Haskell and its module system, which makes it possible to produce very elegant documentation. Haddock also produces its own CSS files, LaTeX style files etc. But the Haddock GitHub repo contains 24000+ lines of Haskell code...

I would be in favour of having, as you say, a standard way of rendering a function definition, which does not involve any high-level document structure element, and allowing "docstrings" also in between top-level definitions, for providing such structure, e.g. in the form of headings. You may have had this in mind in some comment above.

Preferably we would also have accompanying CSS files, but I do not know how to approach this if we just produce orgdown and let pandoc or some other tool do the translation to HTML.

This would need an illuminating example, but that must wait until tomorrow.

plajjan commented 2 years ago

@sydow hmm, ah yes, my thinking was indeed that functions, actors, actor methods etc would produce a section heading but that the overarching structure could be controlled or modified by the user. Further, I assumed that we wouldn't have to deeply understand the content of docstrings but could just "glue some stuff together". I hadn't thought that through; we might need some understanding of the docstring after all. However, I believe that is true regardless of whether we go with functions as section headers or as nested lists. The dilemma is the same: if someone has written a function docstring, it is interpreted relative to its surroundings. If it is a list:

"""There are a few groups of functions:
- the foobars
"""

def foo():
    """My foo function is good for:
    - eating pie
    - baking cookies
    """
def bar():
    """BARRRY
    """

"""
- the bafooza
  - which are good for doing the foozba
"""

def bafooza():
    """BA
    """

It is by no means a hard requirement that functions become section headings. It was just my initial suggestion since it felt natural but as I tried illustrating above, I don't think changing to nested lists fixes it.

But perhaps we should just revisit what we set out to do. I did include an example of a docstring that contains section headers (though I conveniently left out what the complete output document would look like). Maybe we should just not encourage adding extra headers, or, if users do add them, they should only ever be at the top level and thus be level 2 headings (** heading), or some guideline like that?

I'm not sure how many guard rails we should aim to provide; there will always be ways for users to mess things up. I think we should rather focus on making it simple to produce good documentation.

As for HTML and CSS, I mostly want to stay away from them. pandoc will render HTML from org and I want to influence that as little as possible (I don't even know if/how it can be configured). CSS is naturally something we can do, but I'm happy to postpone that to another year ;) Our focus should be on producing a good source (orgdown) that is semantically sound.

sydow commented 2 years ago

My nested list thoughts were mainly about methods in classes and actors. Documenting them in a list subordinate to the class/actor just seems much clearer than documenting them as headings with one more asterisk.

I probably also disagree about keeping away from HTML and CSS as much as possible. I don't think it is by accident that Haddock produces HTML or LaTeX directly and not via a common lightweight markup language. We want to produce documents with very specific concepts, such as that of a function definition (with its parts: name, parameter names, type and docstring). The markup should know about these concepts, so that we can choose a suitable rendering. Then the limited structuring tools of orgdown alone are not enough. In HTML we at least have divs with class attributes; in LaTeX we have many more possibilities.

Maybe we could talk a bit about this tomorrow, if there are not other more urgent topics.

plajjan commented 2 years ago

I think the biggest benefit of headers is that everything below a header automatically renders rather nicely. That is, if someone puts a list in a function docstring, it will come out all right. If the list of functions is an org list, then another list in the function docstring will be messed up. Using org headings for functions allows more kinds of content in the function docstring.
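The difference shows up when the same docstring is embedded both ways. A minimal sketch with hypothetical helper names: under a heading, the docstring can be pasted in verbatim; inside a list item, every continuation line must be indented past the bullet, otherwise the docstring's own text ends the item and its "- " lines become siblings of the function entry.

```python
def as_heading(name, doc, level=2):
    # Everything up to the next heading belongs to this one, so the
    # docstring can be pasted in verbatim, lists and all.
    return f"{'*' * level} ={name}=\n{doc}"

def as_list_item(name, doc, indent="  "):
    # Continuation lines must be indented past the bullet to stay inside the
    # item; unindented, the docstring's "- " lines would become siblings
    # of the function entry instead of a nested list.
    body = "\n".join(indent + line if line else line for line in doc.splitlines())
    return f"- ={name}=\n{body}"

doc = "My foo function is good for:\n- eating pie\n- baking cookies"
print(as_heading("foo", doc))
print(as_list_item("foo", doc))
```

So the list form is workable, but the renderer has to rewrite every docstring line, whereas the heading form leaves docstrings untouched.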

orgdown has ways of passing parameters through to the HTML export, so for example you can set a CSS class on an element. Thus I think our standard export of functions etc can produce org only and rely on pandoc to do the rest.

I'd love to talk about it. I wanna talk about network IO interfaces tomorrow. Maybe we can try to keep that short (hah!). If nothing else, maybe you and I can find another slot to discuss this further!? :)

plajjan commented 2 years ago

OK, so I whipped up an example.

Writing org like this:

* Module =test=
This is my dear module test that works oh so well.

#+BEGIN_FUNCTION
- /function/ *apa* (
- a =str=,
- b =int=,
- c =list[int]=
- ) -> =str=
#+END_FUNCTION
This is a function that behaves as a monkey. It can do:
- jumps
- screams
- scratchy scratchy

#+BEGIN_FUNCTION
- /function/ *bepa* (
- a =str=,
- b =int=
- ) -> =None=
#+END_FUNCTION
This is another funcy funk.

Running pandoc -s -c acton_doc.css -f org -t html func.org renders it to:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>func</title>
  <style>
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    span.underline{text-decoration: underline;}
    div.column{display: inline-block; vertical-align: top; width: 50%;}
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
    ul.task-list{list-style: none;}
  </style>
  <link rel="stylesheet" href="acton_doc.css" />
</head>
<body>
<h1 id="module-test">Module <code>test</code></h1>
<p>This is my dear module test that works oh so well.</p>
<div class="FUNCTION">
<ul>
<li><em>function</em> <strong>apa</strong> (</li>
<li>a <code>str</code>,</li>
<li>b <code>int</code>,</li>
<li>c <code>list[int]</code></li>
<li>) -&gt; <code>str</code></li>
</ul>
</div>
<p>This is a function that behaves as a monkey. It can do:</p>
<ul>
<li>jumps</li>
<li>screams</li>
<li>scratchy scratchy</li>
</ul>
<div class="FUNCTION">
<ul>
<li><em>function</em> <strong>bepa</strong> (</li>
<li>a <code>str</code>,</li>
<li>b <code>int</code></li>
<li>) -&gt; <code>None</code></li>
</ul>
</div>
<p>This is another funcy funk.</p>
</body>
</html>

and with the following CSS:

/* Function div */
.FUNCTION ul {
        display: flex;
        flex-flow: row wrap;
        list-style: none;
        background-color: #999999;
}
.FUNCTION li {
        padding: 0.1em;
        padding-right: 0.2em;
}
/* Function name */
.FUNCTION li:first-child {
}
/* Function arguments */
.FUNCTION li:not(:first-child):not(:last-child) {
}
/* Function return type */
.FUNCTION li:last-child {
}

it renders quite nicely:

[Screenshot of the rendered page, taken 2022-06-23]
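Since the org in this example is very regular, it could be generated mechanically from a function signature. A sketch, with `function_block` as a hypothetical helper: it reproduces the `bepa` block above line for line, and pandoc then turns the `#+BEGIN_FUNCTION` special block into the `<div class="FUNCTION">` seen in the HTML output.

```python
def function_block(name, params, ret, doc):
    """Emit the org special block used in the example; pandoc renders
    #+BEGIN_FUNCTION ... #+END_FUNCTION as <div class="FUNCTION">."""
    lines = ["#+BEGIN_FUNCTION", f"- /function/ *{name}* ("]
    for i, (pname, ptype) in enumerate(params):
        comma = "," if i < len(params) - 1 else ""  # no comma after the last parameter
        lines.append(f"- {pname} ={ptype}={comma}")
    lines.append(f"- ) -> ={ret}=")
    lines.append("#+END_FUNCTION")
    lines.append(doc)  # the orgdown docstring follows the block unchanged
    return "\n".join(lines)

print(function_block("bepa", [("a", "str"), ("b", "int")], "None",
                     "This is another funcy funk."))
```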

plajjan commented 2 years ago

Now, the above example using org works but I'm not sure if it's the best way. We have function docstrings, where the idea is that they are in orgdown format. We want to join those docstrings together with documentation that we render from the source code, e.g. function signatures.

I see two main approaches. The first one, which I originally suggested, is:

This way, we don't have to deal with HTML at all. We need to be aware of what it will look like etc to get CSS right, but don't have to write HTML. I thought this was the natural and best choice...

I noticed pandoc has support for rendering fragments, which means it is likely just as easy to:

The latter approach gives us better control over the HTML. The downside is that it is HTML-specific. My hope was that if we first assembled a complete org document, we would be able to use pandoc to export it not only to HTML but also to groff/man pages and perhaps other formats too. If we have HTML-specific parts, then I suppose we also need to write groff-specific parts if that's a format we want to support, etc.
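A sketch of the fragment approach: we write the signature HTML ourselves, and only the orgdown docstring goes through pandoc. Without `-s`, pandoc emits an HTML fragment rather than a complete page. `function_html` and the `signature`/`param` class names are hypothetical; the converter is a parameter so that the assembly can be tried without pandoc installed.

```python
import html
import subprocess

def pandoc_fragment(org_text):
    # Without -s/--standalone, pandoc emits an HTML fragment, not a full page.
    result = subprocess.run(
        ["pandoc", "-f", "org", "-t", "html"],
        input=org_text, capture_output=True, text=True, check=True,
    )
    return result.stdout

def function_html(name, params, ret, doc, to_html=pandoc_fragment):
    # Signature markup is ours; only the docstring is converted by pandoc.
    args = ", ".join(
        f'<span class="param">{html.escape(p)} <code>{html.escape(t)}</code></span>'
        for p, t in params
    )
    return (
        '<div class="FUNCTION">\n'
        f'<div class="signature"><em>function</em> <strong>{html.escape(name)}</strong> '
        f'({args}) -&gt; <code>{html.escape(ret)}</code></div>\n'
        f'{to_html(doc)}'
        '</div>\n'
    )
```

The cost is the one noted above: a man page backend would need its own equivalent of `function_html`.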

@sydow what do you think?

plajjan commented 2 years ago

Also added here for anyone looking to play further with it https://jsfiddle.net/6uhdq0t4/