Choose a docstring format for our pydoc documentation

nonprofittechy commented 3 years ago

Resolved: we will use the Google docstring format described here: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
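For reference, here is a minimal sketch of a function documented in the Google style, using the `Args:`, `Returns:` and `Raises:` section names from the napoleon example page. The function `has_enough_signers` is hypothetical and exists only for illustration. The types live in the annotations and are not repeated under `Args:`, which napoleon permits. That also covers the "avoid typing redundant info" goal below.

```python
def has_enough_signers(signers: list[str], required: int = 1) -> bool:
    """Check whether a form has been signed by enough people.

    Args:
        signers: Names of the people who have signed so far.
        required: Minimum number of signatures needed. Defaults to 1.

    Returns:
        True if at least ``required`` people have signed, otherwise False.

    Raises:
        ValueError: If ``required`` is negative.
    """
    # Hypothetical example function; only the docstring layout matters here.
    if required < 0:
        raise ValueError("required must be non-negative")
    return len(signers) >= required
```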

plocket commented 3 years ago

From #127:

Alternative docstring formats. To my knowledge, the popular options are:

Sphinx (looks like it comes with a VS Code plugin): https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html

Google: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
See the pydoc-markdown processor docs here: https://pydoc-markdown.readthedocs.io/en/latest/docs/api-documentation/processors/
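For comparison, this is a sketch of a hypothetical function documented with Sphinx's reStructuredText field lists (`:param:`, `:type:`, `:returns:`, `:rtype:`, `:raises:`). In many setups the `:type:` and `:rtype:` lines can be dropped when annotations are present, but that depends on how the renderer is configured.

```python
def has_enough_signers(signers: list[str], required: int = 1) -> bool:
    """Check whether a form has been signed by enough people.

    :param signers: Names of the people who have signed so far.
    :type signers: list[str]
    :param required: Minimum number of signatures needed.
    :type required: int
    :returns: True if at least ``required`` people have signed.
    :rtype: bool
    :raises ValueError: If ``required`` is negative.
    """
    # Hypothetical example function; only the docstring layout matters here.
    if required < 0:
        raise ValueError("required must be non-negative")
    return len(signers) >= required
```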

Needs from a documentation renderer:

Avoid making us type redundant info, e.g. by taking advantage of automatic type annotation detection
Provide strict rules to make sure the team uses a consistent format

Next steps:

Look at some popular Python libraries and see what their documentation pages look like. readthedocs.io uses Sphinx, for example.
Focus on the ALDocument class for now, since devs will probably access it directly while the other classes are used internally.

plocket commented 3 years ago

Will take notes on what I'm looking at here and edit as I find out more.

I'm not sure if we're just looking at docstring formats or at document-building tools too. One goal is strictness. More bells and whistles in a docstring syntax might make it easier for a doc renderer to hook into different parts, but strictness is really enforced by the renderer, and I'm not experienced enough to say what level of syntax will allow what kind of doc rendering.

The only way I can really think of looking at this is to decide what doc renderer we like and then use the syntax they use. I looked at a few different syntaxes anyway. It's unclear to me which syntaxes absolutely require types and which just allow you to include them. Probably has more to do with the renderer. Below I often used a syntax's need for types as a kind of measure of how detailed its syntax got.

Docstrings

I've found these syntaxes for docstrings:

Sphinx/reStructuredText. Are they the same? It's all very confusing. These docs do describe a bunch of stuff which sounds pretty fun and syntax that seems reasonable.
Numpy. The documentation doesn't seem as easy to follow, but I may be tired. It seems to do less and the syntax seems just as complex for the stuff that it does do.
Google. Seems much less structured. It might be enough for us, though it doesn't look like there'd be room for growth later.
epytext. Has similar capabilities to sphinx, I think. I'm not sure I like the syntax as much.
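To make the NumPy and epytext comparison concrete, here is a sketch of the same hypothetical check written in each of those two styles. The function names are made up for illustration. NumPy style uses underlined section headings, and epytext uses `@param`/`@type`/`@return`/`@rtype` fields plus inline markup such as `C{...}` for code.

```python
def has_enough_signers_numpy(signers: list[str], required: int = 1) -> bool:
    """Check whether a form has been signed by enough people.

    Parameters
    ----------
    signers : list of str
        Names of the people who have signed so far.
    required : int, optional
        Minimum number of signatures needed (default 1).

    Returns
    -------
    bool
        True if at least `required` people have signed.
    """
    return len(signers) >= required


def has_enough_signers_epytext(signers: list[str], required: int = 1) -> bool:
    """Check whether a form has been signed by enough people.

    @param signers: Names of the people who have signed so far.
    @type signers: list[str]
    @param required: Minimum number of signatures needed.
    @type required: int
    @return: True if at least C{required} people have signed.
    @rtype: bool
    """
    return len(signers) >= required
```

The NumPy version needs more vertical space per parameter. The epytext version reads much like Sphinx field lists with a different sigil.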

Document rendering

Assumptions

renders Markdown or HTML
renders individual files we can insert into Docusaurus

Lists of documentation tools:

https://wiki.python.org/moin/DocumentationTools

Documentation tools

pydoctor (maybe)

√ handles type annotation
√ handles decorators
√ links to source code
√ strictly structured syntax
X simple appearance

Demo

I find it a bit busy to read. It can apparently integrate with sphinx, but I'm not sure in what way. I'm also not yet sure if it only generates whole sites or if it's willing to generate individual files. It doesn't show a whole file contents in detail in one long page, but instead seems to break things up into modules, classes, and names. Not super pretty, but we can probably add custom CSS somewhere.

Instructions on how to document your code with it.

As you can see from the demo linked above, it seems to provide pretty strict structure with the syntaxes/formats it uses. Its docs say it can use limited versions of either epytext or reStructuredText.

sphinx (maybe)

Demo 1 (sphinx makes its documentation using sphinx) Demo 2, readthedocs.io

Not sure how they're being rendered so differently in those two places. I think sphinx's own page is likely to be closer to the bone than the readthedocs one, so for now I suggest looking at that for a bare-bones view of sphinx.

pandoc (no)

Demo: none looked for

For converting documents from one format to another. I don't think it's what we need, since I don't think it parses code. The only documentation-formatted input I saw was Haddock. I used their 'try it online' tool and it failed to convert a Haddock code block to a Markdown code block.

plocket commented 3 years ago

Unsurprisingly, the very popular reStructuredText (what sphinx uses) looks like a good bet for docstrings. If that's the only decision we're making in this issue, that's my recommendation. Do we want a different issue for doing research on documentation renderers?

plocket commented 3 years ago

☝️ @nonprofittechy , @BryceStevenWilley , @purplesky2016

BryceStevenWilley commented 3 years ago

My only real hesitancy with Sphinx is that it's mostly rst, which IMO is just yet another markup language that I will get mixed up with normal Markdown. But it looks like Sphinx does support standalone Markdown, so that should be fine if we want larger in-code guides or tutorials.

I don't think we want to change renderers? That'd mean moving away from Docusaurus, which we could do, I guess; it'd just be more work at this point unless we found something easy enough to use.

plocket commented 3 years ago

I think I'm misunderstanding something here.

I thought we wanted something with more strict syntax than markdown. If not, what is it we're looking for here?
I thought we have a renderer and were looking to see if we want to switch to something else. We'd discussed our current solution as good-enough-for-now and I thought part of this assignment was to see what else there is.
Maybe I really don't understand the ecosystem here. I thought sphinx was a renderer and site builder and what we wanted is a renderer, not a site builder.

nonprofittechy commented 3 years ago

I think the different aspects of the format are getting mixed up here.

Markdown is used for general descriptive text. But I thought formats that use Markdown for the descriptive text still have varying ways to do things like:

Mark the definition of a variable.
Specify the return type and value
Link to other parts of the documentation
Probably other specialized tasks that are part of documenting code, not the base Markdown specification.
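As an illustration of the first two items in Google style, here is a sketch of a hypothetical `SignatureBlock` class (not part of the AssemblyLine codebase). Instance variables are declared in an `Attributes:` section, and a `Returns:` entry can optionally start with an explicit type (`list[str]: ...`). Cross-reference syntax is left out because it differs between processors.

```python
class SignatureBlock:
    """A block of signature lines on a generated form.

    Attributes:
        label: Heading shown above the signature lines.
        signers: Names of the people expected to sign.
    """

    def __init__(self, label: str, signers: list[str]) -> None:
        self.label = label
        self.signers = signers

    def missing(self, signed: list[str]) -> list[str]:
        """List the expected signers who have not signed yet.

        Args:
            signed: Names of the people who have already signed.

        Returns:
            list[str]: Expected signers not found in ``signed``, in their
            original order.
        """
        # A set makes each membership check O(1).
        done = set(signed)
        return [name for name in self.signers if name not in done]
```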

At least, I know that restructured text is a general purpose markup format, like markdown, and it looks like from your investigation it has this specialized subset syntax for when it's used to document code. I thought there was an equivalent in Markdown.

I agree with Bryce that I prefer we stick with a format that extends Markdown instead of replaces it, and also one that works with Docusaurus. But there could be drawbacks to that that we don't know about.

Edit: when we were talking about something that is "stricter" what I meant, at least, was not just a blob of text with no rules about the order or arrangement of the content, but a way to give it a semantic structure so that the variables, return types, etc. were always presented with a consistent format. How to mark text as bold, etc. is less important--we want rules for that in our Docusaurus page based on what each piece of text means, not something the author decides when they write the docstring.

nonprofittechy commented 3 years ago

To be even more specific: I'm hoping the only thing we need to change is the pydoc-markdown processor. See https://pydoc-markdown.readthedocs.io/en/latest/docs/api-documentation/processors/

BryceStevenWilley commented 3 years ago

Agreed with Quinten about the strictness, I was thinking something more like how the document (or in this case, the docstrings) should be organized, not text formatting.

And I think I am still confused on the difference between a renderer, a site-builder, what "Documentation tools" are in general, and which library / tool does which. There's definitely a lot of overlap, but I do think Sphinx works with pydoc-markdown, which we can already use with Docusaurus.

plocket commented 3 years ago

Thanks for the clarification. That narrows down my research a lot. I'll check those processors.

As for meshing with docusaurus, it's about the processor/generator, not really about the format we ourselves write in if I'm understanding correctly. Docusaurus is not going to understand some special markdown, so the initial input format/syntax doesn't matter as long as the thing that's generated meshes with docusaurus's markdown/html and its file structure. Let me know if I've misunderstood your intention there.

nonprofittechy commented 3 years ago

As for meshing with docusaurus, it's about the processor/generator, not really about the format we ourselves write in if I'm understanding correctly. Docusaurus is not going to understand some special markdown, so the initial input format/syntax doesn't matter as long as the thing that's generated meshes with docusaurus's markdown/html and its file structure. Let me know if I've misunderstood your intention there.

The processor will handle the special rules about how to mark the beginning/end of a variable list, cross-references, etc., and use them to create a plain Markdown file that Docusaurus can then turn into HTML. Is that an accurate summary of what you're asking? I'm not sure I fully understand your second paragraph.
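A toy version of that pipeline step might look like the sketch below. It takes a Google-style `Args:` section and emits plain Markdown bullets that Docusaurus could render. `google_args_to_markdown` is a hypothetical, heavily simplified stand-in. It is not pydoc-markdown's processor code and it ignores every other section.

```python
import inspect
import re

# "name: description" or "name (type): description"
ENTRY = re.compile(r"^(\w+)(?:\s*\(([^)]*)\))?:\s*(.*)$")


def google_args_to_markdown(docstring: str) -> str:
    """Render the ``Args:`` section of a Google-style docstring as Markdown."""
    lines = inspect.cleandoc(docstring).splitlines()
    out: list[str] = []
    in_args = False
    entry_indent = None
    for line in lines:
        stripped = line.strip()
        if not in_args:
            if stripped == "Args:":
                in_args = True
                out.append("**Arguments**:\n")
            continue
        if not stripped:
            continue  # skip blank lines inside the section
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            break  # a new top-level section such as "Returns:" ends Args
        if entry_indent is None:
            entry_indent = indent  # first entry fixes the entry indentation
        match = ENTRY.match(stripped)
        if indent == entry_indent and match:
            name, type_, desc = match.groups()
            type_part = f" (`{type_}`)" if type_ else ""
            out.append(f"- `{name}`{type_part}: {desc}")
        elif out and out[-1].startswith("- "):
            out[-1] += " " + stripped  # continuation of the previous entry
    return "\n".join(out)
```

A real processor would also handle `Returns:`, `Raises:`, cross-references and so on. The point here is only that the special syntax is consumed before Docusaurus ever sees the file.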

plocket commented 3 years ago

I think that's what I was describing, yes. The line

I prefer we stick with a format that... works with Docusaurus

confused me a bit about whether I needed to look for something in addition to what had been listed - maybe some extension for Docusaurus so it could understand the new format. I was confirming that we don't need Docusaurus to care about the new format, we just need something that will convert the new format into something that Docusaurus does understand.

plocket commented 3 years ago

Things are a bit conflated in my mind, so I'll try to write stuff out. Wish me luck.

Markdown is very simple and has some extensions. I have not found any that are specifically for code documentation. Some formats, like google, use markdown and can then include reStructuredText and such. They also have their own conventions/syntaxes that allow processors/renderers/whatever to generate useful markdown or html that other frameworks can use to build documentation. I'm not sure whether you'd consider those an extension of markdown or not.

Similarity to Markdown - of the three offered, based on the examples given, pydocmd seems the most light-handed, followed by google, and then sphinx. It's hard to tell because the examples are so limited.

Another aspect, one that seems more relevant to the output, is what a framework's processors do with the syntax of the docstring. From that documentation page, pydoc-markdown seems to use only a subset of the syntax that any of those formats provide, though it doesn't say which features it does use. The examples it offers for Sphinx vs. Google are apples to oranges. That aside, pydoc-markdown seems to use these processors mostly for appearance rather than other functionality. For example, it has separate processors for automated cross-referencing.

I have not yet found a way that pydoc-markdown can use things in the code itself, like type annotation, to help generate docs.
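For what it's worth, the information is available at runtime. This sketch uses only the standard library (`inspect.signature` and `typing.get_type_hints`) to build a Markdown parameter table from annotations, so types would not need to be repeated in the docstring. `params_table` and `type_name` are hypothetical helpers. Whether pydoc-markdown does anything like this is exactly the open question above.

```python
import inspect
import typing


def type_name(hint: object) -> str:
    """Readable name for a type hint: 'int' for plain classes, 'list[str]' for generics."""
    if isinstance(hint, type) and not typing.get_args(hint):
        return hint.__name__
    return str(hint)


def params_table(func) -> str:
    """Build a Markdown table of a function's parameters from its annotations."""
    hints = typing.get_type_hints(func)
    rows = ["| Name | Type | Default |", "| --- | --- | --- |"]
    for name, param in inspect.signature(func).parameters.items():
        type_text = type_name(hints[name]) if name in hints else ""
        default = "" if param.default is inspect.Parameter.empty else repr(param.default)
        rows.append(f"| `{name}` | `{type_text}` | {default} |")
    return "\n".join(rows)


def has_enough_signers(signers: list[str], required: int = 1) -> bool:
    """Hypothetical function used to demonstrate the table."""
    return len(signers) >= required


print(params_table(has_enough_signers))
```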

I realize your list was a set of examples, but pydoc-markdown seems limited in what it does. For now, I'll report back on what I've been able to tell so far about that specific list.

Mark the definition of a variable: Seems unlikely.
Specify the return type and value: Seems likely.
Link to other parts of the documentation: Automated with a specific processor.

Its default processor tries to guess which of the three syntaxes your docstring is using. I haven't yet found anything else that it is capable of doing.
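To illustrate what "guessing the syntax" could involve, here is a rough heuristic that looks for telltale markers of each of the three styles. `guess_docstring_style` is hypothetical and is not pydoc-markdown's actual detection logic. The pydocmd marker (Markdown headings such as `# Arguments`) is based on our reading of the processor page.

```python
import re


def guess_docstring_style(docstring: str) -> str:
    """Guess which docstring convention a docstring uses.

    Returns "sphinx", "google", "pydocmd", or "unknown".
    """
    # Sphinx/reST field lists: ":param x:", ":returns:", ":rtype:", ...
    if re.search(r"^\s*:(param|returns?|rtype|raises)\b", docstring, re.MULTILINE):
        return "sphinx"
    # Google style: a bare "Args:" / "Returns:" line introducing a section.
    if re.search(r"^\s*(Args|Arguments|Returns|Raises|Attributes):\s*$", docstring, re.MULTILINE):
        return "google"
    # pydocmd: Markdown headings such as "# Arguments".
    if re.search(r"^\s*#\s*(Arguments|Returns|Raises)\b", docstring, re.MULTILINE):
        return "pydocmd"
    return "unknown"
```

Heuristics like this can misfire on docstrings that mix styles, which is one more argument for picking a single format and sticking to it.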

I'm not sure how to easily find examples of projects using pydoc-markdown, let alone with each of the three separate processors. The best I can offer is setting aside some serious time to try out some stuff myself with our codebase if we want to do a deeper exploration. I'm not sure I'm really finding what you're looking for. I may be the wrong person for this job.

plocket commented 3 years ago

If we're looking for something to enforce an order for our docstrings, like 'First a descriptive sentence, then list args, then errors, then examples, then return values', I haven't seen anything that enforces that, so we can just use what everyone else is using and stick to it as best we can.
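If no tool enforces the order, a small project-specific check could. The sketch below assumes Google-style section headers and uses the ordering quoted above (summary sentence, Args, Raises, Examples, Returns). `lint_docstring` and `SECTION_ORDER` are hypothetical names. The rules are placeholders for whatever the team agrees on, and they deliberately differ from the more common Google ordering where `Returns:` comes before `Raises:`.

```python
import re

# Hypothetical house order for Google-style sections; adjust as agreed.
SECTION_ORDER = ("Args", "Raises", "Examples", "Returns")
HEADER = re.compile(r"^\s*(\w+):\s*$", re.MULTILINE)


def lint_docstring(docstring: str, order: tuple[str, ...] = SECTION_ORDER) -> list[str]:
    """Report deviations from an agreed docstring layout."""
    problems: list[str] = []
    lines = docstring.strip().splitlines()
    # Rule 1: the docstring must open with a summary sentence, not a section header.
    if not lines or HEADER.match(lines[0]):
        problems.append("docstring should start with a summary sentence")
    # Rule 2: known sections must appear in the agreed order.
    rank = {name: i for i, name in enumerate(order)}
    highest = -1
    for match in HEADER.finditer(docstring):
        name = match.group(1)
        if name not in rank:
            continue  # not a section we have an ordering rule for
        if rank[name] < highest:
            problems.append(f"section '{name}' is out of order")
        highest = max(highest, rank[name])
    return problems
```

Something like this could run in CI over every public docstring, which would give us the "strict rules" goal without depending on the renderer.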

nonprofittechy commented 3 years ago

From just looking more closely at all of the examples in responding to this thread, I think we should just use the Google style docstrings: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html.

I'm a little confused by your list. All three of the processors have a way to list variables, exceptions, return type, etc.

But it looks like Google is going to be most like what we're used to and has the most features. Unless there are any objections or people think we should discuss more--I'll go ahead and try it on one class definition and see how it renders with pydoc-markdown.

plocket commented 3 years ago

All three of the processors have a way to list variables, exceptions, return type, etc.

They have ways to list arguments and attributes, I think, not just any variable. [I also didn't see a type listed for a return value. Maybe I missed it. I don't think exceptions were in the list I was addressing, but I can confirm that the examples for the last two do show exceptions.]

nonprofittechy commented 3 years ago

They don't have a complete reference on the processor page. You need to Google, e.g., "google docstring", "sphinx docstring", or "pydoc-md docstring" format to find the full description of the format.

plocket commented 3 years ago

Yes, I linked the complete references above. I'm saying I'm not sure pydoc-markdown implements all those features.

nonprofittechy commented 3 years ago

I guess you are hampered if you aren't able to run pydoc-markdown locally yet. It was pretty quick to download the sample file and confirm all of the features seem to be there. https://github.com/SuffolkLITLab/docassemble-AssemblyLine-documentation/tree/docstring-test
