Closed nonprofittechy closed 3 years ago
From #127:
alternative docstring formats. To my knowledge the popular options are:
- Sphinx (looks like it comes with a VS Code plugin): https://sphinx-rtd-tutorial.readthedocs.io/en/latest/docstrings.html
- Google https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html#example-numpy
See the docs here: https://pydoc-markdown.readthedocs.io/en/latest/docs/api-documentation/processors/
Needs from a documentation renderer:
Next steps:
Will take notes on what I'm looking at here and edit as I find out more.
I'm not sure if we're just looking at docstring formats or at document building tools too. One goal is strictness. More bells and whistles in a docstring syntax might make it easier for a doc renderer to hook into different parts, but strictness is really enforced by the renderer, and I'm not experienced enough to say what level of syntax is going to allow what kind of doc rendering.
The only way I can really think of looking at this is to decide what doc renderer we like and then use the syntax they use. I looked at a few different syntaxes anyway. It's unclear to me which syntaxes absolutely require types and which just allow you to include them. Probably has more to do with the renderer. Below I often used a syntax's need for types as a kind of measure of how detailed its syntax got.
I've found these syntaxes for docstrings:
I find it a bit busy to read. It can apparently integrate with Sphinx, but I'm not sure in what way. I'm also not yet sure if it only generates whole sites or if it can generate individual files. It doesn't show a whole file's contents in detail on one long page, but instead seems to break things up into modules, classes, and names. Not super pretty, but we can probably add custom CSS somewhere.
Instructions on how to document your code with it.
As you can see from the demo linked above, it seems to provide pretty strict structuring with the syntaxes/formats it uses. You can see it says it can use limited versions of either epytext or reStructuredText.
Demo 1 (sphinx makes its documentation using sphinx) Demo 2, readthedocs.io
Not sure how they're being rendered so differently in those two places. I think sphinx's own page is likely to be closer to the bone than the readthedocs one, so for now I suggest looking at that for a bare-bones view of sphinx.
Demo: none looked for
~Not sure exactly what this is for, but~ For converting documents from one format to another. I don't think it's what we need. I don't think it parses code. The only documentation-formatted input I saw was Haddock. I used their 'try it online' tool and it failed to convert a Haddock code block to a Markdown code block.
Unsurprisingly, the very popular reStructuredText (what sphinx uses) looks like a good bet for docstrings. If that's the only decision we're making in this issue, that's my recommendation. Do we want a different issue for doing research on documentation renderers?
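For reference, a reStructuredText (Sphinx-style) docstring looks roughly like the sketch below. The function `sort_names` is a made-up example, not something from our codebase; only the field names (`:param:`, `:type:`, `:returns:`, `:rtype:`, `:raises:`) come from the Sphinx convention.

```python
def sort_names(items, reverse=False):
    """Return the items sorted by name.

    :param items: Names to sort.
    :type items: list[str]
    :param reverse: Sort in descending order when True.
    :type reverse: bool
    :returns: A new sorted list.
    :rtype: list[str]
    :raises TypeError: If the items cannot be compared.
    """
    # Hypothetical body, only here so the example runs.
    return sorted(items, reverse=reverse)
```

Note that the types live in separate `:type:` and `:rtype:` fields, which is part of why this style reads as busier than the others.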
☝️ @nonprofittechy , @BryceStevenWilley , @purplesky2016
My only real hesitancy with Sphinx is that it's mostly rst, which IMO is just yet another markup language that I will get mixed up with normal Markdown. But it looks like Sphinx does support standalone Markdown, so that should be fine if we want larger in-code guides or tutorials.
I don't think we want to change renderers? That'd mean moving away from Docusaurus, which we could do, I guess, but it'd be more work at this point unless we found something easy enough to use.
I think I'm misunderstanding something here.
I think the different aspects of the format are getting mixed up here.
Markdown is used for general descriptive text. But formats that use Markdown for the descriptive text I thought still have varying ways to do things like:
At least, I know that reStructuredText is a general-purpose markup format, like Markdown, and from your investigation it looks like it has a specialized subset syntax for when it's used to document code. I thought there was an equivalent in Markdown.
I agree with Bryce that I prefer we stick with a format that extends Markdown instead of replaces it, and also one that works with Docusaurus. But there could be drawbacks to that that we don't know about.
Edit: when we were talking about something that is "stricter" what I meant, at least, was not just a blob of text with no rules about the order or arrangement of the content, but a way to give it a semantic structure so that the variables, return types, etc. were always presented with a consistent format. How to mark text as bold, etc. is less important--we want rules for that in our Docusaurus page based on what each piece of text means, not something the author decides when they write the docstring.
To be even more specific: I'm hoping the only thing we need to change is the pydoc-markdown processor. See https://pydoc-markdown.readthedocs.io/en/latest/docs/api-documentation/processors/
Agreed with Quinten about the strictness, I was thinking something more like how the document (or in this case, the docstrings) should be organized, not text formatting.
And I think I am still confused about the difference between a renderer, a site-builder, what "documentation tools" are in general, and which library / tool does which. There's definitely a lot of overlap, but I do think Sphinx works with pydoc-markdown, which we can already use with Docusaurus.
Thanks for the clarification. Really narrows down the research I'm doing much more clearly. I'll check those processors.
As for meshing with docusaurus, it's about the processor/generator, not really about the format we ourselves write in if I'm understanding correctly. Docusaurus is not going to understand some special markdown, so the initial input format/syntax doesn't matter as long as the thing that's generated meshes with docusaurus's markdown/html and its file structure. Let me know if I've misunderstood your intention there.
> As for meshing with docusaurus, it's about the processor/generator, not really about the format we ourselves write in if I'm understanding correctly. Docusaurus is not going to understand some special markdown, so the initial input format/syntax doesn't matter as long as the thing that's generated meshes with docusaurus's markdown/html and its file structure. Let me know if I've misunderstood your intention there.
The processor will handle the special rules about how to note the beginning/end of variable list, cross references, etc. and use that to create a plain Markdown file that Docusaurus can then turn into HTML. Is that an accurate summary of what you're asking? I'm not sure I fully understand your second paragraph.
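To make that concrete, here is a toy sketch of what such a processor step could look like. It is not pydoc-markdown's actual implementation: `google_to_markdown`, `SECTION_TITLES`, and `ITEM` are hypothetical names, and it only handles a small subset of Google-style sections with one-line items.

```python
import inspect
import re

# Hypothetical mapping from Google-style section headers to Markdown titles.
SECTION_TITLES = {"Args": "Arguments", "Returns": "Returns", "Raises": "Raises"}

# Matches "name (type): description" or "name: description".
ITEM = re.compile(r"^(\w+)(?:\s*\(([^)]*)\))?:\s*(.*)$")


def google_to_markdown(docstring: str) -> str:
    """Turn a small subset of Google-style sections into plain Markdown."""
    lines = []
    in_section = False
    for raw in inspect.cleandoc(docstring).splitlines():
        text = raw.strip()
        indented = raw[:1].isspace()
        if not indented and text.endswith(":") and text[:-1] in SECTION_TITLES:
            # An unindented "Args:" style header starts a section.
            lines.extend([f"**{SECTION_TITLES[text[:-1]]}**", ""])
            in_section = True
        elif in_section and text and indented:
            # Indented lines inside a section become list items.
            match = ITEM.match(text)
            if match:
                name, type_, desc = match.groups()
                suffix = f" (`{type_}`)" if type_ else ""
                lines.append(f"- `{name}`{suffix}: {desc}")
            else:
                lines.append(f"- {text}")
        else:
            # Summary text and blank lines pass through unchanged.
            in_section = False
            lines.append(raw)
    return "\n".join(lines)


EXAMPLE = '''Return the items sorted by name.

Args:
    items (list[str]): Names to sort.
    reverse (bool): Sort in descending order when True.

Returns:
    list[str]: A new sorted list.
'''

print(google_to_markdown(EXAMPLE))
```

The output is ordinary Markdown (bold titles and bullet lists), so Docusaurus never needs to know which docstring format the source used.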
I think that's what I was describing, yes. The line
> I prefer we stick with a format that... works with Docusaurus
confused me a bit about whether I needed to look for something in addition to what had been listed - maybe some extension for Docusaurus so it could understand the new format. I was confirming that we don't need Docusaurus to care about the new format, we just need something that will convert the new format into something that Docusaurus does understand.
Things are a bit conflated in my mind, so I'll try to write stuff out. Wish me luck.
Markdown is very simple and has some extensions. I have not found any that are specifically for code documentation. Some formats, like google, use markdown and can then include reStructuredText and such. They also have their own conventions/syntaxes that allow processors/renderers/whatever to generate useful markdown or html that other frameworks can use to build documentation. I'm not sure whether you'd consider those an extension of markdown or not.
Similarity to Markdown - of the three offered, based on the examples given, pydocmd seems the most light-handed, followed by google, and then sphinx. It's hard to tell because the examples are so limited.
Another aspect, one that seems more relevant to the output, is what a framework's processors do with the syntax of the docstring. pydoc-markdown, from that documentation page, seems to only take advantage of a subset of the syntax that any of those formats provide, though it doesn't describe which features it does use. The examples it offers for sphinx vs. google are apples to oranges. That aside, pydoc-markdown seems to use formatters mostly for appearance and not much for other functionality. For example, it has separate processors for automated cross-referencing.
I have not yet found a way that pydoc-markdown can use things in the code itself, like type annotation, to help generate docs.
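For context, the annotations themselves are readable at runtime through the standard library, so a tool could in principle use them. The sketch below uses `inspect.signature`; `describe_signature` and `greet` are hypothetical names, and nothing here is confirmed to be what pydoc-markdown does.

```python
import inspect


def describe_signature(func) -> str:
    """List a function's annotated parameter and return types as Markdown."""
    sig = inspect.signature(func)
    rows = []
    for name, param in sig.parameters.items():
        if param.annotation is inspect.Parameter.empty:
            type_name = "unannotated"
        else:
            # Classes like str have __name__; other annotations fall back to str().
            type_name = getattr(param.annotation, "__name__", str(param.annotation))
        rows.append(f"- `{name}`: {type_name}")
    if sig.return_annotation is not inspect.Signature.empty:
        ret = sig.return_annotation
        rows.append(f"- returns: {getattr(ret, '__name__', str(ret))}")
    return "\n".join(rows)


def greet(name: str, times: int = 1) -> str:
    """Hypothetical function used only to demonstrate the helper."""
    return " ".join([f"Hello, {name}!"] * times)


print(describe_signature(greet))
```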
I realize your list was a set of examples, but pydoc-markdown seems limited in what it does. For now, I'll report back on what I've been able to tell so far about that specific list.
Its default processor tries to guess which of the three syntaxes your docstring is using. I haven't yet found anything else that it is capable of doing.
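I haven't read how the guessing is actually implemented, but a heuristic along these lines would be enough to tell the three syntaxes apart. This is a guess at the idea, not pydoc-markdown's code: `guess_docstring_style` is a hypothetical name, and the markers I picked for each style (including Markdown-style `# Arguments` headers for pydocmd) are my own assumptions.

```python
import re


def guess_docstring_style(docstring: str) -> str:
    """Guess a docstring's style from a few telltale markers (assumed, not official)."""
    # Sphinx/reST fields start with a colon, e.g. ":param x:" or ":returns:".
    if re.search(r"^[ \t]*:(param|type|returns?|rtype|raises)\b", docstring, re.MULTILINE):
        return "sphinx"
    # Google sections are a bare title followed by a colon, e.g. "Args:".
    if re.search(r"^[ \t]*(Args|Arguments|Returns|Raises|Attributes):[ \t]*$",
                 docstring, re.MULTILINE):
        return "google"
    # Assumed pydocmd marker: a Markdown header such as "# Arguments".
    if re.search(r"^[ \t]*#+[ \t]*(Arguments|Returns|Raises)[ \t]*$",
                 docstring, re.MULTILINE):
        return "pydocmd"
    return "unknown"
```

For example, `guess_docstring_style("Args:\n    x: a thing")` returns `"google"`, while a docstring with no recognizable markers returns `"unknown"`.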
I'm not sure how to easily find examples of projects using pydoc-markdown, let alone with each of the three separate processors. The best I can offer is setting aside some serious time to try out some stuff myself with our codebase if we want to do a deeper exploration. I'm not sure I'm really finding what you're looking for. I may be the wrong person for this job.
If we're looking for something to enforce an order for our docstring like 'First a descriptive sentence, then list args, then errors, then examples, then the return values', I haven't seen anything that enforces that, so we can just use what everyone else is using and stick to it as best as we can.
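If we ever did want to enforce an order ourselves, a small check like the one below would probably be enough. This is only a sketch: `sections_in_order` and `EXPECTED_ORDER` are hypothetical names, it assumes Google-style section titles, and the order is just the one from my example sentence above.

```python
import re

# Assumed order: args, then errors, then examples, then return values.
EXPECTED_ORDER = ["Args", "Raises", "Examples", "Returns"]


def sections_in_order(docstring: str, expected=EXPECTED_ORDER) -> bool:
    """Return True if the known section headers appear in the expected order."""
    # A header is a line holding only a single word followed by a colon.
    found = re.findall(r"^[ \t]*(\w+):[ \t]*$", docstring, re.MULTILINE)
    positions = [expected.index(name) for name in found if name in expected]
    return positions == sorted(positions)


GOOD = "Summary.\n\nArgs:\n    x: a thing\n\nRaises:\n    ValueError: bad\n\nReturns:\n    int: result\n"
BAD = "Summary.\n\nReturns:\n    int: result\n\nArgs:\n    x: a thing\n"

print(sections_in_order(GOOD), sections_in_order(BAD))
```

It ignores missing sections and unknown headers on purpose, so it only complains when two known sections are swapped.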
From just looking more closely at all of the examples in responding to this thread, I think we should just use the Google style docstrings: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html.
I'm a little confused by your list. All three of the processors have a way to list variables, exceptions, return type, etc.
But it looks like Google is going to be most like what we're used to and has the most features. Unless there are any objections or people think we should discuss more--I'll go ahead and try it on one class definition and see how it renders with pydoc-markdown.
> All three of the processors have a way to list variables, exceptions, return type, etc.
They have ways to list arguments and attributes, I think, not just any variable. [I also didn't see a type listed for a return value. Maybe I missed it. I don't think exceptions were in the list I was addressing, but I can confirm that the examples for the last two do show exceptions.]
They don't have a complete reference on the processor page. You need to Google, e.g., "google docstring", "sphinx docstring", or "pydoc-md docstring" format to find the full description of the format.
Yes, I linked the complete references above. I'm saying I'm not sure pydoc-markdown implements all those features.
I guess you are hampered if you aren't able to run pydoc-markdown locally yet. It was pretty quick to download the sample file and confirm all of the features seem to be there. https://github.com/SuffolkLITLab/docassemble-AssemblyLine-documentation/tree/docstring-test
Resolved: we will use the Google docstring format described here: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
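As a quick illustration of the format, a Google-style docstring looks roughly like this. The function `sort_names` is a made-up example rather than code from our repos; the section names (`Args`, `Returns`, `Raises`) follow the linked Napoleon example.

```python
def sort_names(items, reverse=False):
    """Return the items sorted by name.

    Args:
        items (list[str]): Names to sort.
        reverse (bool): Sort in descending order when True.

    Returns:
        list[str]: A new sorted list.

    Raises:
        TypeError: If the items cannot be compared.
    """
    # Hypothetical body, only here so the example runs.
    return sorted(items, reverse=reverse)
```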