executablebooks / MyST-Parser

An extended commonmark compliant parser, with bridges to docutils/sphinx
https://myst-parser.readthedocs.io
MIT License
742 stars 196 forks

Allow for autodoc to parse Markdown docstrings #228

Open chrisjsewell opened 4 years ago

chrisjsewell commented 4 years ago

Originally posted by @asmeurer in https://github.com/executablebooks/MyST-Parser/issues/163#issuecomment-679917591

This issue will be of relevance here: https://github.com/sphinx-doc/sphinx/issues/8018

asmeurer commented 4 years ago

There's also the question of numpydoc, which defines its own syntax for some things like parameters. Should myst use the same syntax, but just using Markdown markup in the text? Or should it use something more markdownic?

chrisjsewell commented 4 years ago

I have just added definition list syntax rendering 😄 : see https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html#definition-lists

I think this could come in handy for an autodoc extension. Something like:

# Parameters

param1
: Description of param1

param2
: Description of param2

That's maybe more markdownic?

asmeurer commented 4 years ago

I don't know. There's also the Google docstring style, which is a little different (and preferred by many people). It would probably be a good idea to get broader community feedback on these things.

chrisjsewell commented 4 years ago

It would probably be a good idea to get broader community feedback on these things.

Yep absolutely

But note, numpydoc and Google formats are both built around rST syntax. A markdown extension would use markdown-it-py to initially parse the docstring, and so any format has to be compatible with it in some fashion: utilising existing syntax plugins, or writing new ones.

Carreau commented 4 years ago

If it matters, I'm working on decoupling parsing from rendering of docstrings in IPython/Jupyter; basically saying that if you can write a parser that goes from __doc__ to some well-defined data structure with the right fields/info, then IPython (and by extension Jupyter) will know how to render it properly/nicely. (This could also pull some information out of __signature__.)

So, if the raw rendering to users in IPython/Jupyter is bothering you and influencing the syntax you are choosing, this will likely become less of an issue for users.

chrisjsewell commented 4 years ago

Thanks @Carreau, I'll bear that in mind 😄

While you're here; I just added https://myst-parser.readthedocs.io/en/latest/using/syntax-optional.html#auto-generated-header-anchors, so that you can write e.g. [](path/to/doc.md#heading-anchor) and it will work correctly both directly on GitHub and building via sphinx.

These anchor slugs, I've found, are a bit changeable in their implementation across renderers, but generally they are converging to the GitHub "specification".

Jupyter Notebook/Lab seems to be a bit outdated in this respect (or at least the versions I tested)? They don't lower-case or remove punctuation, etc.

I'm surprised by this, because I thought they were both generally built around markedjs at the moment (please move to markdown-it 😉), which does implement this behaviour: https://github.com/styfle/marked/blob/a41d8f9aa69a4095aedae93c6e6ee5522a588217/lib/marked.js#L1991
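
For reference, the slug behaviour described above (lower-casing, dropping punctuation, hyphenating spaces) can be sketched roughly as below. This is only an approximation of the GitHub-style rules, not the code used by MyST, GitHub, or marked; the function name and the `seen` counter for duplicate headings are assumptions made for illustration.

```python
import re
from typing import Dict, Optional


def github_style_slug(title: str, seen: Optional[Dict[str, int]] = None) -> str:
    """Approximate a GitHub-style heading anchor (illustrative only)."""
    slug = title.strip().lower()
    # Keep word characters, spaces and hyphens; drop other punctuation.
    slug = re.sub(r"[^\w\- ]", "", slug)
    slug = slug.replace(" ", "-")
    if seen is None:
        return slug
    # Repeated headings get a numeric suffix: slug, slug-1, slug-2, ...
    count = seen.get(slug, 0)
    seen[slug] = count + 1
    return slug if count == 0 else f"{slug}-{count}"
```

Passing the same `seen` dict for every heading in a document keeps the anchors unique within that document.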

john-hen commented 3 years ago

I'm very much interested in this feature as I've been using Markdown doc-strings for a while and would like to move from recommonmark to MyST.

By the way, it took me quite a while to get to this GitHub issue here. It would have helped if the section in the docs regarding the autodoc extension clearly stated that Markdown is not supported in doc-strings.

choldgraf commented 3 years ago

That's a great point @John-Hennig - any interest in adding a PR to add a ```{warning} block there that also links to this issue in case folks want to give feedback?

dmwyatt commented 3 years ago

Originally posted by @asmeurer in #163 (comment)

This issue will be of relevance here: sphinx-doc/sphinx#8018

From the feedback on the autodoc issue, it sounds like it might just be better to write a replacement for autodoc rather than trying to extend it?

oricou commented 3 years ago

Here is a trick to have Markdown docstring with commonmark. I guess it could be done with myst_parser.

https://stackoverflow.com/questions/56062402/force-sphinx-to-interpret-markdown-in-python-docstrings-instead-of-restructuredt

Sphinx's Autodoc extension emits an event named autodoc-process-docstring every time it processes a doc-string. You can hook into that mechanism to convert the syntax from Markdown to reStructuredText.

import commonmark

def docstring(app, what, name, obj, options, lines):
    md  = '\n'.join(lines)
    ast = commonmark.Parser().parse(md)
    rst = commonmark.ReStructuredTextRenderer().render(ast)
    lines.clear()
    lines += rst.splitlines()

def setup(app):
    app.connect('autodoc-process-docstring', docstring)

dmwyatt commented 3 years ago

It's funny that you posted that as I made this comment on that a few hours ago.

john-hen commented 3 years ago

Here is a trick to have Markdown docstring with commonmark. I guess it could be done with myst_parser.

Yes, it does work with MyST. Since my earlier comment here, I have replaced Recommonmark with MyST in my projects and, as before, I'm using Commonmark.py to render the Markdown doc-strings. I've also updated my Stackoverflow answer to reflect that and mention MyST now that Recommonmark has been deprecated.

This works great for me, actually. But all I need in doc-strings is syntax highlighting of code examples. So nothing fancy. People who want advanced features such as math rendering, cross references, or possibly NumPy style, will have to wait for native doc-string support in MyST.

oricou commented 3 years ago

@John-Hennig Great, could you share your code with MyST? TIA.

astrojuanlu commented 3 years ago

Today I found https://github.com/mkdocstrings/mkdocstrings, is it related to the scope of this issue?

choldgraf commented 3 years ago

@astrojuanlu mmmm probably not, because that seems to work with the mkdocs documentation engine, not Sphinx, no? Or is it usable for Sphinx as well?

astrojuanlu commented 3 years ago

Right, it's based on MkDocs - I brought it up because it could inform the format of the docstring, regardless of the implementation.

chrisjsewell commented 3 years ago

If anyone is motivated to tackle this, I would say an initial step would be to implement a https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#field-lists plugin within https://github.com/executablebooks/mdit-py-plugins.

Using this, we could implement the classic docstring structure:

def func(a):
    """Function description.

    :param a: Parameter description, but with *Markdown* syntax
    """

chrisjsewell commented 2 years ago

UPDATE:

With #455 implemented, it is now fully possible to use sphinx's python-domain directives in MyST 🎉 (see https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#field-lists). For example, this will be properly parsed:

```{py:function} send_message(sender, priority)

Send a message to a recipient

:param str sender: The person sending the message
:return: the message id
:rtype: int
```

The sticking point now for autodoc (and similarly for https://github.com/readthedocs/sphinx-autoapi/issues/287) is that the `auto` directives first use `Documenter` sub-classes to generate source text (which is subsequently parsed), but the source text generation is currently hard-coded to RST
(see https://github.com/sphinx-doc/sphinx/blob/edd14783f3cc6222066fd63efbe28c2728617e18/sphinx/ext/autodoc/__init__.py#L299)

For example,

Is first converted to the text

.. py:class:: DocutilsRenderer(*args, **kwds)
   :module: myst_parser.docutils_renderer

   A markdown-it-py renderer to ...


which MyST cannot parse.

Primarily you just need to override some aspects of these documenters, to handle converting to MyST, something like:

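```python
from sphinx.ext.autodoc import FunctionDocumenter

class MystFunctionDocumenter(FunctionDocumenter):
    def add_directive_header(self, sig: str) -> None:
        # parser_is_rst / parser_is_myst are placeholders for however the
        # active source parser ends up being detected.
        if parser_is_rst:
            super().add_directive_header(sig)
        if parser_is_myst:
            ...  # emit a MyST directive header instead
```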

then you load them via an extension:

def setup(app: Sphinx) -> Dict[str, Any]:
    app.add_autodocumenter(MystFunctionDocumenter)

this is certainly achievable.

One final thing (as noted in https://github.com/sphinx-doc/sphinx/issues/8018#issuecomment-665727599) is that ideally you would also be able to switch the parser based on whether your docstrings were written in RST or Markdown, i.e. it would not matter whether you called autoclass from an RST or Markdown file, the docstring would always be parsed as Markdown.

john-hen commented 2 years ago

Converting the directive header may be fairly straightforward, but some of the domain directives will have a body with content that contains domain directives again. So these directives will be nested. It's quite a bit easier to do that in reST than it is in Markdown.

For example, let's say we have this module.py:

"""Doc-string of the module."""

class Class:
    """Doc-string of the class."""

    def method(self):
        """Doc-string of the method."""

We document it like so in index.rst:

.. automodule:: module
    :members:

And conf.py is simply:

extensions = ['sphinx.ext.autodoc']
import sys
sys.path.insert(0, '.')

When running sphinx-build . html -vv we see in the build log that Autodoc replaces the automodule directive with the following output:

.. py:module:: module

Doc-string of the module.

.. py:class:: Class()
   :module: module

   Doc-string of the class.

   .. py:method:: Class.method()
      :module: module

      Doc-string of the method.

It is already possible to render this with MyST:

```{py:module} module
```

Doc-string of the module.

````{py:class} Class()
:module: module

Doc-string of the class.

```{py:method} Class.method()
:module: module

Doc-string of the method.
```
````

This produces the exact same HTML. But I had to put quadruple back-ticks at the outer scope to achieve the nesting. With reST, Autodoc just needs to increase the indentation level as it generates the body content of the directive line by line.

Maybe it's enough to just start with some extra back-ticks at the outer scope, for good measure. Nesting is usually not more than one level deep anyway. But the indentation also breaks the Markdown build. That's possibly an easy fix too, like override the content_indent attribute of the Documenter class. But Autodoc adds lines to the output in many different places, and often the indentation is just part of the string literal. That's where I gave up the last time I looked into this. I might give this another shot, but this could easily get quite complicated.
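
The "extra back-ticks" idea can be sketched without a fixed limit by choosing, for each directive, a fence one back-tick longer than the longest fence found in its body. This is a minimal illustration and not what Autodoc or MyST does; `fence_for` and `wrap_directive` are hypothetical helper names.

```python
import re


def fence_for(body: str, minimum: int = 3) -> str:
    """Return a back-tick fence longer than any fence line inside ``body``."""
    # A fence line starts with at most three spaces and then 3+ back-ticks.
    runs = [len(m.group(1)) for m in re.finditer(r"^ {0,3}(`{3,})", body, flags=re.MULTILINE)]
    return "`" * max([minimum] + [n + 1 for n in runs])


def wrap_directive(name: str, argument: str, body: str) -> str:
    """Wrap ``body`` in a MyST directive whose fence outlasts nested fences."""
    fence = fence_for(body)
    return f"{fence}{{{name}}} {argument}\n{body.rstrip()}\n{fence}"


method = wrap_directive("py:method", "Class.method()", ":module: module\n\nDoc-string of the method.")
klass = wrap_directive("py:class", "Class()", ":module: module\n\nDoc-string of the class.\n\n" + method)
print(klass)
```

Because the fence length is computed from the inside out, the innermost directive keeps three back-ticks and each enclosing level grows by one.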

chrisjsewell commented 2 years ago

Thanks for the feedback @john-hen

Note, another approach would be to override AutodocDirective, and add a line here: https://github.com/sphinx-doc/sphinx/blob/edd14783f3cc6222066fd63efbe28c2728617e18/sphinx/ext/autodoc/directive.py#L172, which uses https://github.com/executablebooks/rst-to-myst to convert the RST in params.result to MyST.

I guess this may be simpler, with the con that it pulls in more dependencies

john-hen commented 2 years ago

I now have a working demo that uses MyST to parse the doc-strings:

I wrote two custom Sphinx extensions:

I did not implement the reST/MyST switch that you suggested, Chris (@chrisjsewell), as I was already struggling with the method resolution order of the derived classes. Though it should be possible. I did not try using rst-to-myst. Not saying that doesn't work, but given that we pull in doc-strings written in Markdown, it felt wrong to feed it input that isn't strictly reStructuredText.

I avoided regex substitutions as much as possible. That's often not robust and tends to turn into a series of hacks. The only exception is a helper function that Autodoc calls restify, which I wrapped with a function called mystify. (And people say naming things is hard. 😄 )

However, the solution I settled on doesn't exactly strike me as "clean" either. It replicates a lot of code from Autodoc. Some of the duplication could be avoided with monkey-patching, I guess, or messing with the method resolution order. And I think MyST would also tolerate much of the indentation needed in reST, so maybe some modifications aren't actually necessary. Point being, there could easily be a better way than this, that I didn't think of. Keep in mind that I've never written a Sphinx extension before.

I tested with that demo project as well as a larger, but still medium-sized project I maintain. That's still a small sample size. Autodoc has many features that were not used. Ultimately, I suppose, one would have to run essentially the same tests as for Autodoc and Autosummary, only with Markdown input. But I know next to nothing about Sphinx's test suite, so I left it at that. Also, Markdown containing nested code fences (using more than three back-ticks) should break the current solution. There's an easy fix for that, but there will always be a finite limit.

For comparison, I uploaded the same demo project written with reST as well as with MkDocs (checking out the competition, so to speak). The rendered docs are linked from the front page of the MyST demo build.

jedbrown commented 2 years ago

This is great. I'm happy to see that MyST-style math, including equation references across modules, works here (use r"" strings otherwise \nabla becomes <newline>abla). I get spurious duplicate labels, but I think that's just because package.action overlaps package.actions and nothing to do with your work. We'll probably start using this in at least one project if you release it.

```
/home/jed/src/demo-MyST-docstring/docs/api/package.action.md:7: WARNING: duplicate label of equation bar, other instance in api/package.actions
/home/jed/src/demo-MyST-docstring/docs/api/package.actions.md:7: WARNING: duplicate label of equation bar, other instance in api/package.action
```
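
On the raw-string point above, a small illustration of why `r"""` matters for math in doc-strings. The two function names are made up for the example.

```python
def grad_plain():
    """Compute $\nabla u$."""  # "\n" is read as a newline escape here


def grad_raw():
    r"""Compute $\nabla u$."""  # the raw string keeps the backslash intact


print(repr(grad_plain.__doc__))
print(repr(grad_raw.__doc__))
```

Only the raw doc-string still contains the literal `\nabla` that the math renderer needs.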

jborean93 commented 2 years ago

I seem to be coming across an issue trying to render links in a docstring when using autodoc and having this in my markdown file

````{eval-rst}
.. automodule:: mymodule
   :members:
   :undoc-members:
   :show-inheritance:
````

A docstring (Google style) I'm using is something like

```python
def my_function():
    """Header

    Some text with a `link`_.

    .. _link:
        https://github.com
    """
    pass
```

I get a warning when generating the docs

/path/to/mymodule/__init__.py:docstring of mymodule.my_function:12: ERROR: Unknown target name: "link".

The converted HTML has the following element for this:

<a href="#id309"><span class="problematic" id="id310">`link`_</span></a>.</p>

The same function works just fine if I have an rst with the autodoc entry and is rendered outside of MyST. Unfortunately as mentioned in https://github.com/executablebooks/MyST-Parser/issues/519 the latest release seems to have changed how links are referenced so my other MyST generated docs that had the following now no longer work.

[Link To My Function](./source/mymodule.html#mymodule.my_function)

By using a markdown file to embed the autodoc entries these links now work, albeit in a slightly different way, but the actual docstring references in the function/classes are broken. Happy to try out anything as right now I'm stuck on the older version where both scenarios still work.

chrisjsewell commented 2 years ago

Heya @jborean93, https://github.com/executablebooks/MyST-Parser/issues/228#issuecomment-1041097220 is unrelated to the parsing of docstrings as Markdown, and should be opened as a separate issue. I seem to recall there being a similar issue already open, but couldn't find it on a quick search

jborean93 commented 2 years ago

My apologies, I was going through the various issues and melded this with https://github.com/executablebooks/MyST-Parser/issues/163 but can see that is wrong. I'll do another scan through of the issues and open a new one if I can't find anything related.

tony commented 2 years ago

With #455 implemented, it is now fully possible to use sphinx's python-domain directives in MyST 🎉 (see https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#field-lists). For example, this will be properly parsed: ...

@chrisjsewell Thank you!

Both to Chris and anyone who is a stakeholder in this: There are multiple threads on this topic spanning across projects. I'm confused as to where things sit - and if there's a viable workaround or configuration we can paste in the meantime.

sphinx-autoapi + myst-parser

In re: https://github.com/executablebooks/MyST-Parser/issues/228#issuecomment-986447789 and https://github.com/readthedocs/sphinx-autoapi/issues/287#issuecomment-986448384

What's the status of sphinx-autoapi and myst-parser? Are there still steps needed in myst-parser, sphinx, etc? What's needed to get to the point where there's a functioning demo online w/ source code we can clone?

demberto commented 2 years ago

@oricou Is this possible with MyST's Python API? I want to use GFM style tables in my docstrings, since that's the only format VSCode supports apparently.

chrisjsewell commented 1 year ago

Ok so who wants to give it a go 😄 : https://sphinx-autodoc2.readthedocs.io/en/latest/quickstart.html#using-markdown-myst-docstrings

chrisjsewell commented 1 year ago

Now integrated into the documentation 😄 https://myst-parser.readthedocs.io/en/latest/syntax/code_and_apis.html#documenting-whole-apis

astrojuanlu commented 1 year ago

Are there examples of MyST docstrings in the wild? The ones I see in the docs borrow the :param style from reST, but I'd love to see more distinct MyST features being showcased.

Fantastic job everyone!

chrisjsewell commented 1 year ago

The ones I see in the docs borrow the :param style from reST, but I'd love to see more distinct MyST features being showcased.

Well that is a syntax available in MyST: https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#field-lists

Personally, I find that the best, most concise way, to document parameters, so I don't have any problem "borrowing" it. Plus it potentially makes it easier for people to transition.

Did you have anything else in mind?

hmgaudecker commented 1 year ago

This looks great, many thanks!

In order to decide which project to try it out on: I could not find anything on whether numpy- and/or google-style docstrings should work with MyST? Apologies if I missed that.

chrisjsewell commented 1 year ago

I could not find anything on whether numpy- and/or google-style docstrings should work with MyST?

You tell me 😅 I haven't tried, as far as the sphinx style works, it's just acting on "pre-parsed" AST

numpy looks like it is acting on definition lists, which are slightly different in myst, so probably not automatically. The parsing of headings may also require a "fix" in autodoc2. Maybe similar with google docstrings.

chrisjsewell commented 1 year ago

Yeh no actually, looking at the napoleon code, it does horrible parsing of the whole docstring and turns it into rst, so that is a no go in terms of it "just working" out of the box

But I'm sure we can come up with something better 😄

hmgaudecker commented 1 year ago

Wow, thanks for the quick replies and research!

As far as I am concerned, it would be great to have support for Google-style, just so much more readable than the :param-style. But I'll definitely try autodoc2 in some smaller projects not requiring this.

The numpy-style probably would require some discussion on whether one wants to stick to the rst-style underlining of headers or markdown headers. I would not have a strong opinion (and may convert everything I have to Google-style, anyhow).

chrisjsewell commented 1 year ago

So just to explain a little

here: https://github.com/sphinx-extensions2/sphinx-autodoc2/blob/13933a5b25a780e03f227414d432420706962212/src/autodoc2/sphinx/docstring.py#L125 you have your "python object", with the docstring, and then you can parse that (to docutils nodes/AST) however you want

in autodoc+napoleon, the key point is here: https://github.com/sphinx-doc/sphinx/blob/30f347226fa4ccf49b3f7ef860fd962af5bdf4f0/sphinx/ext/napoleon/__init__.py#L320 napoleon takes the docstring from autodoc, and then mutates it into a different string, before giving it back to autodoc to create the final text, which it eventually parses to AST similarly: https://github.com/sphinx-doc/sphinx/blob/30f347226fa4ccf49b3f7ef860fd962af5bdf4f0/sphinx/ext/autodoc/directive.py#L147

The problem being that both napoleon and autodoc only generate RST

astrojuanlu commented 1 year ago

Oh, didn't realize :param is now MyST, thanks! I was also interested in NumPy-style docstrings

chrisjsewell commented 1 year ago

When we talk about numpy/google style, I would start by asking; would you agree that, given we now have type annotations and type checking, it is no longer good practice to put types in the docstring? That would simplify things a little

hmgaudecker commented 1 year ago

I would.

chrisjsewell commented 1 year ago

Then, if you don't want sphinx style, I would suggest a bit of a hybrid, that would work for both rst and myst: basically a heading followed by a field list, e.g.

for MyST

# Parameters
:x: a description
:y: a description

or for RST

Parameters
-----------
:x: a description
:y: a description

this would be very easy to parse, just with the standard rst/myst parser, then you just run a "transform" on the AST that finds these headings and "propagates" them down to the field list, i.e. to get back to the sphinx style

:param x: a description
:param y: a description
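
A rough sketch of that "propagation" step, working on plain text lines rather than on the docutils AST (a real implementation would presumably be a transform on the parsed nodes). It only handles the MyST `#` heading form, and the heading-to-field mapping and all names are assumptions for illustration.

```python
import re

# Hypothetical mapping from section heading to Sphinx field name.
SECTION_FIELDS = {"parameters": "param", "raises": "raises"}

_HEADING = re.compile(r"^#+\s+(?P<title>.+?)\s*$")
_FIELD = re.compile(r"^:(?P<name>[^:]+):\s*(?P<body>.*)$")


def propagate_headings(docstring: str) -> str:
    """Rewrite 'heading + field list' sections into sphinx-style fields."""
    out = []
    prefix = None  # field prefix implied by the most recent known heading
    for line in docstring.splitlines():
        heading = _HEADING.match(line)
        if heading:
            prefix = SECTION_FIELDS.get(heading.group("title").lower())
            if prefix is None:
                out.append(line)  # unknown heading: keep it untouched
            continue
        field = _FIELD.match(line)
        if field and prefix:
            out.append(f":{prefix} {field.group('name')}: {field.group('body')}")
            continue
        if line.strip():
            prefix = None  # ordinary text ends the current section
        out.append(line)
    return "\n".join(out)


print(propagate_headings("# Parameters\n:x: a description\n:y: a description"))
```

The output can then go through the existing sphinx-style field handling unchanged.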

rmorshea commented 1 year ago

That field list syntax would be perfectly acceptable for me personally coming from Google style docstrings. With that said, it would certainly be helpful for projects trying to transition to MyST if Google/Numpy styles were supported as it would require less work and receive less push back from those who might already find RST->MyST to be an uncomfortable change.

hmgaudecker commented 1 year ago

That field list syntax would be perfectly acceptable for me personally coming from Google style docstrings. With that said, it would certainly be helpful for projects trying to transition to MyST if Google/Numpy styles were supported as it would require less work and receive less push back from those who might already find RST->MyST to be an uncomfortable change.

Agreed. Though a converter script might do the job and ease the maintenance burden.

chrisjsewell commented 1 year ago

Note something like this may be of use: https://pypi.org/project/docstring-parser/

Google/numpy are a bit weird, in that they are "pseudo rst", with effectively a bespoke "structure" with nested rst. But I guess one could parse the structure first, even with myst, then parse properly

naquiroz commented 1 year ago

Is there a possible workaround (maybe using/adding other dependencies) for auto parsing docstrings that use both myst and napoleon?

chrisjsewell commented 1 year ago

Is there a possible workaround (maybe using/adding other dependencies) for auto parsing docstrings that use both myst and napoleon?

Oh indeed, that's what I mean by the above, you just need a "hook" in: https://github.com/sphinx-extensions2/sphinx-autodoc2/blob/13933a5b25a780e03f227414d432420706962212/src/autodoc2/sphinx/docstring.py#L125 to allow for "re-interpretation" of the docstring

naquiroz commented 1 year ago

That solution however requires using sphinx-autodoc2 which is not as popular as I would like, so I would rather wait. I need a more battle-tested approach.

chrisjsewell commented 1 year ago

That solution however requires using sphinx-autodoc2 which is not as popular as I would like, so I would rather wait. I need a more battle-tested approach.

Ah well, that's a chicken and egg 😅 I only created it a few weeks ago, and need you guys' help to test/improve it, all issues/PRs welcome 🙏

mj023 commented 1 year ago

Great Project! But we also ran into the issue of not wanting to use the Sphinx-Style when I started transitioning one of our projects to autodoc2. I was able to convert some of our Numpy-Docstrings using Pyment, but we ultimately put it on hold. I think the proposed Heading + Fieldlist style would already be enough for us.

pawamoy commented 4 months ago

@chrisjsewell

Note something like this may be of use: https://pypi.org/project/docstring-parser/ Google/numpy are a bit weird, in that they are "pseudo rst", with effectively a bespoke "structure" with nested rst. But I guess one could parse the structure first, even with myst, then parse properly

That's the approach I took with Griffe: it parses the different styles into the same data structures/classes. Basically, it parses a docstring into a list of sections, each section having its own specific kind and contents (regular text, arguments, returns, exceptions, etc.).

It's (almost) markup agnostic: regular text sections as well as any item description (parameter, returned value, etc.) can be written in Markdown, rST, Asciidoc, whatever the end user prefers. I wrote almost because Griffe's parsers still check for fenced code blocks (using triple-backticks) to prevent parsing of sections inside Markdown code blocks. This is not an issue for rST since they would be indented and therefore not matched.
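
To illustrate the general idea (this is not Griffe's code or API, just a minimal sketch with made-up names): split a Google-style docstring into sections by their header lines, leave each body as opaque markup, and ignore headers that appear inside a triple-backtick fence.

```python
import re
import textwrap
from typing import List, Tuple

# Hypothetical, deliberately short list of recognised section headers.
_SECTION = re.compile(r"^(Args|Arguments|Parameters|Returns|Raises):\s*$")


def split_sections(docstring: str) -> List[Tuple[str, str]]:
    """Return (kind, body) pairs; bodies are left as unparsed markup."""
    sections = [("text", [])]
    in_fence = False
    for line in docstring.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on opening and closing fences
        match = None if in_fence else _SECTION.match(line)
        if match:
            sections.append((match.group(1).lower(), []))
        else:
            sections[-1][1].append(line)
    result = []
    for kind, lines in sections:
        body = textwrap.dedent("\n".join(lines)).strip()
        if body or kind != "text":  # drop an empty leading text section
            result.append((kind, body))
    return result
```

Each body can then be handed to whichever markup parser the project uses.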

Anyway, just a shameless plug 😄 See usage examples here: https://mkdocstrings.github.io/griffe/parsing_docstrings/