improvement: Implement support for NumPy-style docstrings

celsiusnarhwal commented 1 year ago

This PR implements support for NumPy-style docstrings via the new NumpyProcessor class. It does so with the help of the numpydoc package, on which this PR makes Pydoc-Markdown dependent.

In addition to the above, this PR:

- Adds a unit test for NumpyProcessor
- Updates SmartProcessor to support NumpyProcessor
- Updates pyproject.toml to reflect the addition of numpydoc as a dependency
- Updates readme.md to reflect the addition of NumPy-style docstring support

This PR resolves #251.

Caveats and Limitations

- NumpyProcessor.check_docstring_format() returns True if a docstring passes numpydoc's docstring validator without warnings or errors and False otherwise. Because SmartProcessor skips the call to check_docstring_format if the format is explicitly indicated in the docstring (e.g., with @doc:fmt:numpy), a docstring that would fail numpydoc's validator but nonetheless explicitly identifies itself as a NumPy-style docstring may result in warnings or exceptions at processing time.
  - The processor converts docstrings to NumpyDocString objects before converting them to Markdown syntax. Instantiating a NumpyDocString object with an invalid docstring will result in warnings or exceptions.
- Reference indexes in a docstring's Notes section are not hyperlinked to their corresponding references in the References section, in contrast to the numpydoc spec. This is due to what is apparently a behavior of Pydoc-Markdown's existing faculties, which insisted on rendering HTML tags in a way that broke the hyperlinks in all my attempts to implement this behavior. Examples of how reference indexes and references are rendered by NumpyProcessor can be found below.
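To make the first caveat concrete, here is a minimal sketch of the dispatch order described above. It is not SmartProcessor's actual code: `choose_format` and the `detectors` mapping are hypothetical names, and the only detail taken from this PR is the `@doc:fmt:numpy` marker. The point is that an explicit marker returns before any detector runs, so an invalid docstring can still reach the NumPy processor.

```python
import re
from typing import Callable, Dict, Optional

# Assumption: the explicit marker has the shape "@doc:fmt:<name>", as in "@doc:fmt:numpy".
_FORMAT_MARKER = re.compile(r"@doc:fmt:(\w+)")


def choose_format(
    docstring: str, detectors: Dict[str, Callable[[str], bool]]
) -> Optional[str]:
    """Return the name of the format that should handle ``docstring``.

    ``detectors`` is a hypothetical mapping of format name to a
    check_docstring_format-style callable.
    """
    marker = _FORMAT_MARKER.search(docstring)
    if marker:
        # Explicit marker: the detector (and thus any validation) is skipped.
        return marker.group(1)
    for name, looks_like in detectors.items():
        if looks_like(docstring):
            return name
    return None
```

With a detector that rejects everything, a docstring carrying the marker is still routed to `numpy`, which is exactly the situation where warnings or exceptions can surface at processing time.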

Examples

Here are examples of how the various sections of a NumPy-style docstring are rendered by NumpyProcessor.

Summary / Extended Summary

The Summary and Extended Summary are rendered together as a single summary.

### Input

```
Decode a string by shifting each character by a given offset.

Extended Summary
----------------
There's not much else to say about this function, but if there was, it would go here. Fun fact: you don't need to include the Extended Summary heading — if your summary spans multiple lines, everything after the first will be implicitly considered to be the Extended Summary. You can't have both an implicit *and* explicit Extended Summary, though — that causes an exception!
```

### Output

Decode a string by shifting each character by a given offset.

There's not much else to say about this function, but if there was, it would go here. Fun fact: you don't need to include the Extended Summary heading — if your summary spans multiple lines, everything after the first will be implicitly considered to be the Extended Summary. You can't have both an implicit *and* explicit Extended Summary, though — that causes an exception!

Parameters / Other Parameters / Attributes / Receives

The Parameters, Other Parameters, Attributes, and Receives sections are all rendered similarly.

### Input

```
Parameters
----------
string : str
    The string to decode.

Other Parameters
----------------
offset : int
    The offset by which to shift each character in the string. Defaults to 13.

Attributes
----------
attr : Any
    Functions don't have attributes, but if we were documenting a class, we'd put its attributes here. Unfortunately, we are not. Too bad!

Receives
--------
param : Any
    If this was a generator, we'd document the parameters passed to its `send()` method here. Unfortunately, it is not. Too bad!
```

### Output

**Arguments**

* **string** (`str`): The string to decode.
* **offset** (`int`): The offset by which to shift each character in the string. Defaults to 13.

**Attributes**

* **attr** (`Any`): Functions don't have attributes, but if we were documenting a class, we'd put its attributes here. Unfortunately, we are not. Too bad!

**Receives**

* **param** (`Any`): If this was a generator, we'd document the parameters passed to its `send()` method here. Unfortunately, it is not. Too bad!

Returns / Yields

The Returns and Yields sections are rendered similarly.

### Input

```
Returns
-------
str
    The decoded string.

Yields
------
char : str
    The decoded string, one character at a time. By the way, you can optionally annotate your return and yield values with names like I did here. The type annotation isn't optional, though.
```

### Output

**Returns**

* `str`: The decoded string.

**Yields**

* **char** (`str`): The decoded string, one character at a time. By the way, you can optionally annotate your return and yield values with names like I did here. The type annotation isn't optional, though.
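The two bullet shapes seen so far (named entries in Parameters and Yields, unnamed entries in Returns) can be sketched as a single formatting rule. This is an illustration only, not NumpyProcessor's implementation; `Entry` and `render_section` are hypothetical names chosen for the example.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One item of a docstring section; ``name`` is empty for unnamed values."""

    name: str
    type: str
    desc: str


def render_section(title: str, entries: List[Entry]) -> str:
    """Render a bold section title followed by one Markdown bullet per entry."""
    lines = [f"**{title}**", ""]
    for entry in entries:
        if entry.name:
            # Named entry: * **name** (`type`): description
            lines.append(f"* **{entry.name}** (`{entry.type}`): {entry.desc}")
        else:
            # Unnamed entry (e.g. a plain return type): * `type`: description
            lines.append(f"* `{entry.type}`: {entry.desc}")
    return "\n".join(lines)
```

For instance, `render_section("Returns", [Entry("", "str", "The decoded string.")])` produces the same `* `str`: The decoded string.` bullet shown in the output above.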

Raises / Warns

The Raises and Warns sections are rendered similarly.

### Input

```
Raises
------
ValueError
    If the string contains non-alphabetic characters.

Warns
-----
UserWarning
    If I don't like you.
```

### Output

**Raises**

* `ValueError`: If the string contains non-alphabetic characters.

**Warns**

* `UserWarning`: If I don't like you.

See Also

### Input

```
See Also
--------
:func:`encode`
    Encode a string by shifting each character by a given offset.
```

### Output

**See Also**

* :func:\`encode\`: Encode a string by shifting each character by a given offset.

*(The processor leaves the task of cross-referencing functions, classes, and methods in this section to Pydoc-Markdown's existing faculties.)*

Notes

### Input

```
Notes
-----
This function implements an inverse substitution cipher[1]_.
```

### Output

**Notes**

This function implements an inverse substitution cipher¹.
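One way to get from `[1]_` to `¹` without emitting any HTML is a plain regex substitution over Unicode superscript digits. The sketch below is an assumption about how such a rewrite could be done, not a description of what NumpyProcessor does internally; `superscript_reference_indexes` is a hypothetical helper.

```python
import re

# Translation table from ASCII digits to their Unicode superscript forms.
_SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

# Matches reST-style numeric reference indexes such as "[1]_" or "[12]_".
_REFERENCE_INDEX = re.compile(r"\[(\d+)\]_")


def superscript_reference_indexes(text: str) -> str:
    """Replace each "[n]_" with n written in superscript digits."""
    return _REFERENCE_INDEX.sub(
        lambda match: match.group(1).translate(_SUPERSCRIPTS), text
    )
```

Because the result is ordinary text rather than a `<sup>` or `<a>` tag, it sidesteps the HTML rendering problem mentioned under Caveats and Limitations, at the cost of not being a hyperlink.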

References

### Input

```
References
----------
.. [1] https://en.wikipedia.org/wiki/Substitution_cipher
```

### Output

**References**

1. https://en.wikipedia.org/wiki/Substitution_cipher

Examples

The Examples section supports [doctests](https://docs.python.org/3/library/doctest.html). The processor renders doctests in code blocks and other content as plain text. The processor considers the start of a doctest to be marked by a line beginning with `>>>` and the end of a doctest to be marked by a blank line. If multiple doctests are present, they are rendered in separate code blocks.

### Input

```
Examples
--------
>>> decode("Qba'g nfx fghcvq dhrfgvbaf!")
"Don't ask stupid questions!"

This is a super simple function so I don't really know why you'd need more than one example but here's another one anyway.

>>> decode("Gunax lbh xvaqyl sbe lbhe nggragvba!")
"Thank you kindly for your attention!"
```

### Output

**Examples**

```python
>>> decode("Qba'g nfx fghcvq dhrfgvbaf!")
"Don't ask stupid questions!"
```

This is a super simple function so I don't really know why you'd need more than one example but here's another one anyway.

```python
>>> decode("Gunax lbh xvaqyl sbe lbhe nggragvba!")
"Thank you kindly for your attention!"
```
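The doctest rule stated above (a `>>>` line opens a code block, the next blank line closes it) can be sketched as a small line scanner. This is a simplified illustration under that stated rule, not the processor's real code; `render_examples` and `FENCE` are hypothetical names.

```python
from typing import List

# Built with multiplication so this example contains no literal fence line.
FENCE = "`" * 3


def render_examples(lines: List[str]) -> str:
    """Wrap each doctest (from a '>>>' line to the next blank line) in a code block."""
    out: List[str] = []
    in_doctest = False
    for line in lines:
        stripped = line.strip()
        if not in_doctest and stripped.startswith(">>>"):
            # A '>>>' line outside a doctest starts a new python code block.
            out.append(FENCE + "python")
            in_doctest = True
        elif in_doctest and not stripped:
            # A blank line ends the current doctest.
            out.append(FENCE)
            in_doctest = False
        out.append(line)
    if in_doctest:
        # Close a doctest that runs to the end of the section.
        out.append(FENCE)
    return "\n".join(out)
```

Feeding it the body of the Examples section above yields two separate code blocks with the prose paragraph between them left as plain text.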

NiklasRosenstein commented 1 year ago

Hey @celsiusnarhwal, thanks for this great PR! I'll be able to take a closer look at it next week.

NiklasRosenstein commented 1 year ago

Hey @celsiusnarhwal, sorry for the silence. I'm finally finding some time again to look at your PR.

I've made some minor adjustments, and I'd almost be happy to merge it as it is now! The only blocker is that two unit tests are failing: NumpyProcessor identifies the examples below as being in the NumPy doc format when in reality they're not, and as a consequence they don't really get processed.

E.g. for the test_pydocmd_processor test:

# Arguments
s (str): A string.
b (int): An int.

It spits the same back out. I've added some logging so we can tell which processor the SmartProcessor is delegating to:

INFO     pydoc_markdown.contrib.processors.smart:smart.py:92 Using `numpy` processor for Module `test` (detected)

> NumpyProcessor.check_docstring_format() returns True if a docstring passes numpydoc's docstring validator without warnings or errors and False otherwise

I'm also thinking that this, on the other hand, may be too restrictive. If I want to use the NumPy docstring format, I may still make mistakes, and I'd want the docstring to be identified as NumPy format even if it contains a minor formatting mistake. Getting a warning (although maybe not an exception) in this case would be desirable.

What do you think about checking for the presence of Numpy-doc-like sections (e.g. Raises\n-------) in the content of the docstring instead?
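A rough sketch of what that check could look like, using only the standard library. The section names are the ones shown in this PR's examples; the function name `looks_like_numpy` and the "at least three hyphens" underline rule are assumptions made for illustration, not something the PR or numpydoc prescribes.

```python
import re

# Section titles taken from the examples in this PR.
NUMPY_SECTIONS = (
    "Extended Summary", "Parameters", "Other Parameters", "Attributes",
    "Receives", "Returns", "Yields", "Raises", "Warns", "See Also",
    "Notes", "References", "Examples",
)

# A known section title on its own line, followed by a line of hyphens.
# Assumption: an underline of three or more hyphens is enough to count.
_SECTION_HEADER = re.compile(
    r"^[ \t]*(?:%s)[ \t]*\n[ \t]*-{3,}[ \t]*$"
    % "|".join(re.escape(name) for name in NUMPY_SECTIONS),
    re.MULTILINE,
)


def looks_like_numpy(docstring: str) -> bool:
    """Return True if the docstring contains at least one underlined section title."""
    return _SECTION_HEADER.search(docstring) is not None
```

Under this heuristic the `# Arguments` docstring from the failing test is not claimed by the NumPy processor, while a docstring with a `Raises` header and underline is, even if its body has minor formatting mistakes. A docstring with no sections at all would not be detected, so an explicit format marker would still be needed for those.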

NiklasRosenstein / pydoc-markdown

improvement: Implement support for NumPy-style docstrings #279
