Using ruff to format code examples in docstrings

stinodego commented 1 year ago

At @pola-rs , we currently use ruff in combination with black for our Python formatting needs. We also use blackdoc for formatting our docstring code examples. Out of these three, blackdoc is by far the slowest.

With ruff gaining many of the capabilities of the black formatter, it would be great if we could also replace blackdoc soon and only use ruff.

MichaReiser commented 1 year ago

This sounds really useful and something our formatter could do. However, I don't expect us to make progress on this in the near future but happy to talk external contributors through on how this could be implemented.

One question that comes to mind is what do with code that has syntax errors but the easiest is to just leave such code unchanged.

stinodego commented 1 year ago

This sounds really useful and something our formatter could do. However, I don't expect us to make progress on this in the near future but happy to talk external contributors through on how this could be implemented.

Without any knowledge of the code base:

I think Examples blocks can already be detected as there are some docstring formatting rules in pydocstyle related to sections. Then within the block, code examples should be easy to detect as lines starting with >>>, followed by one or more lines starting with .... Example:

>>> pl.date_range(
...     date(2023, 1, 1), date(2023, 5, 1), "1mo", eager=True
... ).dt.month_end()

You'll probably find more refined logic by looking at the blackdoc source code, though.

One question that comes to mind is what do with code that has syntax errors but the easiest is to just leave such code unchanged.

I would imagine code examples with syntax errors raise an error, and would lead to that specific example to not be autofixed.

MichaReiser commented 1 year ago

I think Examples blocks can already be detected as there are some docstring formatting rules in pydocstyle related to sections.

I guess this brings up the question whether this should be a lint rule with an autofix or integrated into the formatter (runs automatically on save)? I assumed that this would be similar to the linked project and be part of the formatter.

The big challenge will be to implement fast parsing and integrate it in the otherwise already extensive docstring formatting

https://github.com/astral-sh/ruff/blob/c05e4628b1d385d106e7e3d355f5d1d76fbfe401/crates/ruff_python_formatter/src/expression/string.rs#L787-L905

I would imagine code examples with syntax errors raise an error, and would lead to that specific example to not be autofixed.

That's my intuition but the question is would the rest of the file still be formatted? I would say yes, but the formatter then needs a way to raise a warning which we lack today.

stinodego commented 1 year ago

I guess this brings up the question whether this should be a lint rule with an autofix or integrated into the formatter (runs automatically on save)?

I would imagine this would have to be part of the autoformatter.

That's my intuition but the question is would the rest of the file still be formatted?

Yes, definitely. Other code examples in the same docstring could probably still be formatted, even. Each example is a self-contained Python interpreter command.

BurntSushi commented 11 months ago

@stinodego Out of curiosity, was there anything that pushed you to use blackdoc instead of blacken-docs? Also, do you use blackdoc to only reformat Python code in doc comments, or do you also use it to reformat Python code in other files (like, say, a README.rst or README.md)?

stinodego commented 11 months ago

I didn't pick blackdoc myself, it was introduced to the project at some point by someone else and I never thought to try other tools. I had heard of blacken-docs but have never tried it myself. It seems to do more than I need.

We currently only format Python in docstring examples. We don't autoformat code in our README, and code in our other docs is magically imported from Python files, so we can use ruff autoformatter for those.

BurntSushi commented 11 months ago

I spent some time looking into this and I think I've come up with a rough proposal on how to move this forward.

Straw implementation proposal

Tactically, my plan is to follow @MichaReiser's suggestion above and add this as opt-in functionality to the formatter in format_docstring. My initial plan is to support Markdown and reStructuredText formatted doc strings, and within each, to support both python and pycon code blocks. I'm proposing opt-in in at least the initial release as a precautionary step.

For example, all the following Python snippets have doc strings that would be reformatted.

Markdown docstrings with Python snippets:

def foo():
    '''
    Docs about foo.

    ```python
    def quux():
        x = do_something(foo())
        return x + 1

'''
pass


Markdown docstrings with pycon ("Python console") snippets:

````python
def foo():
    '''
    Docs about foo.

    ```pycon
    >>> def quux():
    ...     x = do_something(foo())
    ...     return x + 1

'''
pass


reStructuredText docstrings with Python snippets:

```python
def foo():
    '''
    Docs about foo.

    .. code-block:: python
        def quux():
            x = do_something(foo())
            return x + 1
    '''
    pass

reStructuredText docstrings with pycon ("Python console") snippets:

def foo():
    '''
    Docs about foo.

    .. code-block:: pycon

        >>> def quux():
        ...     x = do_something(foo())
        ...     return x + 1
    '''
    pass

It's plausible we'll want to support more than this, perhaps even in the initial implementation. But I'm unsure. I included more details about possible extensions below.

Prior art

I believe the main prior art in this space is blacken-docs and blackdoc.

It looks like blacken-docs and blackdoc take similar approaches, but blackdoc looks a fair bit more sophisticated. Its initial example shows that it can handle nested docstrings (something that blacken-docs cannot do). blackdoc does still use regexes, but also appears to use Python's standard library tokenize module for tokenizing Python code.

Testing plan

In addition to the usual snapshot tests we can write by hand, I think another effective way to test this is to try it on real code bases. Django, for example, comes to mind. I'll also look for other projects using blacken-docs to run this on and see what the diff looks like. That experience should then lead to adding more unit/regression tests.

Django ran into some interesting challenges when switching over to blackend-docs. I think the biggest issue was support for rST literals, which I address briefly below.

We should also be mindful of transformations that could make the overall Python code invalid. For example, if code snippets contain doc strings, and the code snippet is reformatted to use the same docstring quote style as the docstring containing the snippet, then that could produce an invalid result. In that particular case, I think we can solve it by using the opposite quote-style as the parent docstring, but even that isn't bullet proof. In theory, the docstring within the docstring might contain a code snippet itself. As a practical restriction, we probably want to introduce some way of ensuring that only "top-level" code snippets are reformatted. (Perhaps this could be done by introducing some state in the Python formatted context?)

Maybe should be in scope

reStructuredText literal blocks

We will probably also want to support plain literal blocks. The Django project insisted on it when adopting blacken-docs. So for example:

def foo():
    '''
    Docs about foo.

    Here's an example that uses foo::

        def quux():
            x = do_something(foo())
            return x + 1
    '''
    pass

The above represents a literal block where in theory any kind of "code-like" text can go. But apparently it's an old convention to treat it as Python code by default. I suspect that if we do the first four, then doing the last literal piece won't be too hard. We could even use Ruff's heuristics for detecting whether a snippet of code is actually Python or not. It's not clear how much of this should be in the initial implementation though.

Plain doctests

I went and looked through CPython and Django source code to get a "sense" of what some code snippets actually look like. It turns out that many code snippets are just implicit "pycon" style without any reStructuredText code block (or Markdown) syntax. For example:

def _property_resolver(arg):
    """
    When arg is convertible to float, behave like operator.itemgetter(arg)
    Otherwise, chain __getitem__() and getattr().

    >>> _property_resolver(1)('abc')
    'b'
    >>> _property_resolver('1')('abc')
    Traceback (most recent call last):
    ...
    TypeError: string indices must be integers
    >>> class Foo:
    ...     a = 42
    ...     b = 3.14
    ...     c = 'Hey!'
    >>> _property_resolver('b')(Foo())
    3.14
    """
    pass

This doesn't look like a one-off to me. They appear quite commonplace. It also looks like blackdoc supports formatting these "plain" doctests.

Not in scope

A way to toggle formatting of individual code blocks

This is a feature requested in blacken-docs too. It doesn't look like they've settled on method of doing this, and I think choosing a method will require some careful thought. Namely, you really don't want to require something in the code snippet itself because then it clutters up your example code in the docs. So you probably need something in the docstring itself or elsewhere. The best way to do this seems unclear enough that it's probably worth not including in scope for now.

Formatting Python code in other formats

This means you won't be able to run the formatter over a .rst or a .md file and have it format any Python code blocks in it. This initial implementation will only support reformatting Python snippets within docstrings inside Python files. We can consider formatting Python code in other formats as a future project.

Avoid Markdown/reStructuredText parsing

I think that trying to parse docstrings as Markdown or reStructuredText documents should be out of scope for now. If this turns out to be required (it shouldn't be), then we should re-evaluate at that point. The reason for this is that it likely complicates the implementation and could introduce performance regressions (although it's likely any perf problems could be mitigated with some care).

With that said, I do think this means that our detection logic will necessarily be a little ad hoc. This means it could do some incorrect things like attempt to format blocks that aren't actually Python. This is partially why I'm proposing making this opt-in, at least initially.

stinodego commented 11 months ago

Great to see this become more concrete!

For our specific use case, the "plain doctests" you mention are most important. We follow the numpy docstring standard, and most of our docstrings look something like:

def cool_stuff(arg):
    """
    Do cool stuff.

    Parameters
    ----------
    arg
        Some description.

    Examples
    --------
    Cool stuff with an integer.

    >>> cool_stuff(1)
    2

    Cool stuff with a string.

    >>> input = "q"
    >>> cool_stuff(input)
    'x'
    """
    pass

Reference to the NumPy doc guidelines: https://numpydoc.readthedocs.io/en/latest/format.html#examples

Example file from the Polars code base with lots of docstrings with example sections: https://github.com/pola-rs/polars/blob/main/py-polars/polars/expr/string.py

konstin commented 11 months ago

As a practical restriction, we probably want to introduce some way of ensuring that only "top-level" code snippets are reformatted. (Perhaps this could be done by introducing some state in the Python formatted context?)

Could you repurpose the format options for this case, setting the format docstring code option to false when we recurse?

BurntSushi commented 10 months ago

@stinodego Ah that's useful feedback, thank you! I think that means it makes sense to have plain doctests in the initial scope.

@konstin Yeah that sounds like a good idea. :-)

BurntSushi commented 10 months ago

@stinodego So I ran the docstring code formatter (from #8811) on polars, and here's the diff I got: https://github.com/BurntSushi/polars/commit/559b9d683bce3d4fd48d7946359fbdaeea1afccc

Most things seem to be unwrapping wrapped lines. Thoughts?

BurntSushi commented 10 months ago

Re-opening because I think #8811 doesn't actually close this. Probably once #8854 is merged, then it will actually be usable by end users in the following release.

BurntSushi commented 10 months ago

I've created some finer grained issues for tracking additional work here:

stinodego commented 10 months ago

@BurntSushi Thanks so much for your work - this is great! Looking forward to putting this into practice soon.

About the diff you posted:

It doesn't seem to respect line length in the same way, as you've noticed. We have all line length limits set to 88. I would love it if the doctest pragma comments (e.g. # doctest: +SKIP) would not be counted towards line length.
It trims trailing lines with ... sometimes. These lines are not very informative but they do in fact show up when you use the Python console, e.g. if you have a multi-line command, you have to press enter twice before the example runs. So these are a valid part of the example.

I think that accounts for most/all of the diff!

BurntSushi commented 10 months ago

It trims trailing lines with ... sometimes. These lines are not very informative but they do in fact show up when you use the Python console, e.g. if you have a multi-line command, you have to press enter twice before the example runs. So these are a valid part of the example.

Yeah I came across this during my work and it seemed like trimming the ... was most consistent with auto-formatting since it reflects an extra blank line at the end of the code. If it were "real" code, that line would be trimmed. I'm unsure of introducing specific rules that are different about formatting code snippets than outside of docstrings. Maybe @MichaReiser has an opinion here though?

stinodego commented 10 months ago

I don't feel very strongly either way about the trailing ....

Seems like it occurs with indentation differences, so when defining functions or using a context manager. Those are relatively rare, at least in our docs.

I'd be fine with either pruning the trailing ... or enforcing it. Pruning saves some lines, enforcing makes it closer to how the example would look in the actual Python console.

The line length differences is a more fundamental issue - I commented my thoughts on the separate issue you opened.

MichaReiser commented 10 months ago

Yeah I came across this during my work and it seemed like trimming the ... was most consistent with auto-formatting since it reflects an extra blank line at the end of the code. If it were "real" code, that line would be trimmed. I'm unsure of introducing specific rules that are different about formatting code snippets than outside of docstrings. Maybe @MichaReiser has an opinion here though?

I don't have any experience writing examples in Python myself. So I can't make a recommendation on whether we should enforce the extra line or not.

Implementation wise I see two options for enforcing the extra line and I'm fine with either, although I would probably prefer 1. because the logic may not apply to other example formats and is closer to the problem its solving.

We insert the extra empty line as part of the docstring example logic that @BurntSushi added. We can just always append it to the formatted string (or am I overlooking something?)
We can add a special case handling in the module formatting to emit an extra newline if the embedded_snipped (I don't remember the precise field name) in the PyFormatContext is Some.

ofek commented 10 months ago

cc @pawamoy for visibility as the person at the forefront of docstrings and documentation generation

edit: you might want to take a look at this as well https://github.com/astral-sh/ruff/issues/8855

BurntSushi commented 10 months ago

To make sure we don't lose track of the decision about the trailing empty line or not, I created an issue about it: https://github.com/astral-sh/ruff/issues/8908

@MichaReiser Aye. I think we can't unconditionally add a trailing line, so we'll need a heuristic, but I also favor that approach if we decide to go that route. And in particular, if we do need a heuristic, then we probably have to go route (1) since we'll probably want to base the decision on things like "were there any other non-empty PS2 prompt lines."

astral-sh / ruff