PyCQA / docformatter

Formats docstrings to follow PEP 257
https://pypi.python.org/pypi/docformatter
MIT License
524 stars 61 forks source link

I believe URL_REGEX is missing a `|` #269

Open EFord36 opened 8 months ago

EFord36 commented 8 months ago

comparing the 'URL_REGEX explanation comment':

# (__ |`{{2}}|`\w[\w. :\n]*|\.\. _?[\w. :]+|')? is used to find in-line links that

to the actual first line of the regex:

    rf"(__ |`{{2}}|`\w[\w :#\n]*[.|\.\. _?[\w. :]+|')?<?"

the actual regex seems to be 'missing' a | between:

`\w[\w :#\n]*

and

[.|\.\. _?[\w. :]+

This seems to have been removed in https://github.com/PyCQA/docformatter/pull/213 , which doesn't seem intentional to me, as it doesn't seem obviously related to the change that PR was trying to fix (although it's possible I'm misunderstanding here, sorry if so!).

I think this does affect the wrapping behaviour - links in the format starting with .. seem to not get wrapped (although I'm not 100% sure I've understand the expected behaviour). A file with these contents:

def some_func():
    """This is a func.

    A description line.

    This is a very very very very very very very very very very very long line .. _a link reference:  https://domain.invalid/
    """
    pass

def another_func():
    """This is a func.

    A description line.

    .. _a link reference that is very very very very very very very very very very very very very ver longy:  https://domain.invalid/
    """
    pass

def some_func_other_link():
    """This is another func.

    This is a very very very very very very very very very very very long line `A link reference <https://domain.invalid/>`_
    """
    pass

produces this output when docformatter is run over it, only reformatting the link with a different format:

--- before/test_url_regex.py
+++ after/test_url_regex.py
@@ -19,7 +19,9 @@
 def some_func_other_link():
     """This is another func.

-    This is a very very very very very very very very very very very long line `A link reference <https://domain.invalid/>`_
+    This is a very very very very very very very very very very very
+    long line
+    `A link reference <https://domain.invalid/>`_
     """
     pass

I think maybe this isn't obvious/impactful because this kind of link is 'explicit markup' that according to ReST can only start a line of ReST, so it's probably infrequent that you get a link and reference long enough that this would reformat it anyway.

EFord36 commented 8 months ago

By the way, I noticed this because I was looking at the big comment explaining URL_REGEX, and though "maybe this is a good use case for re.verbose", which I think is a very useful but seemingly not especially well known feature of the re module that helps document complex regular expressions - I thought maybe it would help stop the comment explaining the different bits of the regex from falling out of sync with the actual regex - only to find this issue, where this seems to have actually happened.

If (after this issue is resolved) I submitted a PR to try writing a 'verbose' version of this regex with explanation 'inline', is this something you would be interested in? could probably be 'compiled' with re.compile to prevent re-parsing of the