write script that reformats docstrings to numpy style

GavinHuttley commented 5 years ago

Original report by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d, ).

BEFORE writing anything, check whether somebody has already written a docstring reformatter that can be used to solve this issue. (If so, make a comment below.)

Many PyCogent3 docstrings have the format

function/method summary

Arguments:
   - param: description

That format needs to be refactored to the

function/method summary

Parameters
-------------
param
    description

style (line length <= 80 characters). (The complete numpy style reference is much more extensive than what we require I expect.

NOTE: This script will be a part of the pyco3dev repo NOT PyCogent3.

Make that script a command line installable. As an example, look at the pyco3dev/cleanup.py and associated entry in setup.py (the cleanup insertion point).

The script should:

identify functions/methods with docstrings
whether those match the format I've provided above
parse that format to extract the key components (summary, parameter names and descriptions)
track public methods / functions (no leading _) that do NOT have a docstring
Make sure it works on a few files, but don't commit any changes to the repo.

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

This project looks promising:

Pyment

It supports Numpy and PEP docstring conventions. It has a conversion feature, which I suspect might be what we are looking for.

GavinHuttley commented 5 years ago

Original comment by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d, ).

Looks good. Can you run that over the cogent3.util.transform file and post the patch here, annotate in markdown as a diff, i.e.

```diff

paste the diff

```

That file is now small and this will give me a chance to scrutinise the output.

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

# Patch generated by Pyment v0.3.3

--- a/transform.py
+++ b/transform.py
@@ -29,10 +29,23 @@

 def per_shortest(total, x, y):
     """Divides total by min(len(x), len(y)).
-
+    
     Useful for normalizing per-item results from sequences that are zipped
     together. Always returns 0 if one of the sequences is empty (to
     avoid divide by zero error).
+
+    Parameters
+    ----------
+    total :
+        
+    x :
+        
+    y :
+        
+
+    Returns
+    -------
+
     """
     shortest = min(len(x), len(y))
     if not shortest:
@@ -43,10 +56,23 @@

 def per_longest(total, x, y):
     """Divides total by max(len(x), len(y)).
-
+    
     Useful for normalizing per-item results from sequences that are zipped
     together. Always returns 0 if one of the sequences is empty (to
     avoid divide by zero error).
+
+    Parameters
+    ----------
+    total :
+        
+    x :
+        
+    y :
+        
+
+    Returns
+    -------
+
     """
     longest = max(len(x), len(y))
     if not longest:
@@ -57,17 +83,24 @@

 class for_seq(object):
     """Returns function that applies f(i,j) to i,j in zip(first, second).
-
+    
     f: f(i,j) applying to elements of the sequence.
-
+    
     aggregator: method to reduce the list of results to a scalar. Default: sum.
-
+    
     normalizer: f(total, i, j) that normalizes the total as a function of
     i and j. Default is length_normalizer (divides by the length of the shorter
     of i and j). If normalizer is None, no normalization is performed.
-
+    
     Will always truncate to length of the shorter sequence (because of the use
     of zip).
+
+    Parameters
+    ----------
+
+    Returns
+    -------
+
     """

     def __init__(self, f, aggregator=sum, normalizer=per_shortest):
@@ -88,9 +121,16 @@

 class KeepChars(object):
     """Returns a filter object o(s): call to return a filtered string.
-
+    
     Specifically, strips out everything in s that is not in keep.
     This filter is case sensitive by default.
+
+    Parameters
+    ----------
+
+    Returns
+    -------
+
     """
     allchars = bytes(range(256))

@@ -116,7 +156,19 @@

 def first_index_in_set(seq, items):
-    """Returns index of first occurrence of any of items in seq, or None."""
+    """Returns index of first occurrence of any of items in seq, or None.
+
+    Parameters
+    ----------
+    seq :
+        
+    items :
+        
+
+    Returns
+    -------
+
+    """
     for i, s in enumerate(seq):
         if s in items:
             return i

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

I notice that one of the conventions is violated: From numpy docs: "The colon must be preceded by a space, or omitted if the type is absent." There is no type, but the colon is not omitted.

GavinHuttley commented 5 years ago

Original comment by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d, ).

Does pyments allow some control on when it reformats and the nature of the reformat? (BTW hg revert path/to/file.py to get back to the original state.)

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

It does not allow the removal of colons from empty parameter descriptions. Its configs only allow for minimal customisation of formatting. However, we do have the source code at hand, and can easily modify it to suit our needs.

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

Update: I hacked into the pyment code and modified it so that the above violation of the numpydocs is fixed. That said, more testing is required to see if everything else is in working order.

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

Should I continue to develop the pyment hack, or what is the go ahead for this ticket? Applying the current state of the pyment hack will do what is asked in the task title, but I suspect there may be a little more to this task (e.g. automatic type descriptions, etc).

GavinHuttley commented 5 years ago

Original comment by GavinH (Bitbucket: 557058:e40c23e1-e273-4527-a2f8-5de5876e870d, ).

I hesitate here because the hack is not part of the standard pyments distribution, so is not a universally installable solution. Given that, we should probably hold off on applying it until we have purged other modules functions etc..

For now, can you attach a tarball of the revised pyments code to this ticket. Is the pyments usage modified over the unhacked version?

When that's done, I'll put this ticket to watch until we have progressed the other issues.

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

The hack does not change inputs (ie usage of pyment is the same) but it changes output. I’ll upload the tar when I get a chance.

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

Sorry for the delay! The tar is now on the repo!

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

waiting for other API changes

GavinHuttley commented 5 years ago

Original comment by T La (Bitbucket: 5c73c2a3cfbc206be909a286, ).

Pull request accepted on cogent3-dev:50058ef

cogent3 / c3dev

write script that reformats docstrings to numpy style #5