catalystneuro / roiextractors

Python-based module for extracting from, converting between, and handling optical imaging data from several file formats. Inspired by SpikeInterface.
https://roiextractors.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

Docstring checker #237

Closed pauladkisson closed 1 year ago

pauladkisson commented 1 year ago

Adds test_docstrings.py to the tests directory. It traverses all modules, classes, methods, and functions to ensure that they have docstrings (dataclass methods are skipped).

CodyCBakerPhD commented 1 year ago

Here is my implementation, along with a comparison script, for reference (both must be run in the same environment so that the module members have the same address in memory)

(1) is your implementation; (2) is mine

Inclusion of top-level package __init__, magic methods overridden in custom classes, and custom __dir__ masking are all up for debate ATM
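To illustrate why the set comparison needs both lists to come from the same session: function objects are hashed and compared by identity, so two separately created copies of the "same" function never match in a set. The sketch below is a minimal illustration, not part of the PR; `make_function` and `stable_key` are hypothetical names, and comparing by module and qualified name is just one possible alternative.

```python
def make_function():
    """Return a freshly created function object (hypothetical stand-in for a re-import)."""

    def helper():
        """Do nothing."""

    return helper


def stable_key(function):
    """Identify a function by where it was defined rather than by object identity."""
    return (function.__module__, function.__qualname__)


first = make_function()
second = make_function()

# Two distinct objects, so a set keeps both even though the source code is identical.
objects_as_set = {first, second}

# Keys built from names collapse to a single entry.
keys_as_set = {stable_key(first), stable_key(second)}
```

Comparing on name-based keys would relax the same-session requirement, at the cost of no longer distinguishing objects that merely share a name.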

import inspect
import os
import importlib
from pathlib import Path
from types import ModuleType, FunctionType
from typing import List, Iterable

import nwbinspector

def traverse_class(cls, objs):
    """Traverse a class and its methods and append them to objs."""
    predicate = lambda x: inspect.isfunction(x) or inspect.ismethod(x)
    for name, obj in inspect.getmembers(cls, predicate=predicate):
        objs.append(obj)

def traverse_module(module, objs):
    """Traverse all classes and functions in a module and append them to objs."""
    objs.append(module)
    predicate = lambda x: inspect.isclass(x) or inspect.isfunction(x) or inspect.ismethod(x)
    for name, obj in inspect.getmembers(module, predicate=predicate):
        parent_package = obj.__module__.split(".")[0]
        if parent_package != "nwbinspector":  # avoid traversing external dependencies
            continue
        objs.append(obj)
        if inspect.isclass(obj):
            traverse_class(obj, objs)

def traverse_package(package, objs):
    """Traverse all modules and subpackages in a package to append all members to objs."""
    for child in os.listdir(package.__path__[0]):
        if child.startswith(".") or child == "__pycache__":
            continue
        elif child.endswith(".py"):
            module_name = child[:-3]
            module = importlib.import_module(f"{package.__name__}.{module_name}")
            traverse_module(module, objs)
        # NOTE: Path(child) is resolved against the current working directory, not package.__path__[0]
        elif Path(child).is_dir():  # subpackage - I did change this one line b/c error otherwise when hit a .json
            subpackage = importlib.import_module(f"{package.__name__}.{child}")
            traverse_package(subpackage, objs)

def traverse_class_2(class_object: type, parent: str) -> List[FunctionType]:
    """Traverse the class dictionary and return the methods overridden by this module."""
    class_functions = list()
    for attribute_name, attribute_value in class_object.__dict__.items():
        if isinstance(attribute_value, FunctionType) and attribute_value.__module__.startswith(parent):
            class_functions.append(attribute_value)
    return class_functions

def traverse_module_2(module: ModuleType, parent: str) -> Iterable[FunctionType]:
    """Traverse the module directory and return all submodules, classes, and functions defined along the way."""
    local_modules_classes_and_functions = list()

    for name in dir(module):
        if name.startswith("__") and name.endswith("__"):  # skip all magic methods
            continue

        object_ = getattr(module, name)

        if isinstance(object_, ModuleType) and object_.__package__.startswith(parent):
            submodule = object_

            submodule_functions = traverse_module_2(module=submodule, parent=parent)

            local_modules_classes_and_functions.append(submodule)
            local_modules_classes_and_functions.extend(submodule_functions)
        elif isinstance(object_, type) and object_.__module__.startswith(parent):  # class
            class_object = object_

            class_functions = traverse_class_2(class_object=class_object, parent=parent)

            local_modules_classes_and_functions.append(class_object)
            local_modules_classes_and_functions.extend(class_functions)
        elif isinstance(object_, FunctionType) and object_.__module__.startswith(parent):
            function = object_

            local_modules_classes_and_functions.append(function)

    return local_modules_classes_and_functions

list_1 = list()
traverse_package(package=nwbinspector, objs=list_1)

list_2 = traverse_module_2(module=nwbinspector, parent="nwbinspector")

# Analyze and compare - note that for set comparison, the lists must have been run in the same kernel
# to give all imports the same address in memory
unique_list_1 = set(list_1)
unique_list_2 = set(list_2)

found_by_2_and_not_by_1 = unique_list_2 - unique_list_1

# Summary: a series of nested submodules under `checks` and `tools`, plus various private functions scattered around;
# not really clear why Paul's missed these

found_by_1_and_not_by_2 = unique_list_1 - unique_list_2

# Summary: All of these are bound methods of the Enums (Importance/Severity) or JSONEncoder
# and are not methods that we actually override in the codebase (they strictly inherit)
# It did, however, find the outermost package __init__ (does that really need a docstring though?)
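The difference summarized above (inherited methods showing up in one list but not the other) can be reproduced in isolation. This is a small sketch under assumptions: `ExampleEncoder` is a hypothetical subclass that is not taken from nwbinspector, and it only contrasts `inspect.getmembers` with a scan of the class `__dict__`.

```python
import inspect
import json
from types import FunctionType


class ExampleEncoder(json.JSONEncoder):
    """Hypothetical encoder that overrides a single method."""

    def default(self, o):
        """Serialize sets as sorted lists; defer to the base class otherwise."""
        if isinstance(o, set):
            return sorted(o)
        return super().default(o)


def is_function_or_method(x):
    """Predicate matching the one used with inspect.getmembers above."""
    return inspect.isfunction(x) or inspect.ismethod(x)


# inspect.getmembers resolves attributes through the MRO, so inherited methods are included.
names_via_getmembers = {name for name, _ in inspect.getmembers(ExampleEncoder, predicate=is_function_or_method)}

# The class __dict__ only holds what this class itself defines.
names_via_dict = {name for name, value in vars(ExampleEncoder).items() if isinstance(value, FunctionType)}
```

Here `encode` and `iterencode` appear only in the `getmembers` result, while the `__dict__` scan reports just the overridden `default`.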

CodyCBakerPhD commented 1 year ago

It occurred to me, following that realization with Heberto on the other PR: when we check whether a detected method/class/module has a docstring, we also want to check whether any of its parents have one

pauladkisson commented 1 year ago

It occurred to me, following that realization with Heberto on the other PR: when we check whether a detected method/class/module has a docstring, we also want to check whether any of its parents have one

inspect.getdoc actually already takes care of that (see docs)
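A minimal sketch of that fallback, assuming two hypothetical classes `Base` and `Child`: the overriding method has no docstring of its own, but `inspect.getdoc` retrieves the one from the parent class.

```python
import inspect


class Base:
    """Hypothetical parent class."""

    def run(self):
        """Run the thing."""


class Child(Base):
    """Hypothetical subclass that overrides run without documenting it."""

    def run(self):
        return None


child = Child()

own_docstring = child.run.__doc__  # the override itself carries no docstring
resolved_docstring = inspect.getdoc(child.run)  # looked up through the inheritance hierarchy
```

A check based on `__doc__` alone would flag `Child.run`, whereas one based on `inspect.getdoc` would accept it.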

CodyCBakerPhD commented 1 year ago

inspect.getdoc actually already takes care of that (see docs)

Very nice!

pauladkisson commented 1 year ago

Ok, since Cody's implementation is a bit cleaner and catches a few missed functions, I think that's what we should proceed with.

I am skipping all of the magic methods for now, maybe we could include them, but it's probably simplest not to.

I did make some minor tweaks to the organization of the functions (separated traverse_module into traverse_package and traverse_module) based on the commonly used definitions of packages and modules in Python (see here).
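For readers without access to the final test file, the package/module split described above might look roughly like the sketch below. It is not the merged code: the use of `pkgutil.iter_modules`, the helper names `is_magic`, `defined_under`, and `find_missing_docstrings`, and the choice of the standard-library `json` package as a stand-in target are all assumptions made for illustration.

```python
import importlib
import inspect
import json  # stand-in package for the example; the PR targets roiextractors
import pkgutil
from types import FunctionType


def is_magic(name):
    """Return True for dunder names such as __init__, which are skipped."""
    return name.startswith("__") and name.endswith("__")


def defined_under(obj, parent):
    """Return True if obj reports being defined inside the parent package."""
    return (getattr(obj, "__module__", None) or "").startswith(parent)


def traverse_class(class_object, parent):
    """Collect the non-magic functions defined directly on the class."""
    return [
        value
        for name, value in vars(class_object).items()
        if isinstance(value, FunctionType) and defined_under(value, parent) and not is_magic(name)
    ]


def traverse_module(module, parent):
    """Collect the module plus the classes, methods, and functions found in it."""
    found = [module]
    for name, value in vars(module).items():
        if is_magic(name) or not defined_under(value, parent):
            continue
        if isinstance(value, type):
            found.append(value)
            found.extend(traverse_class(value, parent))
        elif isinstance(value, FunctionType):
            found.append(value)
    return found


def traverse_package(package, parent):
    """Collect members of the package itself, then recurse into modules and subpackages."""
    found = traverse_module(package, parent)
    for info in pkgutil.iter_modules(package.__path__):
        child = importlib.import_module(f"{package.__name__}.{info.name}")
        if info.ispkg:
            found.extend(traverse_package(child, parent))
        else:
            found.extend(traverse_module(child, parent))
    return found


def find_missing_docstrings(package):
    """Return every collected object for which inspect.getdoc finds nothing."""
    return [obj for obj in traverse_package(package, package.__name__) if inspect.getdoc(obj) is None]


collected = traverse_package(json, "json")
missing = find_missing_docstrings(json)
```

In a test, asserting that `missing` is empty (or parametrizing over `collected`) would give one failure per undocumented object.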

Once we're satisfied with this test we can remove the compare_docstring_testers.py script and merge it in!

CodyCBakerPhD commented 1 year ago

The test file is now super clean, readable, and simple!

Next step, can you move it to a 'common_workflows' folder on https://github.com/catalystneuro/.github?

Also there's a merge conflict here now; I'd also remove the comparison script before merge

pauladkisson commented 1 year ago

Next step, can you move it to a 'common_workflows' folder on https://github.com/catalystneuro/.github? Also there's a merge conflict here now; I'd also remove the comparison script before merge

Done and done!