improve code style - Githubissues

mwermelinger commented 1 year ago

Even though pylint's score is > 9, it reports several issues: reusing identifiers in different scopes and check_tree has too many statements, benches and local variables. This will require breaking up the check into different functions.

densnow commented 1 year ago

Is the PEP 8 style guide the ideal we should be aiming for in this project? Should this be the document I turn to when considering all things style related? I noticed that a couple of doc-strings etc have been modified and I would like to get them right the first time, does this document guide your choices for that kind of thing?

mwermelinger commented 1 year ago

PEP 8 is for code and PEP 257 is about docstrings. The best is to use a tool like pylint or flake8 to check the style. Can't remember if either of them automatically uses pydocstyle.

densnow commented 1 year ago

Thanks I will certainly start using pylint etc. I guess I wanted to try to read about the appropriate style so I can try to get it right the first time and start building good habits! There seems to be two major style guides for Python: one is PEP and the other seems to be the google style guide.

I have been refactoring check_tree and will just post the new code here before PR to see if you would be happy with the style (it still needs a bit of work with doc strings and type annotations). Would you be happy with the nested functions? or would you prefer separate functions with more arguments passed? or another option would be to make it into a class? And it does contain lamdas, but IMHO this time they are a bit more readable, they can be taken out but more functions would need to be added. Anyway just thought I would see what you think (if I am on the right path) before doing anymore work on it.

def check_tree(
    tree: ast.AST,
    constructs: tuple,
    source: list,
    line_cell_map: list,
    errors: list
) -> None:
    """Check if tree only uses allowed constructs. Add violations to errors"""
    language, options, imports, functions, methods = constructs

    def location(line: int, line_cell_map: list) -> tuple[int, int]:
        """Return (0, line) if not a notebook, otherwise (cell, relative line)."""
        return line_cell_map[line] if line_cell_map else (0, line)

    def add_error(node: ast.AST, message: str) -> None:
        """add (cell, line, message) to errors"""
        cell, line = location(node.lineno, line_cell_map)
        errors.append((cell, line, message))

    def handle_op(node, ops) -> None:
        for op in ops:
            if not isinstance(op, language):
                message = CONCRETE.get(type(op), "unknown operator")
                add_error(node, message)

    def handle_import(node) -> None:
        for alias in node.names:
            if alias.name not in imports:
                message = f"import {alias.name}"
                add_error(node, message)

    def handle_import_from(node) -> None:
        if node.module not in imports:
            message = f"from {node.module} import ..."
            add_error(node, message)
        else:
            for alias in node.names:
                if alias.name not in imports[node.module]:
                    message = f"from {node.module} import {alias.name}"
                    add_error(alias, message)

    def handle_attribute(node) -> None:
        if isinstance(node.value, ast.Name):
            name = node.value
            attribute = node.attr
            if name.id in imports and attribute not in imports[name.id]:
                message = f"{name.id}.{attribute}"
                add_error(node, message)
            elif hasattr(name, "resolved_annotation"):
                type_name = re.match(r"[a-zA-Z.]*", name.resolved_annotation).group()
                if type_name in methods and attribute not in methods[type_name]:
                    message = f"method {attribute} called on {type_name.lower()} {name.id}"
                    add_error(node, message)

    def handle_call(node) -> None:
        if isinstance(node.func, ast.Name):
            function = node.func.id
            if function in BUILTINS and function not in functions:
                message = f"built-in function {function}()"
                add_error(node, message)

    def handle_loop_else(node, loop_type: str) -> None:
        if node.orelse and loop_type not in options:
            message = f"else in {loop_type}-loop"
            add_error(node.orelse[0], message)

    node_handlers = {
        ast.BinOp: lambda node: handle_op(node, [node.op]),
        ast.UnaryOp: lambda node: handle_op(node, [node.op]),
        ast.BoolOp: lambda node: handle_op(node, node.values),
        ast.Compare: lambda node: handle_op(node, node.ops),
        ast.Import: handle_import,
        ast.ImportFrom: handle_import_from,
        ast.Attribute: handle_attribute,
        ast.Call: handle_call,
        ast.For: lambda node: handle_loop_else(node, "for"),
        ast.While: lambda node: handle_loop_else(node, "while"),
    }

    for node in ast.walk(tree):
        if isinstance(node, NO_LINE):
            continue

        handler = node_handlers.get(type(node))
        if handler:
            handler(node)
        elif not isinstance(node, language):
            if hasattr(node, "lineno"):
                message = source[node.lineno - 1].strip()
            else:
                message = f"unknown construct {str(node)} at unknown line"
            add_error(node, message)

mwermelinger commented 1 year ago

Thanks for this. Internal functions have many global variables that can trip one up when reading the code. External functions will have more arguments. Maybe a class is the way to go with one public method check and several private methods _check_op etc. The location and add_error methods can probably be merged into a single private one.

I think your else loop handling needs fixing, because the OPTIONS have "for else" and "while else".

The main loop could become

if isinstance(node, NO_LINE):
    pass
elif handler := node_handlers.get(type(node)):
    handler(node)
elif not ...

However, I wonder if this will be too many changes for just removing two pylint messages, i.e. low ROI. Maybe better to do all the refactoring after the tool can be installed with pip and it doesn't matter whether the tool is made of 1 or 10 files.

densnow commented 1 year ago

Good spot with handle_loop_else probably could be fixed with something simple like

def handle_loop_else(node, loop_type: str) -> None:
        if node.orelse and loop_type + " else" not in options:
            message = f"else in {loop_type}-loop"
            add_error(node.orelse[0], message)

Yes, waiting until #11 is completed is probably the best course of action now you mention it. I am assuming that we can also remove a lot of the import conditionals once pip is up and running. In a way it is low ROI, but the returns I am getting in education are quite high :smile:

dsa-ou / allowed

improve code style #22