acbull / pyHGT

Code for "Heterogeneous Graph Transformer" (WWW'20), which is based on pytorch_geometric
MIT License
775 stars 162 forks source link

no thorough typing in methods and too many magic values #59

Open RLogik opened 1 year ago

RLogik commented 1 year ago
  1. Could you please add complete typing to (at least all externally visible) methods (both for input arguments and return types)? Without these it is hard to trust the code basis or use it reliably.

  2. Instead of the magic values you use for nodes, tasks, etc. could you please use Enums (or at the very least constants)? If you use enums, then you can add clear types to input arguments in all of your methods and you can typify objects that have these values as their keys. That way your code will be more readable/usable, and the intention of each variable, input argument, key, etc. will be clearer for outside developers.

Here is an example of an implementation mit enums

from enum import Enum

class NodeType(Enum):
    PAPER = 'paper'
    VENUE = 'venue'
    FIELD = 'field'
    # ... etc.

class TaskType(Enum):
    TRAIN = 'train'
    REDUCE_VARIANCE = 'reduce-variance'
    SEQUENTIAL = 'sequential'
    # ... etc.

class RelationType(Enum):
    TO = 'to'
    # ... etc.

class SomeClass:
    weights: dict[tuple[NodeType, str, RelationType, NodeType, str], int]
    ...
    def add_edge(
        self,
        source_node: NodeType,
        source_id: str,
        reln: RelationType,
        target_node: NodeType,
        target_id: str,
        weight: int,
    ):
        self.weights[
            (source_node, source_id, reln, target_node, target_id)
        ] = weight

def some_other_method(
    task: TaskType,
    batch_size: int = 50,
):
    # instead of if-elif-...-else
    match task:
        case TaskType.TRAIN:
            ...
        case TaskType.REDUCE_VARIANCE:
            ...
        case TaskType.SEQUENTIAL:
            ...
        case _: # default case
            ...

One might think that the enum is just a redundant duplicate of the hardcoded values, but this is not the case as one cannot add types to methods for hardcoded values and convey the desired intended input/output arguments. For even if, for example, you added types to argument, e.g.

def add_edge(
    self,
    source_node: str,
    source_id: str,
    reln: str,
    target_node: str,
    target_id: str,
    weight: int,
):

the code falsely suggests that any values can be used e.g. for source_node, whereas this is not the case—only a fixed small finite number of options should be used here ('paper', 'venue', etc.).

One can theoretically also set the values of the enums to integers instead of these being duplicates of the names.