ChEB-AI / python-chebai

GNU Affero General Public License v3.0

Refactor `_extract_class_hierarchy` Method in `ChEBIOverXPartial` for Data preprocessing #50

Closed aditya0by0 closed 1 month ago

aditya0by0 commented 2 months ago


We can refactor the _extract_class_hierarchy method in the ChEBIOverXPartial subclass so that it reuses the existing extraction logic from the superclass. This removes duplicated code and makes the preprocessing easier to maintain.

Current Implementation

The current implementation in the subclass duplicates the logic for extracting the class hierarchy from the ChEBI ontology and then filters the graph to include only the subclasses of the top class ID.

Method Implementation in _ChEBIDataExtractor

Method Implementation in ChEBIOverXPartial

Proposed Change

We can simplify the subclass method by calling the superclass method to extract the full class hierarchy and then filtering the graph to keep only the descendants of self.top_class_id. Because the graph returned by the superclass is already transitively closed (every node has a direct edge to each of its descendants), g.successors(self.top_class_id) already yields all descendants, so nx.descendants and its extra graph traversal are unnecessary. This way, the subclass reuses the existing extraction logic and only implements the filtering that is specific to it.
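To illustrate why g.successors can replace nx.descendants, here is a small sketch on a toy graph (the node names are hypothetical and not taken from ChEBI). It assumes edges point from parent class to subclass and uses NetworkX's transitive_closure_dag to close the graph, then checks that direct successors and descendants coincide only after closure.

```python
import networkx as nx

# Toy hierarchy (hypothetical node names); edges point from parent to subclass.
g = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("X", "Y")])

# transitive_closure_dag adds an edge u -> v for every v reachable from u.
closed = nx.transitive_closure_dag(g)

top = "A"
via_successors = set(closed.successors(top))
via_descendants = nx.descendants(closed, top)

# On the closed graph, the direct successors already are all descendants.
assert via_successors == via_descendants == {"B", "C", "D", "E"}

# On the original (non-closed) graph, successors returns only direct children.
print(sorted(g.successors(top)))       # ['B', 'E']
print(sorted(closed.successors(top)))  # ['B', 'C', 'D', 'E']
```

The equivalence holds only because the superclass closes the graph; if that step were ever removed, the subclass would silently drop indirect subclasses.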

Updated Method

Here's the proposed updated method for the subclass:

```python
import networkx as nx


def _extract_class_hierarchy(self, chebi_path: str) -> nx.DiGraph:
    """
    Extracts a subset of ChEBI based on subclasses of the top class ID.

    This method calls the superclass method to extract the full class hierarchy,
    then extracts the subgraph containing only the descendants of the top class ID.

    Args:
        chebi_path (str): The file path to the ChEBI ontology file.

    Returns:
        nx.DiGraph: The extracted class hierarchy as a directed graph, limited to the
        descendants of the top class ID.
    """
    # Reuse the superclass logic, which returns the full, transitively closed hierarchy
    g = super()._extract_class_hierarchy(chebi_path)
    # In a transitively closed graph, the successors of a node are all its descendants
    g = g.subgraph(list(g.successors(self.top_class_id)) + [self.top_class_id])
    # Note: subgraph() returns a read-only view; call .copy() if a mutable graph is needed
    return g
```
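As a quick sanity check of the proposed override, the sketch below uses hypothetical stand-in classes (_StubExtractor and _StubPartial, not part of python-chebai) in place of _ChEBIDataExtractor and ChEBIOverXPartial. The stub superclass ignores the file path and returns a small, already closed graph, so only the super() call and the successor-based filtering are exercised.

```python
import networkx as nx


class _StubExtractor:
    """Hypothetical stand-in for the superclass: ignores the path and returns
    a small, transitively closed parent -> subclass graph."""

    def _extract_class_hierarchy(self, chebi_path: str) -> nx.DiGraph:
        g = nx.DiGraph()
        g.add_edges_from(
            [("CHEBI:1", "CHEBI:2"), ("CHEBI:2", "CHEBI:3"), ("CHEBI:9", "CHEBI:8")]
        )
        return nx.transitive_closure_dag(g)


class _StubPartial(_StubExtractor):
    """Hypothetical stand-in for ChEBIOverXPartial using the proposed override."""

    def __init__(self, top_class_id: str):
        self.top_class_id = top_class_id

    def _extract_class_hierarchy(self, chebi_path: str) -> nx.DiGraph:
        g = super()._extract_class_hierarchy(chebi_path)
        return g.subgraph(list(g.successors(self.top_class_id)) + [self.top_class_id])


sub = _StubPartial("CHEBI:1")._extract_class_hierarchy("unused.obo")
print(sorted(sub.nodes))  # ['CHEBI:1', 'CHEBI:2', 'CHEBI:3']
print(sorted(sub.edges))  # [('CHEBI:1', 'CHEBI:2'), ('CHEBI:1', 'CHEBI:3'), ('CHEBI:2', 'CHEBI:3')]
print(nx.is_frozen(sub))  # True: subgraph() returns a read-only view
```

The unrelated branch (CHEBI:9 -> CHEBI:8) is dropped, and the induced subgraph keeps the closure edges among the retained nodes, so the result stays transitively closed.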

Please review the proposed change and let me know if you have any feedback or concerns.