ChEB-AI / python-chebai

GNU Affero General Public License v3.0
`term_callback` should check for obsolete ChEBI classes #49

Closed by sfluegel05 1 month ago

sfluegel05 commented 2 months ago

Test Case Failing for term_callback

A test case for term_callback is failing because the callback does not skip obsolete ChEBI terms. The test cases for _extract_class_hierarchy and _graph_to_raw_dataset fail as well, because both consume the output of term_callback.

Current Behavior:

Potential Future Issue:

Example of a Problematic Obsolete Term:

[Term]
id: CHEBI:77533
name: Compound G
is_a: CHEBI:99999
property_value: http://purl.obolibrary.org/obo/chebi/smiles "C1=C1Br" xsd:string
is_obsolete: true

If terms like this exist in future releases, the current approach could lead to errors because obsolete terms with SMILES strings might slip through the filters.

Proposed Solution: We can update the term_callback logic to explicitly ignore obsolete terms by checking for the is_obsolete clause:

if isinstance(clause, fastobo.term.IsObsoleteClause):
    if clause.obsolete:
        # The term stanza contains an "is_obsolete: true" clause, so skip this term.
        return False

This solution would ensure that obsolete terms are skipped before they are processed, preventing potential future issues with the dataset.
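To illustrate the intended behavior without depending on fastobo, here is a minimal sketch that applies the same check to the example stanza above using plain string parsing. The function parse_term, the dictionary keys it returns, and the constant names are hypothetical stand-ins and are not part of python-chebai; the real term_callback works on fastobo clause objects instead. The sketch returns False for obsolete terms to mirror the snippet above.

```python
from typing import Union

# Property IRI that carries the SMILES string in the example stanza.
SMILES_PROPERTY = "http://purl.obolibrary.org/obo/chebi/smiles"

# The problematic obsolete term from this issue, as plain OBO text.
OBSOLETE_STANZA = """\
[Term]
id: CHEBI:77533
name: Compound G
is_a: CHEBI:99999
property_value: http://purl.obolibrary.org/obo/chebi/smiles "C1=C1Br" xsd:string
is_obsolete: true
"""


def parse_term(stanza: str) -> Union[dict, bool]:
    """Hypothetical stand-in for term_callback.

    Returns a dict describing the term, or False if the stanza
    contains an "is_obsolete: true" clause.
    """
    term = {"id": None, "name": None, "parents": [], "smiles": None}
    for line in stanza.splitlines():
        tag, sep, value = line.partition(": ")
        if not sep:
            continue  # "[Term]" header or blank line
        value = value.strip()
        if tag == "is_obsolete" and value == "true":
            # Skip the whole term, even if it carries a SMILES string.
            return False
        if tag == "id":
            term["id"] = value
        elif tag == "name":
            term["name"] = value
        elif tag == "is_a":
            # Drop a trailing "! label" comment if one is present.
            term["parents"].append(value.split(" ! ")[0])
        elif tag == "property_value" and value.startswith(SMILES_PROPERTY + " "):
            # The SMILES string is the first double-quoted token.
            term["smiles"] = value.split('"')[1]
    return term


# Same stanza without the obsolete flag, for comparison.
live_stanza = OBSOLETE_STANZA.replace("is_obsolete: true\n", "")

print(parse_term(OBSOLETE_STANZA))  # False
print(parse_term(live_stanza))
```

Because the check returns as soon as the is_obsolete clause is seen, the result does not depend on where that clause appears in the stanza relative to the SMILES property.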

Originally posted by @aditya0by0 in https://github.com/ChEB-AI/python-chebai/issues/48#issuecomment-2332645174