ChEB-AI / python-chebai

GNU Affero General Public License v3.0

Investigate Whether `label_number` is Used or Obsolete in ChEBI Dataset Class #47

Open aditya0by0 opened 2 months ago

aditya0by0 commented 2 months ago

Description

We need to understand whether the label_number property in the ChEBIOverX class and its derivatives (ChEBIOver100, ChEBIOver50) is actually used anywhere in the dataset creation process or pipeline.

  1. Is label_number referenced or used in any part of the codebase, including dataset creation, processing, or any downstream tasks?

    • If used, what purpose does it serve?

  2. If label_number is not used, can it be considered obsolete, and should it be removed to clean up the codebase?

  3. Assess whether retaining label_number serves any potential future purpose, or if it should be refactored.
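
One way to approach question 1 is a static scan of the repository for definitions of and references to the name. The following is a minimal sketch using only the standard library; `find_name_uses` and `scan_tree` are hypothetical helper names, not part of python-chebai. It would not catch string-based lookups such as `getattr(obj, "label_number")` or references in non-Python config files, so a plain text search is a useful complement.

```python
import ast
from pathlib import Path
from typing import Iterator, Tuple


def find_name_uses(source: str, name: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, kind) for each definition of or reference to `name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            yield node.lineno, "definition"
        elif isinstance(node, ast.Attribute) and node.attr == name:
            yield node.lineno, "attribute access"
        elif isinstance(node, ast.Name) and node.id == name:
            yield node.lineno, "name reference"


def scan_tree(root: str, name: str) -> Iterator[Tuple[str, int, str]]:
    """Walk every .py file under `root` and report uses of `name`."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            hits = sorted(find_name_uses(source, name))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        for lineno, kind in hits:
            yield str(path), lineno, kind
```

Running `list(scan_tree(".", "label_number"))` from the repository root would list every hit; if only "definition" entries come back, that supports treating the property as unused.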

Relevant Code


class ChEBIOverX(_ChEBIDataExtractor):
    LABEL_INDEX: int = 3
    SMILES_INDEX: int = 2
    READER: dr.ChemDataReader = dr.ChemDataReader
    THRESHOLD: int = None

    @property
    def label_number(self) -> int:
        return 854

    def select_classes(self, g: nx.Graph, split_name: str, *args, **kwargs) -> List:
        # Implementation...
        return nodes

class ChEBIOver100(ChEBIOverX):
    THRESHOLD: int = 100

    def label_number(self) -> int:
        return 854

class ChEBIOver50(ChEBIOverX):
    THRESHOLD: int = 50

    def label_number(self) -> int:
        return 1332
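
A side note on the snippet as quoted: the base class declares `label_number` with `@property`, while the two subclasses redefine it as a plain method. If the decorators are really absent in the source (the excerpt may simply be abbreviated), attribute access behaves differently on the subclasses. The sketch below uses stand-in classes (`BaseSketch`, `Over50Sketch`) rather than the real ChEBI classes to illustrate this.

```python
class BaseSketch:
    """Stand-in for ChEBIOverX: label_number is a property."""

    @property
    def label_number(self) -> int:
        return 854


class Over50Sketch(BaseSketch):
    """Stand-in for ChEBIOver50: label_number is a plain method (no @property)."""

    def label_number(self) -> int:
        return 1332


base_value = BaseSketch().label_number   # property access yields the int
sub_value = Over50Sketch().label_number  # plain method: yields a bound method

print(base_value)           # the integer from the property
print(callable(sub_value))  # the subclass attribute must be called to get 1332
print(sub_value())
```

Any caller that treated `label_number` as an integer on a subclass instance would receive a bound method instead, which may be further evidence that nothing reads the attribute.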

sfluegel05 commented 2 months ago

As far as I am aware, this property is obsolete. @MGlauer You probably added it - do you know what the purpose was?