TheLion-ai / UMIE_datasets

Creative Commons Attribution 4.0 International

Add premade extracting methods for StudyIdExtractor #73

Open A-Huli opened 3 months ago

A-Huli commented 3 months ago

Description

Currently, if a pipeline requires a different extracting method than the one defined in base/, we need to rewrite the _extract method from scratch. Many pipelines cannot tell whether images come from the same study. In such cases, we use the source image ID as the study ID and 0.png as the img_id. Several pipelines redefine the _extract method to do this. We would like a set of private methods in the base extractors so that pipelines can reuse existing methods rather than each duplicating the same logic.
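To make the fallback convention concrete, here is a minimal sketch of the mapping described above. The function name `fallback_ids` and the example path are hypothetical, and it assumes the "source image ID" is the file name without its extension, which may not hold for every dataset.

```python
import os
from typing import Tuple


def fallback_ids(img_path: str) -> Tuple[str, str]:
    """Return (study_id, img_id) when studies cannot be told apart.

    Assumption: the source image ID is the file name without its extension.
    """
    source_img_id = os.path.splitext(os.path.basename(img_path))[0]
    # Each source image becomes its own single-image study,
    # so the image inside that study is always named "0.png".
    return source_img_id, "0.png"


# Hypothetical path, for illustration only.
study_id, img_id = fallback_ids("data/alzheimers/train/img_0042.jpg")
```

Here `study_id` is derived from the source file and `img_id` is the fixed name `0.png`. This is the logic that several pipelines currently re-implement inside their own `_extract` methods.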

What to do

  1. Look at what extracting methods are defined across pipelines/.
  2. Create new private methods in BaseStudyIdExtractor that cover the logic required by the pipelines, e.g.
    def _extract_from_filename(self, img_path: str) -> str:
        return os.path.basename(img_path)
  3. Redefine the StudyIdExtractor classes across pipelines to use the new methods, e.g. in pipelines/alzheimers.py

    class StudyIdExtractor(BaseStudyIdExtractor):
        """Extractor for study IDs specific to the Alzheimer's dataset."""

        def _extract(self, img_path: str) -> str:
            return self._extract_from_filename(img_path)

Expected behaviour

BaseStudyIdExtractor has private methods for different extracting operations, e.g. extracting the filename or returning "0.png".
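One possible shape for this, as a sketch only: the `BaseStudyIdExtractor` below is a simplified stand-in, not the class from base/, whose constructor and default `_extract` may look different. The helper names `_extract_filename_stem` and `_return_zero_png` are hypothetical. Only `_extract_from_filename` comes from the example in this issue.

```python
import os
from abc import ABC, abstractmethod


class BaseStudyIdExtractor(ABC):
    """Simplified stand-in for the base class in base/ (assumption)."""

    @abstractmethod
    def _extract(self, img_path: str) -> str:
        """Return the ID for img_path; each pipeline picks a helper below."""

    def _extract_from_filename(self, img_path: str) -> str:
        # File name including its extension.
        return os.path.basename(img_path)

    def _extract_filename_stem(self, img_path: str) -> str:
        # File name without its extension (hypothetical helper).
        return os.path.splitext(os.path.basename(img_path))[0]

    def _return_zero_png(self, img_path: str) -> str:
        # Constant ID for datasets with one image per study (hypothetical helper).
        return "0.png"


class StudyIdExtractor(BaseStudyIdExtractor):
    """Pipeline-specific extractor that only selects a premade helper."""

    def _extract(self, img_path: str) -> str:
        return self._extract_filename_stem(img_path)


extractor = StudyIdExtractor()
study_id = extractor._extract("data/scan_12.png")
```

With helpers like these, a pipeline's `_extract` shrinks to a one-line delegation, and a fix to shared logic only has to be made once in the base class.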

Current behaviour

Each pipeline defines its own extracting method if it differs from the base. Many pipelines require the same method for extracting but each of them defines it separately.