choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License

Refactor BaseImage class to support reading multiple WSI formats such as DICOM by using an Abstract Base Class #252

Open nanli-emory opened 1 year ago

nanli-emory commented 1 year ago

Description

Currently, HistoQC uses OpenSlide as its WSI reader to manipulate whole-slide images. Unfortunately, OpenSlide is not really maintained anymore. As a result, HistoQC only supports older file formats such as svs, ndpi, and big-tiff, and can't handle newer file formats such as DICOM, Philips, and the new 3DHISTECH versions.

Issues & Potential solutions

  1. DICOM support -> use pydicom + wsidicom to create a new DICOM reader
  2. Support for newer file formats -> use bioformats to create a new bioformats reader
  3. Sustainable support for the latest WSI formats -> refactor BaseImage and create an abstract reader using an Abstract Base Class (ABC)
  4. Bounding box? (HistoQC supports bounding boxes, but I'm not sure whether other WSI formats have them)

General Class Diagram

classDiagram

    class BaseImage{
        -WSIImageReader reader
        -BoundingBox bbox
        -[int] dimensions

        -List levels
        -String magnification
        -int level_count
        -List~int~ level_downsamples
        -List~int~ level_dimensions
        +getBoundingBox()
        +getThumbnail()
        +getTheBestThumbnail()
        +getTheBestLevelForDownsample()
        +readRegion()
    }
    class BoundingBox{
        +int x
        +int y
        +int width
        +int height

    }
    class Config{
        +bool enableBoundingBox
        +String imageWorkSize
        +enum maskStatistics
    }
    class MaskStatistics{
        <<enumeration>>
        relative2mask
        absolute
        relative2image
    }
    class WSIImageReader~ABC~{
        <<abstract>>
        +getLevels()
        +getDimensions()
        +getLevelDimensions()
        +getLevelDownsamples()
        +getThumbnail()
        +getRegion()
    }

    class DICOMReader~WSIImageReader~{

    }

    class OpenSlideReader~WSIImageReader~{

    }

    class BioformatReader~WSIImageReader~{

    }
    BaseImage *-- WSIImageReader
    BaseImage *-- BoundingBox
    BaseImage *-- Config
    Config *-- MaskStatistics
    WSIImageReader <|-- DICOMReader
    WSIImageReader <|-- OpenSlideReader
    WSIImageReader <|-- BioformatReader
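
A minimal Python sketch of how the abstract reader in the diagram might look. The method names come from the diagram; the signatures, return types, and the `InMemoryReader` stand-in are assumptions for illustration only, not existing HistoQC code. `getThumbnail` is left out to keep the sketch short.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Tuple


class WSIImageReader(ABC):
    """Abstract reader; method names follow the class diagram."""

    @abstractmethod
    def getDimensions(self) -> Tuple[int, int]:
        """Return (width, height) of the full-resolution level."""

    @abstractmethod
    def getLevelDimensions(self) -> List[Tuple[int, int]]:
        """Return (width, height) for every pyramid level."""

    @abstractmethod
    def getLevelDownsamples(self) -> List[float]:
        """Return the downsample factor of every pyramid level."""

    @abstractmethod
    def getRegion(self, location: Tuple[int, int], level: int,
                  size: Tuple[int, int]) -> Any:
        """Return the pixels of a region at the given level."""

    def getLevels(self) -> int:
        # Shared behaviour can live in the base class.
        return len(self.getLevelDimensions())


class InMemoryReader(WSIImageReader):
    """Hypothetical stand-in backend, used only to exercise the interface."""

    def __init__(self, level_dimensions: List[Tuple[int, int]]):
        self._dims = list(level_dimensions)

    def getDimensions(self):
        return self._dims[0]

    def getLevelDimensions(self):
        return self._dims

    def getLevelDownsamples(self):
        base_width = self._dims[0][0]
        return [base_width / width for width, _ in self._dims]

    def getRegion(self, location, level, size):
        # Returns a description of the request instead of real pixels.
        return {"location": location, "level": level, "size": size}
```

Because every abstract method must be overridden, instantiating `WSIImageReader` directly raises `TypeError`, which makes an incomplete reader fail early rather than deep inside a QC module.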

Readers Dependency

| Reader          | Libraries         | Dependency      |
|-----------------|-------------------|-----------------|
| OpenSlideReader | OpenSlide         | ???             |
| DICOMReader     | wsidicom, pydicom | numpy, Pillow   |
| BioformatReader | python-bioformats | javabridge, JVM |

choosehappy commented 1 year ago

Yup, this looks to be in line with what I was thinking, thanks! The top priority is the DICOM component (using wsidicom), and then we can regroup on the other components.

CielAl commented 1 year ago

Might be similar to https://github.com/choosehappy/HistoQC/pull/221 (fork deleted, sadly).

Since libraries like wsidicomizer and TiffSlide mostly mimic the interface of openslide, it is possible to simply encapsulate the osh object (currently the openslide handle) in a base class that provides a unified set of interfaces for methods such as read_region, along with a factory method that instantiates the appropriate handle. This way, the modification of class BaseImage and other modules could be minimized (the osh is deeply coupled with most of the qc modules).
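
A rough sketch of that wrapper-plus-factory idea. All names here (`SlideHandle`, `register_opener`, the fake openers) are hypothetical; real openers would call openslide, TiffSlide, or a DICOM library instead of the stand-in class used so the example runs without those packages.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: lowercase file suffix -> callable that opens a handle.
_OPENERS: Dict[str, Callable[[str], object]] = {}


def register_opener(*suffixes: str):
    """Decorator that registers an opener for one or more file suffixes."""
    def decorator(opener):
        for suffix in suffixes:
            _OPENERS[suffix.lower()] = opener
        return opener
    return decorator


class SlideHandle:
    """Wraps whichever backend handle was opened (the `osh` object)."""

    def __init__(self, osh, backend: str):
        self.osh = osh
        self.backend = backend

    def read_region(self, location, level, size):
        # Works unchanged for any backend that mimics the openslide interface.
        return self.osh.read_region(location, level, size)

    @classmethod
    def open(cls, path: str) -> "SlideHandle":
        """Factory method: pick the opener from the file suffix."""
        suffix = Path(path).suffix.lower()
        if suffix not in _OPENERS:
            raise ValueError(f"no reader registered for {suffix!r}")
        opener = _OPENERS[suffix]
        return cls(opener(path), backend=opener.__name__)


class _FakeBackend:
    """Stand-in for a real backend handle, for illustration only."""

    def __init__(self, path: str):
        self.path = path

    def read_region(self, location, level, size):
        return (self.path, location, level, size)


@register_opener(".svs", ".ndpi")
def fake_openslide(path: str):
    return _FakeBackend(path)


@register_opener(".dcm")
def fake_dicom(path: str):
    return _FakeBackend(path)
```

The qc modules would then only ever call methods on `SlideHandle`, so swapping the backend touches the registry rather than each module.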

Note that how metadata is stored may differ across libraries (for slides that are not supported by openslide), so it may not be trivial to simply implement osh.properties to adapt functions like getMag.
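
To illustrate the metadata problem, here is a hedged sketch of a magnification lookup that tries several property keys in turn. It is not HistoQC's actual getMag. The keys listed are ones OpenSlide exposes; other backends may use different names. The fallback that estimates magnification as 10 / mpp (about 0.25 µm per pixel at 40x) is an assumption and only a rough heuristic.

```python
from typing import Mapping, Optional

# Keys tried in order. Other backends may need additional entries.
MAG_KEYS = ("openslide.objective-power", "aperio.AppMag")
MPP_KEY = "openslide.mpp-x"


def get_mag(properties: Mapping[str, str]) -> Optional[float]:
    """Return the objective magnification, or None if it cannot be found."""
    for key in MAG_KEYS:
        value = properties.get(key)
        if value is None:
            continue
        try:
            return float(value)
        except ValueError:
            continue  # malformed value, try the next key

    mpp = properties.get(MPP_KEY)
    if mpp is None:
        return None
    try:
        mpp_value = float(mpp)
    except ValueError:
        return None
    if mpp_value <= 0:
        return None
    # Assumption: magnification is roughly 10 / microns-per-pixel.
    return float(round(10.0 / mpp_value))
```

Keeping the lookup as a function over a plain mapping means each backend only has to translate its own metadata into that mapping.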

jacksonjacobs1 commented 1 year ago

OpenSlide has now incorporated DICOM support: https://openslide.org/news/.

Let's keep an eye on this. The latest OpenSlide version is currently available via a PPA repository only. Once the Ubuntu repository is updated, consider allowing OpenSlide to handle DICOM images natively instead of using wsidicom with a custom DICOM handle.

CielAl commented 1 year ago

Customization of the cache is also intriguing, as it directly affects how fast functions like read_region can perform. It may also be nice to make the choice of image backend optional, similar to what QuPath does, so users can choose whatever is best for their own environment from as many options as possible. It is always beneficial to remove the direct coupling between qc modules and openslide APIs anyway.
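
One way an optional backend choice could be sketched: the user supplies an ordered preference list, and the first backend whose Python package is importable wins. The function names and the `BACKEND_MODULES` mapping are hypothetical; only the module names (`openslide`, `wsidicom`, `tiffslide`) refer to real packages.

```python
import importlib.util
from typing import Callable, Iterable, Optional

# Hypothetical mapping of backend name -> importable module name.
BACKEND_MODULES = {
    "openslide": "openslide",
    "wsidicom": "wsidicom",
    "tiffslide": "tiffslide",
}


def is_available(backend: str) -> bool:
    """True if the backend is known and its module can be found."""
    module = BACKEND_MODULES.get(backend)
    return module is not None and importlib.util.find_spec(module) is not None


def choose_backend(preferred: Iterable[str],
                   available: Callable[[str], bool] = is_available
                   ) -> Optional[str]:
    """Return the first preferred backend that is available, else None."""
    for name in preferred:
        if available(name):
            return name
    return None
```

The `available` parameter exists so the selection logic can be tested without installing any backend; in normal use it would be left at its default.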

DanielaSchacherer commented 7 months ago

Hi everyone, while I was trying to use HistoQC on DICOM files, I stumbled across this Issue and was wondering what the current status is.

To my knowledge, OpenSlide 4.0.0 and at least the latest version of OpenSlide Python (1.3.1) can read DICOM files, as stated here and here. I could only find an older version of OpenSlide Python in your code. Do you plan to update the version anytime soon or switch to bioformats/wsidicom at some point?

Best wishes, Daniela

CielAl commented 7 months ago

Update your OpenSlide binary to 4.0.0 (on Windows, you also need to place the binaries in the bin folder we created under the histoqc path), and the openslide-python wrapper may work. The new OpenSlide does not break backward compatibility, iirc.
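
A small sketch for checking which OpenSlide library the Python wrapper actually loads. The helper names and the `dll_dir` parameter are hypothetical; `openslide.__library_version__` and `os.add_dll_directory` (Windows only) are existing attributes. The version threshold follows the 4.0.0 figure mentioned in this thread.

```python
import os
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.0.0' into (4, 0, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_dicom(library_version: str) -> bool:
    """True for OpenSlide library versions 4.0.0 and later."""
    return parse_version(library_version) >= (4, 0, 0)


def openslide_library_version(dll_dir: Optional[str] = None) -> Optional[str]:
    """Return the loaded OpenSlide library version, or None if unavailable.

    dll_dir is a hypothetical path to the folder holding the Windows binaries.
    """
    try:
        if dll_dir and hasattr(os, "add_dll_directory"):
            # Windows: make the OpenSlide DLLs discoverable during import.
            with os.add_dll_directory(dll_dir):
                import openslide
        else:
            import openslide
    except (ImportError, OSError):
        return None
    return openslide.__library_version__
```

Calling `openslide_library_version()` and passing the result to `supports_dicom` gives a quick way to confirm that the upgraded binary, rather than an older system copy, is the one being picked up.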