choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License
253 stars 100 forks source link

Adding CUDA support in PR#287 #288

Open CielAl opened 3 months ago

CielAl commented 3 months ago

Hi Andrew and Nan, @choosehappy @nanli-emory

since I am tired of running 1k+ WSIs for some paper over and over I add the CUDA support (#287) into the WSIHandles using RapidsAI's cucim.CuImage, and here goes some thoughts, benchmarks, and todos:

Motivation and Feasibility (1) histoqc.wsihandle.CuImageHandle itself use cucim.clara.CuImage as the file handle, which intentionally mirrors a subset (and is expanding) of openslide's API as described in their documentation (under development). It uses cupy.ndarray, which mostly mirrors the numpy.ndarray. cuCIM also by itself provides a mirror of skimage for most of common operations (e.g., filters, resize, color transformation etc.): cucim.skimage. These are pretty much the prerequisites of introducing the GPU acceleration without breaking the interface and compatibility of our code and cuCIM seems to be a good candidate.

(2) HistoQC's pipeline is mainly bottlenecked by downsampling (thumbnails) and different sets of filtering options. The thumbnail extraction procedure is occasionally dominated by the read_region which decode the compressed data from tiled pyramid data. I did a few simple benchmarks and here is the result, using a 40x WSI with size 46407x53784.

(3) Since cupy.ndarray pretty much mirrors most of the essential API of numpy.ndarray, and the conversion between each other is trivial and simple, and there exists high-level function/transformations that shares the same interface (e.g., cupyx replaces scipy and cucim.skimage replaces skimage), this would also be esier to introduce GPU acceleration into custom modules with careful planning.

(4) We could definitely list cucim as an extra_requirement in toml and setup.py so it can be optional, without breaking the current pipeline.

Changes (1) We might need to explicitly define, differentiate and backend data type. For instance, openslide mostly returns PIL.Image.Image, and cucim.CuImage returns cucim.CuImage itself. Since cucim.CuImage is a centerpiece of most operations inside CuImageHandle yet not sharing any interfaces with PIL.Image.Image, a different layer of abstraction might be needed in WSIImageHandle to differentiate the different backend data type the handle returns for internal procedures and the unified output types to external procedures. Herein, a method of backend_region is added, which returns the raw output of reading a single region from the handle. While the read_region read the backend_region and explicitly unified them to PIL.Image.Image using the backend_to_pil.

(2) We might need to explicitly manage the array data type beneath the backend data, mostly for external transformations. The array data in many cases share the same memory of the backend data (e.g., the numpy.ndarray from the PIL.Image.Image and cupy.ndarray from cucim.CuImage) and is trivial to retrieve. This procedure is defined by backend_to_array method. The array data should share similar interfaces and methods that exposes and returns such array data would allow the introduction of GPU operations to external procedures.

For instance, getBestThumb returns np.ndarray for OpenslideHandle and cupy.ndarray for CuImageHandle, which allows downstream operations from external modules to identify whether GPU accelerated transformations should be applied.

(3) The resizeTileDownward and getBestThumb methods seem to be tightly coupled to the image handles rather than BaseImage, and with the modifications from (1) and (2) above, they are integrated into the WSIImageHandle as generic procedures. Users only need to override methods such as region_backend, backend_to_array, etc. without modifying the resizeTileDownward and getBestThumb themselves. (Benchmark was done via the integrated resizeTileDownward and getBestThumb.

TODO (1) Image operations with GPU acceleration in custom modules using cucim.skimage and cupyx. (2) Vendor info. So far cucim.CuImage simply lumps vendor-specific metadata (e.g., mpp, mag etc.) into its metadata field, and it lacks a unified parser to read these meta data into a map of common property names like what openslide is doing. Currently cuCIM does not offer more detailed documentation regarding the metadata and as a workaround, a dummy openslide.OpenSlide is associated into the CuImageHandle only for reading properties.

choosehappy commented 3 months ago

A pile of work here, very impressive!

My understanding was that Cucim previously only supported a couple of WSI formats - in particular those with flat tiff like structures. is that still the case? I'm personally interested in mrxs formats since there appear to be tons of these floating around, but i suspect some of the ndpi ones are also very widely used.

does Cucim now support them?

i'd point out as well that the latest version of openslide now perfectly supports dicom - which is a huge must have since this appears to be the format that folks will be using clinically going forward (to be compatible with their PACS systems). How would that format fit in here as well?

if those check out - last question, what do you imagine the integration to look like? supporting a CPU + GPU versions separately? there are many installations of histoqc which currently reside in CPU only environments, so we won't be able to eliminate that option likely ever. Would we instead have CuCim run on CPU? will there be a negative penalty for that?

adding @jacksonjacobs1 as senior developer

CielAl commented 3 months ago

A pile of work here, very impressive!

My understanding was that Cucim previously only supported a couple of WSI formats - in particular those with flat tiff like structures. is that still the case? I'm personally interested in mrxs formats since there appear to be tons of these floating around, but i suspect some of the ndpi ones are also very widely used.

does Cucim now support them?

i'd point out as well that the latest version of openslide now perfectly supports dicom - which is a huge must have since this appears to be the format that folks will be using clinically going forward (to be compatible with their PACS systems). How would that format fit in here as well?

if those check out - last question, what do you imagine the integration to look like? supporting a CPU + GPU versions separately? there are many installations of histoqc which currently reside in CPU only environments, so we won't be able to eliminate that option likely ever. Would we instead have CuCim run on CPU? will there be a negative penalty for that?

adding @jacksonjacobs1 as senior developer Hey thanks for the reply and here are my a few thoughts.

Format support and how does it work with current openslide-based implementation.

I looked into the documentation and as the time being cuCIM does support multi-resolutional tiled images but currently it doesn't mention the support of mrsx and here is the current list of supported format:

Given the current development status of cuCIM I would expect more updates in future and therefore it makes sense to me to simply list the package cucim as an extra requirements (so you install with smth like pip install histoqc[cucim]), and openslide is still the best option for the most generic support and compatibility. We'd also make CuImageHandle a developing branch or an experimental feature as a lot may depend on future updates of cuCIM. (A brief disgression here: since openslide 4.0.0 may not handle well on all BigTIFF typed images in certain cases, we might need a TiffSlide-based handle specifically handling special tiff formats)

Integration with CPU code bases

That will bring us to your final question: how do we do the integration. First, as you said, the majority still use CPU-only environment and we can't forsake that. Meanwhile, I think it is easy to handle both CPU and GPU without breaking the backport compatibility for the below reasons.

(1) We can hide as much detail of cuCIM-related code behind the unified interface of WSIImageHandle. The pipeline and custom modules might need be aware of whether an incoming data type is numpy.ndarray or cupy.ndarray but nothing more. In fact, most of the semantic coupling cuCIM introduced for now is that the GPU acceleration relies on cupy.ndarray rather than numpy.ndarray - an array interface but device/driver dependent.

That means, we need to clearly define and regulates the return type of different set of interfaces such that:

We currently integrate the getBestThumb and resizeTileDownward into WSIImageHandle, wherein the only thing differs are what data interface and resize functions were implemented by subclasses.

GPU Acceleration into Modules

Three important things to note are: (1) cucim library does not provide APIs that directly transform cucim.CuImage itself. Rather, it works on the cupy.ndarray you obtained from the cucim.CuImage. (2) cucim.CuImage supports CPU! And in fact depends on the device attribute you can obtain a numpy.ndarray or cupy.ndarray from it. (3) A significant bottleneck that throttles the speed in CPU is actually openslide.OpenSlide.read_region, especially in thumbnail computation (wherein a large and closest level is read and resized as a whole in most of cases).

Based on (1) and (2) we actually have two options:

Regardless of your options, you may need a dynamic import of function call or wrappers defined to handle numpy.ndarray and cupy.ndarray in module operations, and perhaps a Protocol class of array interface for typing purposes to unify numpy.ndarray and cupy.ndarray for more smooth coding.

I am inclined to Option 2 because

CielAl commented 2 months ago

@nanli-emory @choosehappy @jacksonjacobs1 Since dicom_support is no longer an active branch, I peel off the necessary changes, wrap up the ImageHandle abstraction and corresponding deployment in BaseImage and QC modules, merging all changes to the main (in #290).

Involvement of WSIHandle as a standalone attribute in BaseImage could be a first step to clean the current hard-to-refactor BaseImage (with its swamps of str keys and logging msgs), and would be beneficial to all future new program features.

A few points:

(1) Below I prepare a scratch of scripts to compare the difference of methods under skimage and cucim.skimage, excluding test packages and private/protected methods. cucim.skimage mirrors 266 methods of skimage while misses 366 methods. But the good news is nearly all methods we need for QC procedures are in the 266 implemented ones. A quick glance show that the 366 unimplemented methods are mostly either io/draw/viewer related, or merged to implemented ones (e.g., filters.rank merged into filters as a generic filter module). There are very few necessary methods that are currently not implemented, e.g., graycomatrix. That seems to make it feasible to simply implement unified functional interfaces for skimage/scipy/numpy operations based on runtime check of input array type (numpy.ndarray vs. cupy.ndarray). And if a particular operation does not benefit from GPU acceleration or the CUDA-counterpart is not implemented yet, we downgrade it to numpy.ndarray temporarily and call the corresponding CPU-based approaches.

(2) It seems like we should prepare to moveforward the major python version from 3.7--3.8 to 3.8--3.10, as more packages are migrated to python3.10 and 3.7 is retiring. A few skimage methods are also obsolete and might not return numerically accurate results (see the issue of frangi). This means we might need to consider upgrading the corresponding packages as well. Hence, The function calls/signature of certain skimage methods were updated so it will be ready for upgrading (remove deprecating signatures etc.)

(3) Since the image handle is coupled with nearly all custom QC modules, all these modules are revised for the change and implementation of WSIImageHandle, along with lots of cleanings. TBH the implementation of Modules and BaseImage itself is kinda dirty, especially when the getitem is overused in custom modules and it would be a burden for refactoring even with a modern IDE.

import importlib
import pkgutil
from types import ModuleType

def find_public_methods(module):
    return [name for name in dir(module) if not name.startswith('_') and callable(getattr(module, name))]

def method_report(full_method_name: str, method_name: str,
                  present_methods: list, missing_methods: list, misc_errors: list):
    # Construct corresponding cucim.skimage method name
    cucim_method_name = full_method_name.replace('skimage', 'cucim.skimage', 1)
    cucim_module_name = ".".join(cucim_method_name.split(".")[:-1])
    try:
        cucim_module = importlib.import_module(cucim_module_name)
        result = getattr(cucim_module, method_name)
        present_methods.append(cucim_method_name)
    except (ImportError, AttributeError):
        # If the method does not exist in cucim.skimage, add to missing_methods list
        missing_methods.append(full_method_name)
    except Exception as e:
        print(e, full_method_name)
        misc_errors.append(full_method_name)

def module_report(name: str, present_methods: list, missing_methods: list, missing_modules: list, misc_errors: list):
    try:
        submodule = importlib.import_module(name)
    except ImportError:
        missing_modules.append(name)
        return
    if isinstance(submodule, ModuleType):
        methods = find_public_methods(submodule)
        for method in methods:
            full_method_name = f"{name}.{method}"
            method_report(full_method_name, method, present_methods, missing_methods, misc_errors)

def traverse_modules(package_name):
    package = importlib.import_module(package_name)
    missing_methods = []
    missing_modules = []
    present_methods = []
    misc_errors = []
    for loader, name, is_pkg in pkgutil.walk_packages(package.__path__, package.__name__ + '.'):
        # Skip the skimage.data submodule entirely
        if ("skimage.data" in name or any([x.startswith('_') for x in name.split('.')])
                or "tests" in name.split('.')):
            continue

        if is_pkg:
            module_report(name, present_methods, missing_methods, missing_modules, misc_errors)
            continue

        full_method_name = name
        method_name = name.split(".")[-1]
        method_report(full_method_name, method_name, present_methods, missing_methods, misc_errors)
        # if not pkg --> method

    return missing_methods, missing_modules, present_methods, misc_errors

# Usage
missing_sk_methods, missing_sk_modules, present_sk_methods, misc_sk_errors = traverse_modules('skimage')
print("Missing Methods:")
print(missing_sk_methods)
print("Missing Modules:")
print(missing_sk_modules)

print("Present Methods:")
print(present_sk_methods)

print("Misc Errors:")
print(misc_sk_errors)