choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License

Performance Degradation and Magnification Selection Issue in HistoQC Processing #297

Open SaharAlmahfouzNasser opened 1 month ago

SaharAlmahfouzNasser commented 1 month ago

Hello, I encountered an issue while using HistoQC to process large whole slide images, each consisting of 10 levels. The magnification data was absent from the metadata, so I used tifftools to add this information following this tutorial (https://andrewjanowczyk.com/converting-an-existing-image-into-an-openslide-compatible-format/). Although the images processed successfully with this method, the processing time exceeded 8 hours per image, significantly longer than anticipated. It's possible that HistoQC isn't selecting the appropriate magnification level for processing. Could you kindly investigate this issue further?

choosehappy commented 1 month ago

Can you send over the results of a level_dimensions call from OpenSlide on one of the images?
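
e.g., something along these lines (the path is a placeholder):

    import openslide

    osh = openslide.OpenSlide('/path/to/slide.tiff')  # placeholder path
    print(osh.level_dimensions)
    print(osh.level_downsamples)
    print(osh.properties.get('openslide.mpp-x'), osh.properties.get('openslide.mpp-y'))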

CielAl commented 1 month ago

Alternatively, in the current version if the entire cohort shares the same magnification then you may specify it in config's BaseImage section, e.g., "base_mag: 20x", in case you need a quick workaround.
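
For instance, a minimal sketch of that override in config.ini (the exact section header may vary across HistoQC versions):

    [BaseImage.BaseImage]
    base_mag: 20x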

Regarding the reason for the degradation, it's hard to find out without the relevant metadata, especially the level_dimensions, as @choosehappy pointed out. While HistoQC processes the image via lower-resolution thumbnails, obtaining such a thumbnail requires reading the entire pyramid level closest to the target work size (and then downscaling to the target size).

Therefore, if the closest level itself is extremely large, it will be extremely slow.
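
As a concrete sketch of the arithmetic (assuming a 40x base magnification and HistoQC's default 1.25x work size):

    # target downsample for a 1.25x thumbnail from a 40x slide
    level_downsamples = (1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0)
    target = 40 / 1.25  # 32.0
    level = min(range(len(level_downsamples)),
                key=lambda i: abs(level_downsamples[i] - target))
    print(level)  # 5 -- only the (relatively small) level-5 plane is read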

SaharAlmahfouzNasser commented 1 month ago

Hello, I tried to define base_mag: 40x as you suggested; however, this did not speed up the process.

Kindly check if I modified the config file correctly (screenshot of the config attached).

jacksonjacobs1 commented 1 month ago

@SaharAlmahfouzNasser sent me the WSI, and I discovered the root of the problem: setting the image metadata using tifftools caused OpenSlide to alter the level_dimensions of the slide.

These were my steps:

  1. Check the original openslide properties of the image:

    import openslide
    osh = openslide.OpenSlide('Downloads/17__20190808_143703.tiff')

    osh.level_dimensions
    
    ((196096, 87552),
     (98048, 43776),
     (49024, 21888),
     (24512, 10944),
     (12256, 5472),
     (6128, 2736),
     (3064, 1368),
     (1532, 684),
     (766, 342),
     (383, 171))
    osh.level_downsamples
    
    (1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0)
    osh.properties['openslide.mpp-x']
    
    '0.25'
    
    osh.properties['openslide.mpp-y']
    
    '0.25'
  2. Insert metadata assuming base mag = 40 and MPP = 0.25

    tifftools set -y -s ImageDescription  "Aperio Fake |AppMag = 40|MPP = 0.25" Downloads/17__20190808_143703.tiff
  3. Check the properties again

    osh = openslide.OpenSlide('Downloads/17__20190808_143703.tiff')  # reopen the edited file

    osh.level_dimensions
    
    ((196096, 87552),
     (98304, 44032),
     (49152, 22016),
     (24576, 11264),
     (12288, 5632),
     (6144, 3072),
     (3072, 1536),
     (1536, 1024),
     (1024, 512),
     (512, 512))
    osh.level_downsamples
    
    (1.0,
     1.9915818798449614,
     3.9831637596899228,
     7.87594696969697,
     15.75189393939394,
     30.208333333333336,
     60.41666666666667,
     106.58333333333334,
     181.25,
     277.0)

I'm not sure why this is happening. @choosehappy thoughts?

CielAl commented 1 month ago

> Hello, I tried to define base_mag: 40x as you suggested; however, this did not speed up the process.
>
> Kindly check if I modified the config file correctly (screenshot of the config attached).

Hi,

What I meant was to use the base_mag setting together with your original WSI (the one missing the magnification in the header) --- do you still observe the degradation?

If so, then we can assume the root cause is unrelated to the tifftools settings and instead lies in the level_dimensions themselves and in how BaseImage obtains the downsamples.

@jacksonjacobs1 could you try read_region at level=5 (size 6144, 3072) and see if it causes any trouble reading? What is the tile size of the slide, and is there bbox info in the slide (in which case BaseImage will use the resizeTileDownward function, reading individual windows, resizing them, and stitching them together to form the thumbnail)?
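
For instance, a sketch of that check (reusing the file from step 2 above):

    import openslide

    osh = openslide.OpenSlide('Downloads/17__20190808_143703.tiff')
    # read the full level-5 plane as reported by the altered metadata
    region = osh.read_region((0, 0), 5, osh.level_dimensions[5])
    print(region.size)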

It seems to me that the tifftools output is unexpected behavior. What vendor information is in the original file? And what is the parsed output if you use TiffSlide's level_dimensions ([here](https://github.com/Bayer-Group/tiffslide/blob/c5673d0085310041e7a01d79a9ba8e7d84c01ede/tiffslide/tiffslide.py#L97)), for both the original and the tifftools-processed file?
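
Something along these lines (a sketch; TiffSlide mirrors the OpenSlide API):

    import tiffslide

    ts = tiffslide.TiffSlide('Downloads/17__20190808_143703.tiff')
    print(ts.level_dimensions)    # compare against OpenSlide's values
    print(ts.level_downsamples)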

CielAl commented 1 month ago

@jacksonjacobs1 @choosehappy

OK, it looks like the degradation is caused by reading a much larger level than intended, due to the altered level_dimensions.

For the original file, obtaining a 1.25x thumbnail at a 40x base only requires reading level=5 directly, and this should be fast.

But in the tifftools-processed slide, BaseImage will fall back to osh.get_best_level_for_downsample, as no close-enough relative downsample factor can be found: relative_down_factors_idx=[np.isclose(i/downsample_factor,1,atol=.01) for i in osh.level_downsamples] will contain no True entries, since all level_downsamples deviate from the original values. isExactLevel will therefore be False, and the much slower resizeTileDownward will be used (here)
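
A condensed sketch of that selection logic (paraphrased; assumes osh is an open slide handle and downsample_factor = base_mag / target_mag, e.g. 40 / 1.25 = 32):

    import numpy as np

    relative_down_factors_idx = [np.isclose(i / downsample_factor, 1, atol=.01)
                                 for i in osh.level_downsamples]
    level = np.where(relative_down_factors_idx)[0]
    if level.size:
        # a pyramid level matches the target downsample: read it directly
        level, isExactLevel = level[0], True
    else:
        # no close match: fall back to the slow tile-by-tile resize path
        level = osh.get_best_level_for_downsample(downsample_factor)
        isExactLevel = False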

@SaharAlmahfouzNasser as I mentioned, for a very quick workaround please try using the original slides and overriding base_mag in config.ini, if you haven't already.

SaharAlmahfouzNasser commented 1 month ago

Using the original image gives the error shown in the screenshot below (screenshot of the error attached).

jacksonjacobs1 commented 1 month ago

@CielAl the code you linked to is not called in Sahar's use case because enable_bounding_box is set to False by default, but the concept is the same.

The following line is used to get the closest image thumbnail by default: https://github.com/choosehappy/HistoQC/blob/37da20a8b5476289c15f33f2aec775d0aacdc249/histoqc/BaseImage.py#L215

> could you try the read_region at level=5 (size 6144, 3072) and see if it causes any trouble reading

Yes, I confirmed that osh.get_thumbnail produces the undesirable behavior when level_dimensions have been altered.
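
A sketch of that reproduction (using the tifftools-processed file):

    import openslide

    osh = openslide.OpenSlide('Downloads/17__20190808_143703.tiff')
    # with the altered level_dimensions, the internally chosen level is
    # much larger than expected, so this call becomes very slow
    thumb = osh.get_thumbnail((6128, 2736))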

In summary, this issue does not warrant any code changes in HistoQC. Instead, I would recommend adding a disclaimer to the blog post asking readers to verify that the added metadata does not unintentionally change the level_downsamples computed by OpenSlide.
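
A sketch of such a sanity check (file names are placeholders):

    import openslide

    before = openslide.OpenSlide('original.tiff').level_downsamples
    after = openslide.OpenSlide('modified.tiff').level_downsamples
    # the metadata edit should not perturb the pyramid geometry
    assert all(abs(a - b) / a < 0.01 for a, b in zip(before, after)), \
        "metadata edit changed the computed level_downsamples"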

SaharAlmahfouzNasser commented 1 month ago

@CielAl with base_mag = 40x defined and using the original image, processing takes around 25 minutes, and the output looks better than the one I got from the modified image. So, as @jacksonjacobs1 suggested, it might be a problem related to tifftools.

CielAl commented 1 month ago

> get_thumbnail

@jacksonjacobs1 Thanks for pointing that out. Then yes, based on your tests it seems that tifftools somehow corrupts the image.

@SaharAlmahfouzNasser In this case, since it's tifftools that introduces the unexpected/undocumented behavior, I suggest you simply use the original image and override base_mag in config.ini if you need viable results ASAP.

I think we can close the issue, given Sahar's tests on the base_mag override in the config and Jackson's observation.

jacksonjacobs1 commented 1 month ago

FYI, I ran Sahar's use case in an identical environment (but on my local laptop).

Using Yufei's suggestion of manually setting base_mag, the compute time was < 2 minutes.