choosehappy / HistoQC

HistoQC is an open-source quality control tool for digital pathology slides
BSD 3-Clause Clear License

Numerical precision causes slow file load #220

Closed · guillermojp closed this issue 1 year ago

guillermojp commented 2 years ago

Hi,

I've just noticed that the BaseImage class runs into a problem caused by openslide's inconsistent reporting of level downsamples when determining the optimal level for downsampling (see this issue from openslide). This is especially relevant when using functions that require 2.5x magnification. In my case, it becomes clear when checking the "level_downsamples" of the slide:

[screenshot: the slide's level_downsamples values]

All in all, this doesn't look too bad: just numerical-precision noise in the downsamples that openslide reports. It has, however, a huge impact on data loading and processing in BaseImage (line 136). Imagine I want to obtain a "2.5x" magnification; I would compute the downsampling factor, right?

[screenshot: computing the downsampling factor for 2.5x, which evaluates to 16.0]
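For reference, here's a minimal sketch of how that downsampling factor can be derived (assuming a base objective power of 40x, which is consistent with the factor of 16 below; the slide path is just a placeholder):

import openslide

osh = openslide.OpenSlide("slide.svs")  # placeholder path
base_mag = float(osh.properties["openslide.objective-power"])  # e.g. 40.0
target_mag = 2.5
down_factor = base_mag / target_mag  # 40.0 / 2.5 = 16.0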

Ah, it seems like 16.0 is a downsample factor that is part of the slide itself (first screenshot), corresponding to level 2 (stored as 16.00087281228769). This contrasts with the result of openslide's "get_best_level_for_downsample":

[screenshot: result of osh.get_best_level_for_downsample(16.0)]

Here it can be seen more clearly:

[screenshot: level_downsamples compared against the requested downsample factor]
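As a standalone illustration (only the 16.00087281228769 value comes from the slide above; the other downsamples are made up), an exact lookup misses level 2, while a small tolerance recovers it:

import numpy as np

level_downsamples = (1.0, 4.0002, 16.00087281228769, 64.004)  # illustrative values
down_factor = 16.0

print(down_factor in level_downsamples)  # False: exact match misses level 2
print([np.isclose(d, down_factor, atol=1e-2) for d in level_downsamples])
# [False, False, True, False]: a ~0.01 tolerance snaps to level 2

Because the stored downsample is slightly larger than the requested 16.0, get_best_level_for_downsample appears to fall back to a finer level here, so a much larger region has to be read and resized, which is where the slow load comes from.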

This seems silly, but it causes the "relative_down" variable to never be exactly 1 at line 138:

[screenshot: relative_down computed in BaseImage, not exactly 1]

And even if the "level" were selected correctly, "relative_down" still would not be exactly 1:

[screenshot: relative_down for the correct level, still not exactly 1]
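To put numbers on it (using only the level-2 value visible above): even with the right level, the ratio is close to, but never exactly, 1, so a strict equality check can never take the fast path:

down_factor = 16.0
level_downsample = 16.00087281228769  # value reported for level 2

relative_down = down_factor / level_downsample
print(relative_down)       # ~0.9999455, not 1.0
print(relative_down == 1)  # False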

I've made a fix in the code I've been working with, so that it reads:

import numpy as np

# osh is the openslide handle and down_factor the requested downsample factor
eps = np.finfo("float16").eps  # machine epsilon of float16, eps = 0.000977
possible_levels = np.abs(np.array(osh.level_downsamples) - down_factor) < eps

if np.any(possible_levels):
    level = np.where(possible_levels)[0][0]
else:
    level = osh.get_best_level_for_downsample(down_factor)

relative_down = down_factor / osh.level_downsamples[level]

if np.allclose(relative_down, 1, atol=eps):  # there exists an openslide level exactly for this requested mag
    CONTINUE_CODE

The difference in loading times is dramatic: from 1.4 minutes when using self.getImgThumb("2.5x") to about 4-6 seconds.

Hope it helps,
G

choosehappy commented 2 years ago

You are absolutely right, thanks for pointing this out

How about something like this, which is even more generous in terms of tolerance for "clicking" to the closest available magnification/level?


down_factor = 3.99  # for testing; also try 4.01
relative_down_factors_idx = [np.isclose(x / down_factor, 1, atol=.01) for x in osh.level_downsamples]
level = np.where(relative_down_factors_idx)[0]

if level.size:
    level = level[0]
    output = osh.read_region((0, 0), level, osh.level_dimensions[level])
    output = np.asarray(output)[:, :, 0:3]
else:
    level = osh.get_best_level_for_downsample(down_factor)

...
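For what it's worth, a tiny standalone check of the snapping behaviour (pick_level is just an illustrative wrapper, and the level_downsamples values are made up): 3.99, 4.0 and 4.01 all snap to the same level, while 4.2 falls through to get_best_level_for_downsample:

import numpy as np

def pick_level(down_factor, level_downsamples, atol=.01):
    # snap to an existing level if its downsample is within ~1% of the request
    idx = [np.isclose(x / down_factor, 1, atol=atol) for x in level_downsamples]
    level = np.where(idx)[0]
    return level[0] if level.size else None  # None -> fall back to get_best_level_for_downsample

level_downsamples = (1.0, 4.0002, 16.00087281228769)  # illustrative values
for f in (3.99, 4.0, 4.01, 4.2, 16.0):
    print(f, pick_level(f, level_downsamples))
# 3.99 -> 1, 4.0 -> 1, 4.01 -> 1, 4.2 -> None, 16.0 -> 2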
guillermojp commented 2 years ago

That would be even more elegant than my solution, I'd agree!

choosehappy commented 1 year ago

Great, thanks for the feedback! Would you be willing to put together a pull request? I'm mostly worried about having appropriate data to test it out at scale at the moment.
