Booritas / slideio

BSD 3-Clause "New" or "Revised" License

Question: Best way to extract tiles from an image? #22

mildewey closed 8 months ago

mildewey commented 9 months ago

I'm trying to use slideio to tile an NDPI image and save those tiles for a variety of uses. I'm getting much worse timings with slideio than with openslide (I was hoping to switch away from openslide).

I'm currently doing a tight loop of read_block((isize, jsize, size, size), (size, size)), getting the numpy array, putting it in a pillow image object and saving that.
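For reference, the loop described above amounts to walking a grid of `(x, y, width, height)` rectangles. A minimal sketch of that grid in plain Python follows. `tile_rects` is a hypothetical helper, not part of slideio, and it assumes that tiles on the right and bottom edges should be clipped to the scene bounds.

```python
from typing import Iterator, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height), the rect layout passed to read_block


def tile_rects(scene_width: int, scene_height: int, size: int) -> Iterator[Rect]:
    """Yield tile rectangles covering the scene, row by row.

    Hypothetical helper (not a slideio function). Tiles on the right and
    bottom edges are clipped so they never extend past the scene.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for y in range(0, scene_height, size):
        for x in range(0, scene_width, size):
            yield (x, y, min(size, scene_width - x), min(size, scene_height - y))


if __name__ == "__main__":
    # A 5000 x 3000 scene with 2048-pixel tiles gives 3 columns and 2 rows.
    for rect in tile_rects(5000, 3000, 2048):
        print(rect)
```

Each rectangle would then be passed to `scene.read_block(rect, (rect[2], rect[3]))` for a 1:1 read, with the scene width and height taken from `scene.size`.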

Weirdly, to me, the worst timings are coming when tiling at the 1:1 scale. I get really good performance when asking for 64:1 scale. The difference between a single, same-sized tile at 1:1 and 64:1 is an order of magnitude. The higher the scale, the better the performance.

I've tried various tile sizes. A single smaller tile reads a little faster than a single larger one, but larger tiles are faster per pixel: tiling my current test NDPI took 30 minutes with 2048-pixel tiles and 68 minutes with 1024-pixel tiles.
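The two timings can be compared per tile and per pixel. The short calculation below uses only the figures quoted above (30 minutes at 2048 pixels, 68 minutes at 1024 pixels). It assumes both runs covered the same image area, so halving the tile edge quadruples the tile count. `relative_cost` is an illustrative name, not an existing function.

```python
def relative_cost(minutes_large: float, minutes_small: float, edge_ratio: int = 2):
    """Compare two tiling runs over the same area.

    edge_ratio is large tile edge / small tile edge (2048 / 1024 = 2),
    so the small-tile run reads edge_ratio**2 times as many tiles.
    Returns (per-tile time ratio, per-pixel time ratio), both small / large.
    """
    tile_count_ratio = edge_ratio ** 2
    per_tile = (minutes_small / tile_count_ratio) / minutes_large
    per_pixel = minutes_small / minutes_large  # same total pixels in both runs
    return per_tile, per_pixel


per_tile, per_pixel = relative_cost(30, 68)
print(f"A 1024 px tile takes {per_tile:.2f}x the time of a 2048 px tile")
print(f"1024 px tiling costs {per_pixel:.2f}x as much per pixel")
```

Under that assumption, a 1024-pixel tile takes roughly 0.57 times as long as a 2048-pixel tile while delivering only a quarter of the pixels, so the per-pixel cost is about 2.27 times higher.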

Any help or recommendations would be appreciated.

Booritas commented 9 months ago

The reading performance of tiles depends on the structure of the zoom pyramid. It is possible that, to read a small patch from a 1:1 scale, the software has to decode a large tile and extract a pixel rectangle corresponding to the patch size. I would anticipate optimal performance if you read patches with the size of the underlying TIFF tiles.
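To illustrate the alignment point, here is a minimal sketch that counts how many stored tiles a requested patch overlaps. It assumes a regular grid of fixed-size tiles starting at the origin. The tile dimensions are hypothetical values, not read from a real slide, and the actual NDPI layout may differ.

```python
def tiles_touched(x: int, y: int, width: int, height: int, tile_w: int, tile_h: int) -> int:
    """Count stored tiles that overlap the patch (x, y, width, height).

    tile_w and tile_h are the dimensions of the tiles as stored in the file.
    They are assumed values here, not queried from slideio.
    """
    first_col, last_col = x // tile_w, (x + width - 1) // tile_w
    first_row, last_row = y // tile_h, (y + height - 1) // tile_h
    return (last_col - first_col + 1) * (last_row - first_row + 1)


# Aligned 512-pixel patch on a 512-pixel tile grid: one tile to decode.
print(tiles_touched(512, 512, 512, 512, 512, 512))
# Same patch shifted by 100 pixels: four tiles to decode for the same output.
print(tiles_touched(612, 612, 512, 512, 512, 512))
# Small patch inside a much larger stored tile: the whole tile is still involved.
print(tiles_touched(100, 100, 256, 256, 4096, 4096))
```

If the stored tiles are much larger than the requested patch, each read pays for decoding far more pixels than it returns, which would be consistent with slow 1:1 reads.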

I will investigate performance issues and explore ways to enhance them. It would be beneficial if I could conduct these investigations on the files you are using. If possible, could you share the file with me?

An alternative approach could involve converting NDPI slides to SVS files. Slideio has the capability to perform such conversions, although it may not be the optimal solution.

I will keep you updated on the results of my investigations. Please let me know if sharing the file is possible.

Best regards, Stanislav

mildewey commented 9 months ago

import sys
import math
import time
from contextlib import contextmanager

from slideio import open_slide

@contextmanager
def perf_timer(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        print(
            f"{name} took {round((time.perf_counter() - start) * 1000, 4)}ms",
        )

filename = sys.argv[1]
tile_size = int(sys.argv[2])
tile_dimensions = (tile_size, tile_size)
scaling_array = [1, 2, 4, 8, 16, 32, 64]

slide = open_slide(filename)
scene = slide.get_scene(0)

for scaling in scaling_array:
    print(f"{scaling=}")
    pixel_width = scene.size[0] / scaling
    pixel_height = scene.size[1] / scaling

    # limit so you don't have to wait all day to see things at higher scales
    tile_width = min(math.ceil(pixel_width / tile_size), 5)
    tile_height = min(math.ceil(pixel_height / tile_size), 5)
    for i in range(tile_width):
        for j in range(tile_height):
            rect = (
                i * tile_size * scaling,
                j * tile_size * scaling,
                tile_size * scaling,
                tile_size * scaling,
            )
            with perf_timer(f"read_block({rect}, {tile_dimensions})"):
                block = scene.read_block(
                    rect,
                    tile_dimensions,
                )
                # in my code I would then put the block into a pillow image, but that only takes about a millisecond

I wrote the above script to reproduce the issue. I was able to reproduce it with the CMU-1.ndpi file publicly available here: https://openslide.cs.cmu.edu/download/openslide-testdata/Hamamatsu/

Booritas commented 8 months ago

@mildewey Thank you for your feedback! I just uploaded a new version (2.3.0) with improved performance for NDPI images. Sorry that it took so long. I had to refactor the driver and redo the JPEG decoding procedure. Please let me know if it works for you. If you like slideio, please consider giving a star to the repository.

mildewey commented 8 months ago

YES! We updated and it's performing much better! The tile read times are very consistent now and the base tile-reading time has improved dramatically as well!