Bayer-Group / tiffslide

TiffSlide - cloud native openslide-python replacement based on tifffile

Tiffslide much slower than openslide reading patches from SVS with JPEG2000 compression #72

Closed kaczmarj closed 1 year ago

kaczmarj commented 1 year ago

hello, thanks for developing this fantastic package! i am working on porting one of my projects from openslide to tiffslide (very easy thanks to mirrored API :smile:). however i found that tiffslide is much slower than openslide when reading patches from an SVS file in The Cancer Genome Atlas (TCGA).

i created a jupyter notebook to benchmark this here https://gist.github.com/kaczmarj/41c351be6f52aa6a553cc12ba98a9103. this notebook runs a simple benchmarking function on a TCGA BRCA slide and a TIFF and SVS file from openslide test data.

using the slide TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs (from https://portal.gdc.cancer.gov/files/d46167af-6c29-49c7-95cf-3a801181aca4), i got the following results. tiffslide takes >10x longer to read patches than openslide.

i did not see the same behavior when evaluating CMU-1.tiff and CMU-1.svs from openslide test data, so i don't suspect disk caching to be the culprit.

Openslide -- get thumbnail
711 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tiffslide -- get thumbnail
2.27 s ± 38.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Openslide -- read region at level 0
1.89 ms ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Tiffslide -- read region at level 0
77.5 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Openslide -- read region at level 2
6.93 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Tiffslide -- read region at level 2
73.5 ms ± 1.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
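As a quick check of the ">10x" claim, the slowdown factors can be worked out from the mean timings listed above. This is only arithmetic on the reported numbers; the dictionary layout and the `slowdown` helper are illustrative names, not part of either library.

```python
# Mean timings copied from the benchmark output above, converted to milliseconds.
timings_ms = {
    "get thumbnail": {"openslide": 711.0, "tiffslide": 2270.0},
    "read region at level 0": {"openslide": 1.89, "tiffslide": 77.5},
    "read region at level 2": {"openslide": 6.93, "tiffslide": 73.5},
}


def slowdown(entry):
    """Return how many times longer tiffslide took than openslide."""
    return entry["tiffslide"] / entry["openslide"]


ratios = {name: slowdown(entry) for name, entry in timings_ms.items()}
for name, ratio in ratios.items():
    print(f"{name}: tiffslide / openslide = {ratio:.1f}x")
```

With these numbers the factors come out at roughly 3x for the thumbnail, 41x for level 0 and 11x for level 2.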

ap-- commented 1 year ago

Hi @kaczmarj

Happy to hear that tiffslide is useful to you!

Your benchmark is not testing a really useful scenario. When you run timeit with the same region, you hit openslide's and tiffslide's internal cache after the first call and in this scenario, you're effectively measuring (on the tiffslide side) how long PIL takes to convert a numpy array.

Benchmarking this stuff is not really simple, since you have to be aware of internal caches of your tools, and also of other non-obvious caches, like your operating system caching disk access, etc.
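To make the caching point concrete, here is a minimal sketch of a timing loop that reads a different region on every call, so a repeated-region cache hit cannot dominate the measurement. The helper names `random_locations` and `time_reads` are made up for this example, and the locations are generated for level 0 only. It does not control the operating system's disk cache.

```python
import random
import time


def random_locations(dimensions, size, n, seed=0):
    """Return n random top-left (x, y) corners that keep a level-0 patch inside the slide."""
    rng = random.Random(seed)
    width, height = dimensions
    patch_w, patch_h = size
    return [
        (rng.randrange(0, width - patch_w + 1), rng.randrange(0, height - patch_h + 1))
        for _ in range(n)
    ]


def time_reads(read_region, locations, level, size):
    """Call read_region once per location and return the per-read durations in seconds."""
    durations = []
    for location in locations:
        start = time.perf_counter()
        read_region(location, level, size)
        durations.append(time.perf_counter() - start)
    return durations


# Usage sketch (the file name is a placeholder):
# slide = tiffslide.TiffSlide("slide.svs")
# locations = random_locations(slide.dimensions, (512, 512), n=100)
# durations = time_reads(slide.read_region, locations, level=0, size=(512, 512))
```

Because `time_reads` only needs a callable with the `read_region(location, level, size)` signature, the same locations can be passed to both an OpenSlide and a TiffSlide object for a like-for-like comparison.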

As mentioned in the readme, I recommend running the benchmark below, which tries to test accessing multiple different tiles on files, to simulate a more realistic use case.

OPENSLIDE_TESTDATA_DIR=/path/to/testdata/ python docs/generate_benchmark_plots.py

you can easily modify the files used to run the benchmark by changing: https://github.com/bayer-science-for-a-better-life/tiffslide/blob/63c86e9d4f168072bb75784e720d0d0acdacee0f/tiffslide/tests/test_benchmark.py#L15-L21

I'd be interested to see your results on the tcga files!

Cheers, Andreas :smiley:

kaczmarj commented 1 year ago

let me add the TCGA file to the benchmark and test. thanks for the quick reply @ap-- !

kaczmarj commented 1 year ago

hi @ap-- I added an SVS file from TCGA to the pytests and generated the plots. i am seeing a 4x increase in runtime for tiffslide vs openslide. it's interesting that this does not happen for CMU-2.svs... do you have any thoughts on why this could be? i can test other SVS slides from TCGA as well if you think that would be useful.

my only hypothesis at this point is that this is related to the image size. the tcga svs is 1.6 gb whereas the CMU SVS is 542 mb.

in test_benchmark.py, i set the FILES dictionary to

FILES = {
    "svs": "Aperio/CMU-2.svs",
    "generic": "Generic-TIFF/CMU-1.tiff",
    "tcga-svs": "TCGA-SVS/TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs",
}

[Figures: benchmark_read_tiles_as_numpy, benchmark_read_tiles_as_pil]

kaczmarj commented 1 year ago

i tested two different tcga slides of different sizes but it seems that openslide is much faster than tifffile for both of these images. my hypothesis of image size being related to the speed does not seem to be correct.

by the way, i am on a debian 12 linux system with python 3.10.12 and glibc version 2.36.

$ uname -a
Linux dash 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux

[Figures: benchmark_read_tiles_as_numpy, benchmark_read_tiles_as_pil]

sdvillal commented 1 year ago

Is there a difference in the compression used by these files?

kaczmarj commented 1 year ago

yes there is a difference in compression. i used tiffinfo (from libtiff) to get this info. CMU-2.svs uses JPEG compression whereas the TCGA svs file uses compression scheme 33005 (which apparently is a specific type of JPEG 2000). OpenSlide has some notes about this compression scheme (from https://openslide.org/formats/aperio/):

JPEG 2000 (compression types 33003 or 33005)

Some Aperio files use compression type 33003 or 33005. Images using this compression need to be decoded as a JPEG 2000 codestream. For 33003: YCbCr format, possibly with a chroma subsampling of 4:2:2. For 33005: RGB format. Note that the TIFF file may not encode the colorspace or subsampling parameters in the PhotometricInterpretation field, nor the YCbCrSubsampling field, even though the TIFF standard seems to require this. The correct subsampling can be found in the JPEG 2000 codestream.
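For readers without tiffinfo at hand, the Compression tag (TIFF tag 259) of each directory can also be read with a few lines of standard-library Python. This is a minimal sketch, assuming a classic TIFF layout (magic number 42); BigTIFF files are not handled. The names `compression_per_directory` and `COMPRESSION_NAMES` are invented for this example.

```python
import struct

# Names for a few TIFF Compression (tag 259) values. 33003 and 33005 are the
# Aperio JPEG 2000 codes described in the OpenSlide note quoted above.
COMPRESSION_NAMES = {
    1: "none",
    5: "LZW",
    7: "JPEG",
    33003: "Aperio JPEG 2000 (YCbCr)",
    33005: "Aperio JPEG 2000 (RGB)",
}

COMPRESSION_TAG = 259


def compression_per_directory(fileobj):
    """Yield the Compression tag value of every IFD in a classic (non-Big) TIFF."""
    header = fileobj.read(8)
    byteorder = {b"II": "<", b"MM": ">"}.get(header[:2])
    if byteorder is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(byteorder + "HI", header[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic 42) is handled in this sketch")
    while offset:
        fileobj.seek(offset)
        (n_entries,) = struct.unpack(byteorder + "H", fileobj.read(2))
        compression = 1  # TIFF default when the tag is absent
        for _ in range(n_entries):
            entry = fileobj.read(12)
            tag, _type, _count = struct.unpack(byteorder + "HHI", entry[:8])
            if tag == COMPRESSION_TAG:
                # A single SHORT value is stored left-justified in the value field.
                (compression,) = struct.unpack(byteorder + "H", entry[8:10])
        (offset,) = struct.unpack(byteorder + "I", fileobj.read(4))
        yield compression


# Usage sketch (the file name is a placeholder):
# with open("slide.svs", "rb") as f:
#     for index, code in enumerate(compression_per_directory(f)):
#         print(index, code, COMPRESSION_NAMES.get(code, "unknown"))
```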

here are the tiff details for CMU-2.svs and TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs. please click on the arrows to expand the output.

tiffinfo TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs

```
=== TIFF directory 0 ===
TIFF Directory at offset 0xaa0a9ec (178301420)
  Subfile Type: (0 = 0x0)
  Image Width: 48384 Image Length: 26880 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: 33005 (0x80ed)
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37 48384x26880 (256x256) J2K/KDU Q=70;Mirax Digital Slide|AppMag = 20|MPP = 0.23250
=== TIFF directory 1 ===
TIFF Directory at offset 0xaa5e634 (178644532)
  Subfile Type: (0 = 0x0)
  Image Width: 1024 Image Length: 568 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37 48384x26880 -> 1024x568 - ;Mirax Digital Slide|AppMag = 20|MPP = 0.23250
  JPEG Tables: (289 bytes)
=== TIFF directory 2 ===
TIFF Directory at offset 0xb596216 (190407190)
  Subfile Type: (0 = 0x0)
  Image Width: 12096 Image Length: 6720 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: 33005 (0x80ed)
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37 48384x26880 (256x256) -> 12096x6720 J2K/KDU Q=70
=== TIFF directory 3 ===
TIFF Directory at offset 0xb669062 (191271010)
  Subfile Type: (0 = 0x0)
  Image Width: 3024 Image Length: 1680 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: 33005 (0x80ed)
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v11.0.37 48384x26880 (256x256) -> 3024x1680 J2K/KDU Q=70
```
tiffinfo CMU-2.svs

```
=== TIFF directory 0 ===
TIFF Directory at offset 0x14548b52 (341085010)
  Subfile Type: (0 = 0x0)
  Image Width: 78000 Image Length: 30462 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 79560x30562 [0,100 78000x30462] (256x256) JPEG/RGB Q=30|AppMag = 20|StripeWidth = 2040|ScanScope ID = CPAPERIOCS|Filename = CMU-2|Date = 12/29/09|Time = 10:02:42|User = b414003d-95c6-48b0-9369-8010ed517ba7|Parmset = USM Filter|MPP = 0.4990|Left = 27.409658|Top = 20.522137|LineCameraSkew = -0.000424|LineAreaXOffset = 0.019265|LineAreaYOffset = -0.000313|Focus Offset = 0.000000|ImageID = 1004487|OriginalWidth = 79560|Originalheight = 30562|Filtered = 5|ICC Profile = ScanScope v1
  ICC Profile: , 141992 bytes
  JPEG Tables: (289 bytes)
=== TIFF directory 1 ===
TIFF Directory at offset 0x145cfce2 (341638370)
  Subfile Type: (0 = 0x0)
  Image Width: 1024 Image Length: 399 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 78000x30462 -> 1024x399 - |AppMag = 20|StripeWidth = 2040|ScanScope ID = CPAPERIOCS|Filename = CMU-2|Date = 12/29/09|Time = 10:02:42|User = b414003d-95c6-48b0-9369-8010ed517ba7|Parmset = USM Filter|MPP = 0.4990|Left = 27.409658|Top = 20.522137|LineCameraSkew = -0.000424|LineAreaXOffset = 0.019265|LineAreaYOffset = -0.000313|Focus Offset = 0.000000|ImageID = 1004487|OriginalWidth = 79560|Originalheight = 30562|Filtered = 5|ICC Profile = ScanScope v1
  JPEG Tables: (289 bytes)
=== TIFF directory 2 ===
TIFF Directory at offset 0x16f1c454 (384943188)
  Subfile Type: (0 = 0x0)
  Image Width: 19500 Image Length: 7615 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 79560x30562 [0,100 78000x30462] (256x256) -> 19500x7615 JPEG/RGB Q=65
  JPEG Tables: (289 bytes)
=== TIFF directory 3 ===
TIFF Directory at offset 0x172dfb2e (388889390)
  Subfile Type: (0 = 0x0)
  Image Width: 4875 Image Length: 1903 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 79560x30562 [0,100 78000x30462] (256x256) -> 4875x1903 JPEG/RGB Q=82
  JPEG Tables: (289 bytes)
=== TIFF directory 4 ===
TIFF Directory at offset 0x17431686 (390272646)
  Subfile Type: (0 = 0x0)
  Image Width: 2437 Image Length: 951 Image Depth: 1
  Tile Width: 256 Tile Length: 256
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 79560x30562 [0,100 78000x30462] (256x256) -> 2437x951 JPEG/RGB Q=91
  JPEG Tables: (289 bytes)
=== TIFF directory 5 ===
TIFF Directory at offset 0x1748db6c (390650732)
  Subfile Type: reduced-resolution image (1 = 0x1)
  Image Width: 387 Image Length: 463 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: LZW
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 7
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 label 387x463
  Predictor: horizontal differencing 2 (0x2)
=== TIFF directory 6 ===
TIFF Directory at offset 0x174a5ec4 (390749892)
  Subfile Type: reduced-resolution image (9 = 0x9)
  Image Width: 1280 Image Length: 431 Image Depth: 1
  Bits/Sample: 8
  Compression Scheme: JPEG
  Photometric Interpretation: RGB color
  YCbCr Subsampling: 2, 2
  Samples/Pixel: 3
  Rows/Strip: 16
  Planar Configuration: single image plane
  ImageDescription: Aperio Image Library v10.0.51 macro 1280x431
  JPEG Tables: (289 bytes)
```

in the TCGA SVS, TIFF directory 1 uses JPEG compression. perhaps by forcing a read from directory 1 we can test whether difference in compression is the culprit. if we read from directory 1 and tiffslide is still slower than openslide, there could be something in addition to compression differences. but if the speed matches/exceeds openslide, then the compression is the cause.

but directory 1 of the TCGA SVS only has size 1024x568 WxH. perhaps that's the thumbnail. it does not come up as an image level in openslide or tiffslide.

ap-- commented 1 year ago

Hmm, my tests indicate both images seem to store uncompressed tiles...

# pip install pado
# pip install aiohttp requests s3fs

import json
from pprint import pprint

from pado.images.ids import ImageId
from pado.images.providers import ImageProvider
from pado.io.files import urlpathlike_to_fsspec
from tiffslide import TiffSlide

import matplotlib.pyplot as plt

ip = ImageProvider.from_parquet(
    "zip:///tcga.image.parquet::https://github.com/ap--/pado-tcga/releases/download/v0.0.1/pado-tcga-dataset.zip"
)

image_ids = [
    ImageId(
        '2aa283f3-732c-4879-8d37-1fec3ccf5bdc',
        'TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs',
        site='tcga',
    ),
    ImageId(
        'd46167af-6c29-49c7-95cf-3a801181aca4',
        'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs',
        site='tcga',
    ),
]

for iid in image_ids:

    img = ip[iid]
    of = urlpathlike_to_fsspec(img.urlpath)

    # check via tiffslide
    ts = TiffSlide(of)
    print(iid)
    pprint(json.loads(ts.zarr_group.store["0/.zarray"]))

    fig = plt.figure()
    w, h = ts.dimensions
    plt.imshow(ts.read_region((w//2, h//2), 0, (1000, 1000), as_array=True))

plt.show()

output:

ImageId('2aa283f3-732c-4879-8d37-1fec3ccf5bdc', 'TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs', site='tcga')
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [26880, 48384, 3],
 'zarr_format': 2}
ImageId('d46167af-6c29-49c7-95cf-3a801181aca4', 'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs', site='tcga')
{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [74432, 101184, 3],
 'zarr_format': 2}

[Figures: f1, f2]

If that turns out to be true, it would mean that there's just too much python overhead in reading uncompressed tiles from disk via zarr. We'd need some profiling to be sure about that and a potential solution would be to try if we can just shortcut for local uncompressed files. I have a test implementation of a memory mapped zarr store for local files lying around somewhere. I'll try to find it. Will report back in the coming days.

Cheers, Andreas :smiley:

kaczmarj commented 1 year ago

thanks @ap-- that's very helpful. i also see that tiffslide reports no compression:

code:

import json, tiffslide
tslide = tiffslide.TiffSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
json.loads(tslide.zarr_group.store["0/.zarray"])

output:

{'chunks': [256, 256, 3],
 'compressor': None,
 'dtype': '|u1',
 'fill_value': 0,
 'filters': None,
 'order': 'C',
 'shape': [26880, 48384, 3],
 'zarr_format': 2}

but exiftool also shows that JPEG2000 compression is used.

$ git clone https://github.com/exiftool/exiftool.git
$ cd exiftool
$ ./exiftool ../TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs
ExifTool Version Number         : 12.64
File Name                       : TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs
Directory                       : ..
File Size                       : 191 MB
File Modification Date/Time     : 2023:07:08 09:34:00-04:00
File Access Date/Time           : 2023:07:08 09:37:25-04:00
File Inode Change Date/Time     : 2023:07:08 09:37:08-04:00
File Permissions                : -rw-r--r--
File Type                       : TIFF
File Type Extension             : tif
MIME Type                       : image/tiff
Exif Byte Order                 : Little-endian (Intel, II)
Image Width                     : 48384
Image Height                    : 26880
Bits Per Sample                 : 8 8 8
Compression                     : Aperio JPEG 2000 RGB
Photometric Interpretation      : RGB
Image Description               : Aperio Image Library v11.0.37..48384x26880 (256x256) J2K/KDU Q=70;Mirax Digital Slide|AppMag = 20|MPP = 0.23250
Samples Per Pixel               : 3
Planar Configuration            : Chunky
Strip Offsets                   : (Binary data 359 bytes, use -b option to extract)
Rows Per Strip                  : 16
Strip Byte Counts               : (Binary data 173 bytes, use -b option to extract)
JPEG Tables                     : (Binary data 289 bytes, use -b option to extract)
Y Cb Cr Sub Sampling            : YCbCr4:2:0 (2 2)
Subfile Type                    : Full-resolution image
Tile Width                      : 256
Tile Length                     : 256
Tile Offsets                    : (Binary data 839 bytes, use -b option to extract)
Tile Byte Counts                : (Binary data 464 bytes, use -b option to extract)
Image Depth                     : 1
Page Count                      : 4
Image Size                      : 48384x26880
Megapixels                      : 1300.6

cgohlke commented 1 year ago

i also see that tiffslide reports no compression

That's because tifffile.ZarrTiffStore is just a thin wrapper around a tifffile.TiffFile instance. The store transparently handles all the file access, decompression, predictors, unpacking, padding, etc. Zarr/numcodecs would not be able to handle all the cases found in TIFF.

it seems that openslide is much faster than tifffile

On my aging Windows system, the difference is much less:

[Figure: benchmark_read_tiles_as_numpy]

I suspect the difference could be due to differences in JPEG2000 decoders. For example, imagecodecs does not enable OpenJPEG multi-threading by default. I'll check if that's significant...

I am surprised that tiffslide/tifffile/zarr perform competitively. There are many, many layers of pure Python code...

cgohlke commented 1 year ago

imagecodecs does not enable OpenJPEG multi-threading by default. I'll check if that's significant...

It turns out that enabling multi-threading makes things significantly worse :(

sdvillal commented 1 year ago

Maybe some basic profiling could help discerning if the time spent on other things is dominant or if this is really a case of differences between how imagecodecs and openslide wrap around OpenJPEG to decode JP2K.

kaczmarj commented 1 year ago

I am surprised that tiffslide/tifffile/zarr perform competitively. There are many, many layers of pure Python code...

i am also surprised and impressed that this implementation performs competitively!

i realize my words might have unintentionally come across as negative or offensive towards tiffslide/tifffile and i want to be clear that i do not imply any negativity here. i hold tremendous respect for tiffslide and tifffile (and all of your work @cgohlke !).

It turns out that enabling multi-threading makes things significantly worse :(

that is unfortunate 😢

Maybe some basic profiling could help

i ran python's cProfile on the read_region method in tiffslide and openslide. this doesn't capture the C bits in openslide unfortunately (and i don't know how to do that). when profiling TiffSlide.read_region, most of the time was spent in the function imagecodecs._jpeg2k.jpeg2k_decode. the results are below. i truncated the profiling results of tiffslide profiling to ~25 function calls. i also replaced the path to my python installation to 'path/to' to make the lines shorter.

Tiffslide

code:

import cProfile, pstats, tiffslide
tslide = tiffslide.TiffSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
with cProfile.Profile() as pr:
    tslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

output:

         6732 function calls (6304 primitive calls) in 0.044 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18    0.039    0.002    0.039    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
   152/12    0.001    0.000    0.001    0.000 {built-in method _abc._abc_subclasscheck}
        4    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:11745(__init__)
        2    0.000    0.000    0.000    0.000 {built-in method _imp.create_dynamic}
      2/1    0.000    0.000    0.001    0.001 {built-in method _imp.exec_dynamic}
       18    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:12944(_indices)
        9    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/zarr/core.py:1862(_process_chunk)
        1    0.000    0.000    0.000    0.000 {built-in method PIL._imaging.fill}
        3    0.000    0.000    0.001    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:7770(__init__)
       43    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:10631(fromfile)
       19    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:12897(_parse_key)
       18    0.000    0.000    0.040    0.002 path/to/python3.11/site-packages/tifffile/tifffile.py:12836(_getitem)
       45    0.000    0.000    0.000    0.000 {method 'read' of '_io.BufferedReader' objects}
  301/230    0.000    0.000    0.000    0.000 path/to/python3.11/json/encoder.py:334(_iterencode_dict)
     24/6    0.000    0.000    0.004    0.001 path/to/python3.11/functools.py:981(__get__)
        1    0.000    0.000    0.000    0.000 {method 'decode' of 'ImagingDecoder' objects}
        1    0.000    0.000    0.040    0.040 path/to/python3.11/site-packages/zarr/core.py:1257(_get_selection)
      748    0.000    0.000    0.001    0.000 {built-in method builtins.isinstance}
      149    0.000    0.000    0.000    0.000 {built-in method _struct.unpack}
        8    0.000    0.000    0.000    0.000 path/to/python3.11/enum.py:241(__set_name__)
       43    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/tifffile/tifffile.py:10793(_process_value)
       18    0.000    0.000    0.039    0.002 path/to/python3.11/site-packages/tifffile/tifffile.py:8574(decode_image)
        1    0.000    0.000    0.001    0.001 path/to/python3.11/site-packages/tifffile/tifffile.py:12332(__init__)
  135/127    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.001    0.001 path/to/python3.11/site-packages/tifffile/tifffile.py:7288(_load)
      230    0.000    0.000    0.000    0.000 path/to/python3.11/json/encoder.py:414(_iterencode)
        5    0.000    0.000    0.000    0.000 path/to/python3.11/typing.py:1896(_get_protocol_attrs)

[truncated]

Openslide

code:

import cProfile, pstats, openslide
oslide = openslide.OpenSlide("TCGA-05-4395-01Z-00-DX1.20205276-ca16-46b2-914a-fe5e576a5cf9.svs")
with cProfile.Profile() as pr:
    oslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

output:

         30 function calls in 0.026 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.026    0.026    0.026    0.026 path/to/python3.11/site-packages/openslide/lowlevel.py:300(read_region)
        1    0.000    0.000    0.000    0.000 {built-in method openslide._convert.argb2rgba}
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:186(_load_image)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:222(_check_error)
        1    0.000    0.000    0.000    0.000 {built-in method PIL._imaging.fill}
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:505(_new)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:2955(frombuffer)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:2878(new)
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:2857(_check_size)
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:129(from_param)
        1    0.000    0.000    0.000    0.000 {built-in method PIL._imaging.map_buffer}
        3    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/PIL/Image.py:481(__init__)
        1    0.000    0.000    0.026    0.026 path/to/python3.11/site-packages/openslide/__init__.py:226(read_region)
        1    0.000    0.000    0.000    0.000 path/to/python3.11/cProfile.py:118(__exit__)
        2    0.000    0.000    0.000    0.000 path/to/python3.11/site-packages/openslide/lowlevel.py:214(_check_string)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'copy' of 'dict' objects}

cgohlke commented 1 year ago

i realize my words might have unintentionally come across as negative or offensive towards tiffslide/tifffile

Oh no. I did not understand it like that. I am interested in learning about such issues.

most of the time was spent in the function imagecodecs._jpeg2k.jpeg2k_decode

That's good to know. The tiles are relatively small (256x256) for JPEG 2000. Compared to an implementation in all C, such as openslide, for decoding a single tile there might be overheads from 1. calling the C function from Python, 2. creating a new instance of the OpenJPEG decoder in every call, 3. releasing the GIL, and 4. creating and copying image data into a numpy array. I'll try to enable Cython profiling https://cython.readthedocs.io/en/latest/src/tutorial/profiling_tutorial.html and see...

cgohlke commented 1 year ago

there might be overheads from 1. calling the C function from Python, 2. creating a new instance of the OpenJPEG decoder in every call, 3. releasing the GIL, and 4. creating and copying image data into a numpy array.

None of these seem significant in this case. Almost all the time is spent in OpenJPEG's opj_decode function. I rebuilt OpenJPEG with AVX2 extensions, but that made no difference on my system either :(

kaczmarj commented 1 year ago

imagecodecs._jpeg2k.jpeg2k_decode is run twice as many times as openslide's jp2k decoder and this could potentially explain the longer runtime.

when i profiled tiffslide.TiffSlide.read_region, i noticed that imagecodecs._jpeg2k.jpeg2k_decode was being called multiple times. this makes sense as it's decoding multiple tiles. i sought to measure the number of times openslide's jpeg2k decoder was run. to do this, i cloned openslide and added a print statement to line 59 of openslide-decode-jp2k.c. it seems that that function is run with every call to the openjpeg decoder.

git clone https://github.com/openslide/openslide
cd openslide
git checkout v3.4.1
# add print statement to line 59
sed -i '59i   printf("Running unpack_argb\\n");' src/openslide-decode-jp2k.c
# build openslide
autoreconf -i
./configure
make

after building openslide, i copied the resulting library libopenslide.so.0.4.1 into my conda environment containing tiffslide and openslide (replacing the original openslide downloaded from conda-forge).

import openslide
oslide = openslide.OpenSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")
oslide.read_region((0, 0), 0, (128, 128))
# prints:
# Running unpack_argb

oslide.read_region((14_000, 12_000), 0, (512, 512))
# prints:
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb
# Running unpack_argb

interestingly, if i profile TiffSlide.read_region to count the number of times imagecodecs._jpeg2k.jpeg2k_decode is called, then it is 2 in the first case and 18 in the second case. openslide called openjpeg decoder 1 time and 9 times for the same regions.

import cProfile, pstats, tiffslide

tslide = tiffslide.TiffSlide("TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs")

with cProfile.Profile() as pr:
    tslide.read_region(location=(0, 0), level=0, size=(128, 128))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         2    0.005    0.002    0.005    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
# [truncated]

with cProfile.Profile() as pr:
    tslide.read_region(location=(14_000, 12_000), level=0, size=(512, 512))
stats = pstats.Stats(pr).sort_stats("tottime")
stats.print_stats()

#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#        18    0.041    0.002    0.041    0.002 {imagecodecs._jpeg2k.jpeg2k_decode}
# [truncated]
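Instead of reading the call count off the printed table, it can be pulled out of the profile programmatically. The sketch below relies on the `stats` dictionary of `pstats.Stats`, whose keys are `(filename, lineno, function_name)` tuples and whose values start with the primitive and total call counts. `count_calls` is a helper name made up for this example, and `fake_decode` merely stands in for the real decoder.

```python
import cProfile
import pstats


def count_calls(profile, name_fragment):
    """Sum the call counts of all profiled functions whose name contains name_fragment."""
    stats = pstats.Stats(profile)
    total = 0
    # stats.stats maps (filename, lineno, function_name) to
    # (primitive_calls, total_calls, tottime, cumtime, callers).
    for (_filename, _lineno, func_name), (_cc, ncalls, *_rest) in stats.stats.items():
        if name_fragment in func_name:
            total += ncalls
    return total


def fake_decode():
    """Stand-in for an expensive decode call."""
    return sum(range(100))


with cProfile.Profile() as pr:
    for _ in range(18):
        fake_decode()

print(count_calls(pr, "fake_decode"))
```

With a real slide, the same helper could be called as `count_calls(pr, "jpeg2k_decode")` after profiling `read_region`.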

cgohlke commented 1 year ago

Good catch. This code requests the following keys from the Zarr store:

from tifffile import imread

im = imread(
    'TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs',
    selection=(slice(14_000, 14_512), slice(12_000, 12_512)),
)
0/54.46.0
0/54.46.0
0/54.47.0
0/54.47.0
0/54.48.0
0/54.48.0
0/55.46.0
0/55.46.0
0/55.47.0
0/55.47.0
0/55.48.0
0/55.48.0
0/56.46.0
0/56.46.0
0/56.47.0
0/56.47.0
0/56.48.0
0/56.48.0
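The nine distinct keys in this listing follow directly from the 256x256 tile size: a 512-pixel span starting at 14,000 or 12,000 is not tile-aligned, so it straddles three tiles along each axis. A small sketch of that arithmetic follows; `chunk_keys` is a name invented here, and the `level/row.col.0` key pattern is inferred from the listing above.

```python
def chunk_keys(start, size, tile=256, level=0):
    """List Zarr-style chunk keys touched by a 2D selection.

    start and size are pixel pairs in the same axis order as the selection;
    keys follow the "level/i.j.0" pattern seen in the listing above.
    """
    i0, j0 = start
    n_i, n_j = size
    first_axis = range(i0 // tile, (i0 + n_i - 1) // tile + 1)
    second_axis = range(j0 // tile, (j0 + n_j - 1) // tile + 1)
    return [f"{level}/{i}.{j}.0" for i in first_axis for j in second_axis]


keys = chunk_keys((14_000, 12_000), (512, 512))
print(len(keys), keys[0], keys[-1])
```

This gives 9 keys, from `0/54.46.0` to `0/56.48.0`, which matches the 9 decoder calls counted in openslide. Each key appearing twice in the listing is what doubles the count to 18.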

cgohlke commented 1 year ago

The issue is that Zarr's KVStore, which is used to wrap ZarrTiffStore, does not have a __contains__ method such that key in store is routed through __getitem__, which triggers decoding...

I think it's a bug in Zarr that is easy to fix.

With the fix I get this:

[Figure: benchmark_read_tiles_as_numpy]
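The mechanism can be reproduced without Zarr or tifffile. Python's `collections.abc.Mapping` supplies a default `__contains__` that simply tries `self[key]`, so a wrapper that does not override it turns every membership test into a full item read. The classes below are toy stand-ins written for this illustration only; they are not the actual `KVStore` or `ZarrTiffStore` code.

```python
from collections.abc import MutableMapping


class DecodingStore(MutableMapping):
    """Toy store whose __getitem__ is expensive (it pretends to decode a tile)."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.decode_calls = 0

    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        self.decode_calls += 1  # pretend this is a JPEG 2000 decode
        return b"decoded tile"

    def __contains__(self, key):
        return key in self._keys  # cheap, no decoding

    def __setitem__(self, key, value):
        raise NotImplementedError("read-only")

    def __delitem__(self, key):
        raise NotImplementedError("read-only")

    def __iter__(self):
        return iter(self._keys)

    def __len__(self):
        return len(self._keys)


class Wrapper(MutableMapping):
    """Toy wrapper that forwards item access but defines no __contains__."""

    def __init__(self, inner):
        self._inner = inner

    def __getitem__(self, key):
        return self._inner[key]

    def __setitem__(self, key, value):
        self._inner[key] = value

    def __delitem__(self, key):
        del self._inner[key]

    def __iter__(self):
        return iter(self._inner)

    def __len__(self):
        return len(self._inner)


class PatchedWrapper(Wrapper):
    """Same wrapper with membership tests delegated to the inner store."""

    def __contains__(self, key):
        return key in self._inner


def decodes_for_check_then_read(wrapper_cls):
    """Count decodes for the pattern: test membership, then read the item."""
    store = DecodingStore(["0/54.46.0"])
    wrapped = wrapper_cls(store)
    if "0/54.46.0" in wrapped:
        wrapped["0/54.46.0"]
    return store.decode_calls


print(decodes_for_check_then_read(Wrapper), decodes_for_check_then_read(PatchedWrapper))
```

The unpatched wrapper decodes twice per tile and the patched one decodes once, which mirrors the 18 versus 9 calls observed earlier in the thread.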

kaczmarj commented 1 year ago

wow that's fantastic! thanks @cgohlke. should i open an issue in the zarr-python github repo?

cgohlke commented 1 year ago

should i open an issue in the zarr-python github repo?

I'm on it.

ap-- commented 1 year ago

Ha! That's great :smiley: I guess I'll have to update the benchmarks in the readme once a new version of zarr is released :smile:

Thanks everyone!

cgohlke commented 1 year ago

i am seeing a 4x increase in runtime for tiffslide vs openslide

The Zarr issue accounts for a ~2x difference. Where does the other 2x come from? I don't see that on Windows, where the OS cache is not reset. Could also be a difference in how OpenJPEG is compiled. What versions of tifffile and imagecodecs were used and how were they installed?

kaczmarj commented 1 year ago

Where does the other 2x come from?

i was probably mistaken earlier when i said 4x, though i still do see that tiffslide is a bit slower than openslide when installed via pip. when installed via conda, tiffslide is faster!

What versions of tifffile and imagecodecs were used and how were they installed?

i tested installations via pip and via conda/mamba. i include the versions of the packages in each environment below (click on the arrows to show the versions). in both cases, tifffile==2023.7.4 but in the pip environment, imagecodecs==2023.7.4 whereas in conda imagecodecs==2023.1.23 is used (i could not install a newer version). i will re-run this using the same versions in all environments and will update.
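When comparing environments like this, a short script that prints the relevant versions is less error-prone than scanning the full package listings. A minimal sketch using only the standard library follows; the helper name `installed_versions` is made up here, and the package list is just the set of distributions discussed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(
    ["tifffile", "imagecodecs", "zarr", "tiffslide", "openslide-python"]
)
for name, ver in report.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running it once in each environment gives directly comparable output. Note that it reports Python distributions only, not the native OpenJPEG library that each environment links against.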

i patched zarr.KVStore in tiffslide's __init__.py file as follows:

from zarr.storage import KVStore

def _zarr_kvstore___contains__(self, key):
    return key in self._mutable_mapping

KVStore.__contains__ = _zarr_kvstore___contains__

i also used test data from openslide test data and TCGA:

images/
├── Aperio
│   └── CMU-2.svs
└── TCGA-SVS
    └── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.svs

pip install

code:

sudo apt install libopenslide0  # installs libopenslide0/stable,now 3.4.1+dfsg-6+b1 amd64
git clone https://github.com/bayer-science-for-a-better-life/tiffslide
cd tiffslide
~/mambaforge/bin/python3.10 -m venv venv
source ./venv/bin/activate
python -m pip install -U pip setuptools wheel
python -m pip install -e .[dev] matplotlib pandas openslide-python pytest-benchmark
OPENSLIDE_TESTDATA_DIR=images/ python docs/generate_benchmark_plots.py

on my debian bookworm machine, libopenslide is linked to libopenjp2.so.7 (pulled as a dependency from https://packages.debian.org/bookworm/libopenjp2-7).

Output of pip list

```
Package           Version                        Editable project location
----------------- ------------------------------ -------------------------
asciitree         0.3.3
black             23.3.0
cfgv              3.3.1
click             8.1.4
contourpy         1.1.0
coverage          7.2.7
cycler            0.11.0
distlib           0.3.6
entrypoints       0.4
exceptiongroup    1.1.2
fasteners         0.18
filelock          3.12.2
fonttools         4.40.0
fsspec            2023.6.0
identify          2.5.24
imagecodecs       2023.7.4
iniconfig         2.0.0
kiwisolver        1.4.4
matplotlib        3.7.2
mypy              1.4.1
mypy-extensions   1.0.0
nodeenv           1.8.0
numcodecs         0.11.0
numpy             1.25.1
openslide-python  1.2.0
packaging         23.1
pandas            2.0.3
pathspec          0.11.1
Pillow            10.0.0
pip               23.1.2
platformdirs      3.8.1
pluggy            1.2.0
pre-commit        3.3.3
py-cpuinfo        9.0.0
pyparsing         3.0.9
pytest            7.4.0
pytest-benchmark  4.0.0
pytest-cov        4.1.0
python-dateutil   2.8.2
pytz              2023.3
PyYAML            6.0
setuptools        68.0.0
six               1.16.0
tifffile          2023.7.4
tiffslide         2.1.2.post0+g63c86e9.d20230710 /tmp/build/tiffslide
tomli             2.0.1
typing_extensions 4.7.1
tzdata            2023.3
virtualenv        20.23.1
wheel             0.40.0
zarr              2.15.0
```

results:

[image: benchmark_read_tiles_as_numpy]

conda (mamba) install

code:

git clone https://github.com/bayer-science-for-a-better-life/tiffslide
cd tiffslide
mamba env create -f environment.devenv.yml  # from tiffslide's repo
mamba activate tiffslide
mamba install openslide openslide-python matplotlib pandas
OPENSLIDE_TESTDATA_DIR=images/ python docs/generate_benchmark_plots.py

Output of mamba list

```
# packages in environment at /home/jakubk/mambaforge/envs/tiffslide:
#
# Name                    Version       Build                 Channel
_libgcc_mutex             0.1           conda_forge           conda-forge
_openmp_mutex             4.5           2_gnu                 conda-forge
alsa-lib                  1.2.9         hd590300_0            conda-forge
aom                       3.5.0         h27087fc_0            conda-forge
asciitree                 0.3.3         py_2                  conda-forge
attr                      2.5.1         h166bdaf_1            conda-forge
black                     23.3.0        py311h38be061_1       conda-forge
blosc                     1.21.4        h0f2a231_0            conda-forge
brotli                    1.0.9         h166bdaf_9            conda-forge
brotli-bin                1.0.9         h166bdaf_9            conda-forge
brunsli                   0.1           h9c3ff4c_0            conda-forge
bzip2                     1.0.8         h7f98852_4            conda-forge
c-ares                    1.19.1        hd590300_0            conda-forge
c-blosc2                  2.10.0        hb4ffafa_0            conda-forge
ca-certificates           2023.5.7      hbcca054_0            conda-forge
cairo                     1.16.0        hbbf8b49_1016         conda-forge
certifi                   2023.5.7      pyhd8ed1ab_0          conda-forge
cffi                      1.15.1        py311h409f033_3       conda-forge
cfgv                      3.3.1         pyhd8ed1ab_0          conda-forge
cfitsio                   4.2.0         hd9d235c_0            conda-forge
charls                    2.4.2         h59595ed_0            conda-forge
click                     8.1.4         unix_pyh707e725_0     conda-forge
colorama                  0.4.6         pyhd8ed1ab_0          conda-forge
contourpy                 1.1.0         py311h9547e67_0       conda-forge
coverage                  7.2.7         py311h459d7ec_0       conda-forge
cycler                    0.11.0        pyhd8ed1ab_0          conda-forge
dav1d                     1.2.1         hd590300_0            conda-forge
dbus                      1.13.6        h5008d03_3            conda-forge
distlib                   0.3.6         pyhd8ed1ab_0          conda-forge
entrypoints               0.4           pyhd8ed1ab_0          conda-forge
exceptiongroup            1.1.2         pyhd8ed1ab_0          conda-forge
expat                     2.5.0         hcb278e6_1            conda-forge
fasteners                 0.17.3        pyhd8ed1ab_0          conda-forge
filelock                  3.12.2        pyhd8ed1ab_0          conda-forge
font-ttf-dejavu-sans-mono 2.37          hab24e00_0            conda-forge
font-ttf-inconsolata      3.000         h77eed37_0            conda-forge
font-ttf-source-code-pro  2.038         h77eed37_0            conda-forge
font-ttf-ubuntu           0.83          hab24e00_0            conda-forge
fontconfig                2.14.2        h14ed4e7_0            conda-forge
fonts-conda-ecosystem     1             0                     conda-forge
fonts-conda-forge         1             0                     conda-forge
fonttools                 4.40.0        py311h459d7ec_0       conda-forge
freetype                  2.12.1        hca18f0e_1            conda-forge
fsspec                    2023.6.0      pyh1a96a4e_0          conda-forge
gdk-pixbuf                2.42.10       h6b639ba_2            conda-forge
gettext                   0.21.1        h27087fc_0            conda-forge
giflib                    5.2.1         h0b41bf4_3            conda-forge
glib                      2.76.4        hfc55251_0            conda-forge
glib-tools                2.76.4        hfc55251_0            conda-forge
graphite2                 1.3.13        h58526e2_1001         conda-forge
gst-plugins-base          1.22.4        hf7dbed1_1            conda-forge
gstreamer                 1.22.4        h98fc4e7_1            conda-forge
harfbuzz                  7.3.0         hdb3a94d_0            conda-forge
icu                       72.1          hcb278e6_0            conda-forge
identify                  2.5.24        pyhd8ed1ab_0          conda-forge
imagecodecs               2023.1.23     py311hd374d05_2       conda-forge
iniconfig                 2.0.0         pyhd8ed1ab_0          conda-forge
jxrlib                    1.1           h7f98852_2            conda-forge
keyutils                  1.6.1         h166bdaf_0            conda-forge
kiwisolver                1.4.4         py311h4dd048b_1       conda-forge
krb5                      1.20.1        h81ceb04_0            conda-forge
lame                      3.100         h166bdaf_1003         conda-forge
lcms2                     2.15          haa2dc70_1            conda-forge
ld_impl_linux-64          2.40          h41732ed_0            conda-forge
lerc                      4.0.0         h27087fc_0            conda-forge
libaec                    1.0.6         hcb278e6_1            conda-forge
libavif                   0.11.1        h8182462_2            conda-forge
libblas                   3.9.0         17_linux64_openblas   conda-forge
libbrotlicommon           1.0.9         h166bdaf_9            conda-forge
libbrotlidec              1.0.9         h166bdaf_9            conda-forge
libbrotlienc              1.0.9         h166bdaf_9            conda-forge
libcap                    2.67          he9d0100_0            conda-forge
libcblas                  3.9.0         17_linux64_openblas   conda-forge
libclang                  15.0.7        default_h7634d5b_2    conda-forge
libclang13                15.0.7        default_h9986a30_2    conda-forge
libcups                   2.3.3         h36d4200_3            conda-forge
libcurl                   8.1.2         h409715c_0            conda-forge
libdeflate                1.18          h0b41bf4_0            conda-forge
libedit                   3.1.20191231  he28a2e2_2            conda-forge
libev                     4.33          h516909a_1            conda-forge
libevent                  2.1.12        hf998b51_1            conda-forge
libexpat                  2.5.0         hcb278e6_1            conda-forge
libffi                    3.4.2         h7f98852_5            conda-forge
libflac                   1.4.3         h59595ed_0            conda-forge
libgcc-ng                 13.1.0        he5830b7_0            conda-forge
libgcrypt                 1.10.1        h166bdaf_0            conda-forge
libgfortran-ng            13.1.0        h69a702a_0            conda-forge
libgfortran5              13.1.0        h15d22d2_0            conda-forge
libglib                   2.76.4        hebfc3b9_0            conda-forge
libgomp                   13.1.0        he5830b7_0            conda-forge
libgpg-error              1.47          h71f35ed_0            conda-forge
libiconv                  1.17          h166bdaf_0            conda-forge
libjpeg-turbo             2.1.5.1       h0b41bf4_0            conda-forge
liblapack                 3.9.0         17_linux64_openblas   conda-forge
libllvm15                 15.0.7        h5cf9203_2            conda-forge
libnghttp2                1.52.0        h61bc06f_0            conda-forge
libnsl                    2.0.0         h7f98852_0            conda-forge
libogg                    1.3.4         h7f98852_1            conda-forge
libopenblas               0.3.23        pthreads_h80387f5_0   conda-forge
libopus                   1.3.1         h7f98852_1            conda-forge
libpng                    1.6.39        h753d276_0            conda-forge
libpq                     15.3          hbcd7760_1            conda-forge
libsndfile                1.2.0         hb75c966_0            conda-forge
libsqlite                 3.42.0        h2797004_0            conda-forge
libssh2                   1.11.0        h0841786_0            conda-forge
libstdcxx-ng              13.1.0        hfd8a6a1_0            conda-forge
libsystemd0               253           h8c4010b_1            conda-forge
libtiff                   4.5.1         h8b53f26_0            conda-forge
libuuid                   2.38.1        h0b41bf4_0            conda-forge
libvorbis                 1.3.7         h9c3ff4c_0            conda-forge
libwebp-base              1.3.1         hd590300_0            conda-forge
libxcb                    1.15          h0b41bf4_0            conda-forge
libxkbcommon              1.5.0         h5d7e998_3            conda-forge
libxml2                   2.11.4        h0d562d8_0            conda-forge
libzlib                   1.2.13        hd590300_5            conda-forge
libzopfli                 1.0.3         h9c3ff4c_0            conda-forge
lz4-c                     1.9.4         hcb278e6_0            conda-forge
matplotlib                3.7.2         py311h38be061_0       conda-forge
matplotlib-base           3.7.2         py311h54ef318_0       conda-forge
mpg123                    1.31.3        hcb278e6_0            conda-forge
msgpack-python            1.0.5         py311ha3edf6b_0       conda-forge
munkres                   1.1.4         pyh9f0ad1d_0          conda-forge
mypy                      1.4.1         py311h459d7ec_0       conda-forge
mypy_extensions           1.0.0         pyha770c72_0          conda-forge
mysql-common              8.0.33        hf1915f5_1            conda-forge
mysql-libs                8.0.33        hca2cd23_1            conda-forge
ncurses                   6.4           hcb278e6_0            conda-forge
nodeenv                   1.8.0         pyhd8ed1ab_0          conda-forge
nspr                      4.35          h27087fc_0            conda-forge
nss                       3.89          he45b914_0            conda-forge
numcodecs                 0.11.0        py311hcafe171_1       conda-forge
numpy                     1.25.1        py311h64a7726_0       conda-forge
openjpeg                  2.5.0         hfec8fc6_2            conda-forge
openslide                 3.4.1         ha896ae7_9            conda-forge
openslide-python          1.2.0         py311hd4cff14_2       conda-forge
openssl                   3.1.1         hd590300_1            conda-forge
packaging                 23.1          pyhd8ed1ab_0          conda-forge
pandas                    2.0.3         py311h320fe9a_1       conda-forge
pathspec                  0.11.1        pyhd8ed1ab_0          conda-forge
pcre2                     10.40         hc3806b6_0            conda-forge
pillow                    10.0.0        py311h0b84326_0       conda-forge
pip                       23.1.2        pyhd8ed1ab_0          conda-forge
pixman                    0.40.0        h36c2ea0_0            conda-forge
platformdirs              3.8.1         pyhd8ed1ab_0          conda-forge
pluggy                    1.2.0         pyhd8ed1ab_0          conda-forge
ply                       3.11          py_1                  conda-forge
pre-commit                3.3.3         pyha770c72_0          conda-forge
psutil                    5.9.5         py311h2582759_0       conda-forge
pthread-stubs             0.4           h36c2ea0_1001         conda-forge
pulseaudio-client         16.1          hb77b528_4            conda-forge
py-cpuinfo                9.0.0         pyhd8ed1ab_0          conda-forge
pycparser                 2.21          pyhd8ed1ab_0          conda-forge
pyparsing                 3.0.9         pyhd8ed1ab_0          conda-forge
pyqt                      5.15.7        py311ha74522f_3       conda-forge
pyqt5-sip                 12.11.0       py311hcafe171_3       conda-forge
pytest                    7.4.0         pyhd8ed1ab_0          conda-forge
pytest-benchmark          4.0.0         pyhd8ed1ab_0          conda-forge
pytest-cov                4.1.0         pyhd8ed1ab_0          conda-forge
python                    3.11.4        hab00c5b_0_cpython    conda-forge
python-dateutil           2.8.2         pyhd8ed1ab_0          conda-forge
python-tzdata             2023.3        pyhd8ed1ab_0          conda-forge
python_abi                3.11          3_cp311               conda-forge
pytz                      2023.3        pyhd8ed1ab_0          conda-forge
pyyaml                    6.0           py311hd4cff14_5       conda-forge
qt-main                   5.15.8        hf9e2b05_14           conda-forge
readline                  8.2           h8228510_1            conda-forge
setuptools                68.0.0        pyhd8ed1ab_0          conda-forge
sip                       6.7.9         py311hb755f60_0       conda-forge
six                       1.16.0        pyh6c4a22f_0          conda-forge
snappy                    1.1.10        h9fff704_0            conda-forge
tifffile                  2023.7.4      pyhd8ed1ab_0          conda-forge
tiffslide                 2.1.2.post0+g63c86e9.d20230710 pypi_0                pypi
tk                        8.6.12        h27826a3_0            conda-forge
toml                      0.10.2        pyhd8ed1ab_0          conda-forge
tomli                     2.0.1         pyhd8ed1ab_0          conda-forge
tornado                   6.3.2         py311h459d7ec_0       conda-forge
typing-extensions         4.7.1         hd8ed1ab_0            conda-forge
typing_extensions         4.7.1         pyha770c72_0          conda-forge
tzdata                    2023c         h71feb2d_0            conda-forge
ukkonen                   1.0.1         py311h4dd048b_3       conda-forge
virtualenv                20.23.1       pyhd8ed1ab_0          conda-forge
wheel                     0.40.0        pyhd8ed1ab_0          conda-forge
xcb-util                  0.4.0         hd590300_1            conda-forge
xcb-util-image            0.4.0         h8ee46fc_1            conda-forge
xcb-util-keysyms          0.4.0         h8ee46fc_1            conda-forge
xcb-util-renderutil       0.3.9         hd590300_1            conda-forge
xcb-util-wm               0.4.1         h8ee46fc_1            conda-forge
xkeyboard-config          2.39          hd590300_0            conda-forge
xorg-kbproto              1.0.7         h7f98852_1002         conda-forge
xorg-libice               1.1.1         hd590300_0            conda-forge
xorg-libsm                1.2.4         h7391055_0            conda-forge
xorg-libx11               1.8.6         h8ee46fc_0            conda-forge
xorg-libxau               1.0.11        hd590300_0            conda-forge
xorg-libxdmcp             1.1.3         h7f98852_0            conda-forge
xorg-libxext              1.3.4         h0b41bf4_2            conda-forge
xorg-libxrender           0.9.11        hd590300_0            conda-forge
xorg-renderproto          0.11.1        h7f98852_1002         conda-forge
xorg-xextproto            7.3.0         h0b41bf4_1003         conda-forge
xorg-xf86vidmodeproto     2.3.1         h7f98852_1002         conda-forge
xorg-xproto               7.0.31        h7f98852_1007         conda-forge
xz                        5.2.6         h166bdaf_0            conda-forge
yaml                      0.2.5         h7f98852_2            conda-forge
zarr                      2.15.0        pyhd8ed1ab_0          conda-forge
zfp                       1.0.0         h27087fc_3            conda-forge
zlib                      1.2.13        hd590300_5            conda-forge
zlib-ng                   2.0.7         h0b41bf4_0            conda-forge
zstd                      1.5.2         hfc55251_7            conda-forge
```

results:

[image: benchmark_read_tiles_as_numpy]

kaczmarj commented 1 year ago

the difference comes down to imagecodecs from conda-forge versus imagecodecs from pypi. using the one from pypi, tiffslide is slower than openslide on the TCGA SVS file i am debugging with.

in my previous test, the conda/mamba environment had the best speeds for tiffslide. in that conda environment, i pip installed imagecodecs==2023.1.23 and then tiffslide became almost 2x slower (~2.5 ms to ~4.9 ms).

mamba/conda environment with imagecodecs from conda-forge

[image: benchmark_read_tiles_as_numpy]

mamba/conda environment with imagecodecs from pypi

[image: benchmark_read_tiles_as_numpy]
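for a quick check without the full benchmark script, a rough sketch of timing tile reads at random locations (so that repeated calls do not just hit a cache). this is not the code in docs/generate_benchmark_plots.py; `time_random_reads` and its parameters are hypothetical names, and the reader is passed in as a plain callable so the helper itself only needs the standard library.

```python
import random
import statistics
import time


def time_random_reads(read_tile, slide_size, tile_size=(512, 512), n=100, seed=0):
    """Time n calls of read_tile(x, y, w, h) at random locations.

    Returns (mean, standard deviation) of the per-call time in milliseconds.
    Requires n >= 2 so that a standard deviation can be computed.
    """
    rng = random.Random(seed)  # fixed seed: same locations for every reader
    width, height = slide_size
    tile_w, tile_h = tile_size
    timings_ms = []
    for _ in range(n):
        x = rng.randint(0, width - tile_w)
        y = rng.randint(0, height - tile_h)
        start = time.perf_counter()
        read_tile(x, y, tile_w, tile_h)
        timings_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


# Possible usage with tiffslide (left commented out so the sketch runs anywhere):
# import tiffslide
# slide = tiffslide.TiffSlide("images/Aperio/CMU-2.svs")
# reader = lambda x, y, w, h: slide.read_region((x, y), 0, (w, h), as_array=True)
# print(time_random_reads(reader, slide.dimensions))
```

using the same seed for openslide and tiffslide keeps the sampled locations identical, which makes the two means easier to compare.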

kaczmarj commented 1 year ago

aha! the culprit is that a different libopenjp2.so.2.5.0 is pulled in by pip than by conda. to test this, i first installed all tiffslide dependencies with mamba/conda (with imagecodecs==2023.1.23). using that, tiffslide was faster than openslide for tcga-svs. then i pip installed imagecodecs==2023.1.23, and tiffslide became slower than openslide for tcga-svs. finally, i copied the file libopenjp2.so.2.5.0 that was downloaded from conda-forge into the directory

~/mambaforge/envs/tiffslide/lib/python3.11/site-packages/imagecodecs.libs/

and i essentially overwrote the previous version which was named libopenjp2-fc287c52.so.2.5.0. using the openjpeg from conda-forge, tiffslide was faster than openslide.

[image: benchmark_read_tiles_as_numpy]
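a small sketch for checking which openjpeg library a pip-installed imagecodecs bundles. it assumes a Linux wheel that keeps its vendored shared libraries in a sibling `imagecodecs.libs/` directory, as in the path above. `find_bundled_libs` is a hypothetical helper name, not part of any package.

```python
import sysconfig
from pathlib import Path


def find_bundled_libs(site_packages, package="imagecodecs", pattern="libopenjp2*"):
    """List library file names matching pattern in <site_packages>/<package>.libs/."""
    libs_dir = Path(site_packages) / f"{package}.libs"
    if not libs_dir.is_dir():
        # No vendored directory found, e.g. the package was not installed from a wheel.
        return []
    return sorted(path.name for path in libs_dir.glob(pattern))


if __name__ == "__main__":
    # "platlib" is where compiled packages are installed in the current environment.
    site_packages = sysconfig.get_paths()["platlib"]
    print(find_bundled_libs(site_packages))
```

an empty list suggests that openjpeg is resolved from somewhere else, for example the environment's own lib/ directory.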

i am not sure how openjpeg is pulled into the imagecodecs wheel during a build, but i presume openjpeg is being built differently than the conda-forge version. though looking at https://github.com/conda-forge/openjpeg-feedstock/blob/main/recipe/build.sh, there don't seem to be any special build options enabled for the conda-forge version.

kaczmarj commented 1 year ago

building openjpeg with -DCMAKE_BUILD_TYPE=Release solves the problem. i will submit a pull request to https://github.com/Czaki/imagecodecs_build to add this option.

the change should be made in these lines: https://github.com/Czaki/imagecodecs_build/blob/c7abf4b7c91746c30a754e5d3367f6347262e049/build_utils/build_libraries.sh#L361-L364

when openjpeg is not compiled in release mode, it looks like -ffast-math is not enabled (see here):

  # Do not use ffast-math for all build, it would produce incorrect results, only set for release:
  set(OPENJPEG_LIBRARY_COMPILE_OPTIONS ${OPENJPEG_LIBRARY_COMPILE_OPTIONS} "$<$<CONFIG:Release>:-ffast-math>")
  set(OPENJP2_COMPILE_OPTIONS ${OPENJP2_COMPILE_OPTIONS} "$<$<CONFIG:Release>:-ffast-math>" -Wall -Wextra -Wconversion -Wunused-parameter -Wdeclaration-after-statement -Werror=declaration-after-statement)

kaczmarj commented 1 year ago

enabling -DCMAKE_BUILD_TYPE=Release in the openjpeg build causes imagecodecs tests to fail... :(

cgohlke commented 1 year ago

Never mind the failures. That repository is out of sync. I build the libraries locally in Docker these days and then build&test the wheels on Azure/GHA...

ap-- commented 1 year ago

New version of tiffslide with a fix is on its way to pypi, and then later today to conda.

Thanks again everyone for the fun debugging session :smiley:

cgohlke commented 1 year ago

building openjpeg with -DCMAKE_BUILD_TYPE=Release solves the problem.

Thank you for finding this. Would you mind trying again with imagecodecs 2023.7.10?

kaczmarj commented 1 year ago

Would you mind trying again with imagecodecs 2023.7.10?

it works! here are the benchmark results on my machine with the most recent tiffslide (8bea5a4c8e1429071ade6d4c40169ce153786d19), tifffile==2023.7.10, and imagecodecs==2023.7.10.

what a triumph!!!

[image: benchmark_read_tiles_as_numpy]

ap-- commented 1 year ago

tiffslide==2.2.0 has the fix. (I just added two more commits to update the benchmark stuff)

what a triumph!!!

Thanks again for reporting and investigating ❤️