Backend set to tiffslide, but openslide is used

vthorsson commented 1 month ago

Thanks for providing a really useful tool.

I have a wsinfer installation with both openslide and tiffslide. When I select tiffslide on the command line, it looks like my input file is still opened with openslide.

wsinfer --backend tiffslide --log-level debug  run  --wsi-dir slides/    --results-dir results/    --model pancancer-lymphocytes-inceptionv4.tcga
INFO:wsinfer.wsi:Setting backend to tiffslide

Running wsinfer version 0.6.1

....

Traceback (most recent call last):
  File "/Users/vesteinn/miniconda3/envs/wsinfer/lib/python3.12/site-packages/wsinfer/patchlib/__init__.py", line 368, in segment_and_patch_directory_of_slides
    segment_and_patch_one_slide(
  File "/Users/vesteinn/miniconda3/envs/wsinfer/lib/python3.12/site-packages/wsinfer/patchlib/__init__.py", line 106, in segment_and_patch_one_slide
    slide = WSI(slide_path)
            ^^^^^^^^^^^^^^^
  File "/Users/vesteinn/miniconda3/envs/wsinfer/lib/python3.12/site-packages/openslide/__init__.py", line 179, in __init__
    self._osr = lowlevel.open(str(filename))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vesteinn/miniconda3/envs/wsinfer/lib/python3.12/site-packages/openslide/lowlevel.py", line 203, in _check_open
    raise OpenSlideUnsupportedFormatError("Unsupported or missing image file")
openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file

I can reproduce the error message if I manually open the file using openslide (openslide.OpenSlide), whereas tiffslide opens it just fine (tiffslide.TiffSlide).
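For anyone who wants to repeat that manual check, here is a minimal sketch. The helper name try_open is hypothetical and not part of wsinfer; it only relies on openslide.OpenSlide and tiffslide.TiffSlide, and it reports a missing library as a failure instead of crashing.

```python
import importlib

# Slide class exposed by each library.
SLIDE_CLASSES = {"openslide": "OpenSlide", "tiffslide": "TiffSlide"}


def try_open(path, backends=("openslide", "tiffslide")):
    """Try to open `path` with each backend and report what happened.

    Returns a dict mapping backend name to "ok" or to an error description.
    try_open is a hypothetical helper, not a wsinfer function.
    """
    results = {}
    for name in backends:
        try:
            module = importlib.import_module(name)
            slide = getattr(module, SLIDE_CLASSES[name])(path)
            slide.close()
            results[name] = "ok"
        except Exception as exc:  # missing library, unsupported file, ...
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Replace with the path to your own slide.
    for backend, outcome in try_open("slides/167751.ome.tif").items():
        print(f"{backend}: {outcome}")
```

A file that only tiffslide can read should show "ok" for tiffslide and an error for openslide.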

I'm wondering if I am misunderstanding the impact of --backend tiffslide, or whether this is expected or unexpected behavior. (The input file is an OME-TIFF, and I have been able to run wsinfer successfully on svs files with this same install, which was set up using conda as per the instructions.)

kaczmarj commented 1 month ago

hi @vthorsson - thanks for reporting this issue. i'm able to reproduce it. i have opened #226 as an attempted fix. i will wait for the continuous integration to run

if you want to try before i merge, please feel free to install from that branch:

python -m pip install git+https://github.com/SBU-BMI/wsinfer.git@fix/issue-225

i wanted to also tell you i greatly enjoyed your 2018 immunity paper on the immune landscape of cancer. could you please tell me, is it possible to calculate tumor mutational burden from supplementary table 1?

vthorsson commented 1 month ago

Thanks for addressing this so quickly @kaczmarj - look forward to trying out the fix!

Thanks for the good words on the Immune Landscape manuscript. You can find the TMB on the associated publication page https://gdc.cancer.gov/about-data/publications/panimmune

vthorsson commented 1 month ago

@kaczmarj from my tests using the install from the branch provided above, I get the sense that openslide is still being called. However, the error trace does look a little different from earlier: this time the MPP determination is reported in the trace.

wsinfer --backend tiffslide --log-level debug  run  --wsi-dir slides/    --results-dir results/    --model pancancer-lymphocytes-inceptionv4.tcga
DEBUG:wsinfer.wsi:Set backend to tiffslide

Running wsinfer version 0.6.2.dev9+g3339733

....

INFO:wsinfer.patchlib:Segmenting and patching slide slides/167751.ome.tif
INFO:wsinfer.patchlib:Using prefix as slide ID: 167751.ome
DEBUG:wsinfer.wsi:Attempting to read MPP using OpenSlide
ERROR:wsinfer.patchlib:Failed to segment and patch slide
slides/167751.ome.tif
Traceback (most recent call last):
  File "/Users/vesteinn/miniconda3/envs/wsinfer-tiffslide-only/lib/python3.12/site-packages/wsinfer/patchlib/__init__.py", line 368, in segment_and_patch_directory_of_slides
    segment_and_patch_one_slide(
  File "/Users/vesteinn/miniconda3/envs/wsinfer-tiffslide-only/lib/python3.12/site-packages/wsinfer/patchlib/__init__.py", line 107, in segment_and_patch_one_slide
    mpp = get_avg_mpp(slide_path)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vesteinn/miniconda3/envs/wsinfer-tiffslide-only/lib/python3.12/site-packages/wsinfer/wsi.py", line 272, in get_avg_mpp
    mppx, mppy = _get_mpp_openslide(slide_path)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vesteinn/miniconda3/envs/wsinfer-tiffslide-only/lib/python3.12/site-packages/wsinfer/wsi.py", line 125, in _get_mpp_openslide
    slide = openslide.OpenSlide(slide_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vesteinn/miniconda3/envs/wsinfer-tiffslide-only/lib/python3.12/site-packages/openslide/__init__.py", line 179, in __init__
    self._osr = lowlevel.open(str(filename))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vesteinn/miniconda3/envs/wsinfer-tiffslide-only/lib/python3.12/site-packages/openslide/lowlevel.py", line 203, in _check_open
    raise OpenSlideUnsupportedFormatError("Unsupported or missing image file")
openslide.lowlevel.OpenSlideUnsupportedFormatError: Unsupported or missing image file

kaczmarj commented 1 month ago

thanks. i am fixing this in another branch now. use this to try it out:

python -m pip install git+https://github.com/SBU-BMI/wsinfer.git@fix/read-mpp-with-backend

when i originally wrote get_avg_mpp, i meant it to be a general function to read mpp, using any and all backends. but it makes more sense now to use only the backend specified by the user (if that fails, i fall back to tifffile).
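The behavior described above (use only the user-selected backend, then fall back) can be sketched roughly as follows. This is an illustration under assumptions, not the actual wsinfer code: get_avg_mpp_sketch, the readers mapping, and the fallback callable are all hypothetical names, with fallback standing in for a tifffile-based reader.

```python
from typing import Callable, Dict, Tuple

# A reader takes a slide path and returns (mpp_x, mpp_y).
MppReader = Callable[[str], Tuple[float, float]]


def get_avg_mpp_sketch(
    slide_path: str,
    backend: str,
    readers: Dict[str, MppReader],
    fallback: MppReader,
) -> float:
    """Read MPP with the selected backend only, falling back if it fails.

    Hypothetical sketch: `readers` maps backend names to reader functions,
    and `fallback` stands in for a tifffile-based reader.
    """
    try:
        mppx, mppy = readers[backend](slide_path)
    except Exception:
        # Other backends are never tried; only the fallback reader is.
        mppx, mppy = fallback(slide_path)
    return (mppx + mppy) / 2
```

With this shape, selecting tiffslide means the openslide reader is never called, even if it is installed.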

vthorsson commented 1 month ago

Thanks @kaczmarj , things are looking good now!

Run trace says "DEBUG:wsinfer.wsi:Attempting to read MPP using TiffSlide"
The run completes
Generates output json, csv, thumbnail etc

BTW, this particular ome.tiff was exported from QuPath, from an imported .svs! (I may check if the wsinfer results are similar.)

Related: Do you have any material posted on requirements/desired characteristics for WSI image files as input for wsinfer?

kaczmarj commented 1 month ago

fantastic! and to answer your last question, we don't have any material posted about requirements/desired characteristics for WSI images.

a few things come to mind however:

there should be a clearly visible difference between tissue and glass background. this helps the tissue segmentation algorithm. without clear separation (eg washed out stains), tissue segmentation may fail.
when running a model on an image, consider whether the input image comes from a similar distribution of images as the training set. sometimes a model might not perform well, and one reason could be a distribution shift.
try to have your image on an SSD or other fast storage. the patches are loaded lazily, directly from the image as they are needed. having the slide on an SSD will probably speed up read times considerably.
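To illustrate the lazy-loading point above, here is a rough sketch of a generator that reads each patch from the slide only when it is requested. It is not wsinfer's implementation: iter_patches and the read_region callable are hypothetical, with read_region standing in for whatever the slide backend provides.

```python
from typing import Callable, Iterator, Tuple, TypeVar

Patch = TypeVar("Patch")


def iter_patches(
    width: int,
    height: int,
    patch_size: int,
    read_region: Callable[[int, int, int], Patch],
) -> Iterator[Tuple[Tuple[int, int], Patch]]:
    """Yield ((x, y), patch) for non-overlapping patches, one read at a time.

    Hypothetical sketch: read_region(x, y, size) stands in for the backend's
    region reader. Nothing is read from disk until a patch is requested.
    """
    for y in range(0, height - patch_size + 1, patch_size):
        for x in range(0, width - patch_size + 1, patch_size):
            yield (x, y), read_region(x, y, patch_size)
```

Because every patch triggers its own read from the slide file, storage latency adds up over many patches, which is why fast storage helps.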

if you have anything else in mind, please do let me know. thanks again for finding this bug and reporting it!

vthorsson commented 1 month ago

@kaczmarj thanks for the helpful pointers above. I am seeing some signs* of my tissue segmentation step failing, possibly due to high background, so I am working on pre-processing to remove background. (A future feature request: to be able to run only the tissue segmentation and ensure that the thumbnail looks OK.)

I apologize in advance as my questions largely seem a consequence of a not-ideal input file, but that led to a few things.

In terms of specific file format questions/comments

My input image is 16-bit (uint16) rather than 8-bit (uint8). Do you know if that is a problem per se? Or would you expect things to run on 16-bit images just fine? (I am seeing some error messages deep in PIL/Image.py that may relate to this. I have a way to manually convert, but that induces other changes in the file etc.)
In terms of image channels, it looks like RGB is fine for an H&E image, but please let me know if you are aware of an issue around that or what is expected in image channels.
For my TIFF image, I learned during an analysis of my input file that certain TIFF image tags are used in wsinfer, e.g. XResolution in https://github.com/SBU-BMI/wsinfer/blob/06ef6b8bd56c3ab903760c505a654ce4c2a768f9/wsinfer/wsi.py#L153C1-L154C1. It would be great to know beforehand which tags are needed for the files to be read in and processed (including for MPP). This is thus perhaps a documentation request, but I understand that it could be a pain to extract from the code, and would be hard to produce.
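For context on how a tag like XResolution can relate to MPP, here is a minimal sketch based on the baseline TIFF tags only. It assumes XResolution is stored as a rational (numerator, denominator) in pixels per ResolutionUnit, where unit 2 is inch and 3 is centimeter. The function name is hypothetical, and this is not necessarily how wsinfer computes MPP.

```python
# Microns per TIFF ResolutionUnit (TIFF 6.0: 2 = inch, 3 = centimeter).
MICRONS_PER_UNIT = {2: 25_400.0, 3: 10_000.0}


def mpp_from_tiff_resolution(numerator, denominator, resolution_unit):
    """Convert a TIFF XResolution rational to microns per pixel.

    Hypothetical helper: XResolution is pixels per ResolutionUnit, so
    MPP = (microns per unit) / (pixels per unit).
    """
    if resolution_unit not in MICRONS_PER_UNIT:
        # ResolutionUnit 1 means "no absolute unit", so MPP is undefined.
        raise ValueError(f"Unsupported ResolutionUnit: {resolution_unit}")
    if numerator <= 0 or denominator <= 0:
        raise ValueError("XResolution must be a positive rational")
    pixels_per_unit = numerator / denominator
    return MICRONS_PER_UNIT[resolution_unit] / pixels_per_unit


if __name__ == "__main__":
    # 40000 pixels per centimeter corresponds to 0.25 microns per pixel.
    print(mpp_from_tiff_resolution(40000, 1, 3))
```

A file with ResolutionUnit set to 1 (no absolute unit) or with the tag missing gives no physical pixel size by this route, which may be one reason a reader fails to determine MPP.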

Thanks again

*the thumbnail looks off - it might be the detected 'boundary' is the entire slide
