DIAGNijmegen / pathology-hooknet-tls

Resampling svs and creating pyramidal tiff files with pyvips #9

mpl24 opened this issue 5 months ago

mpl24 commented 5 months ago

Hi,

Thank you for putting together this model and sharing the data! I want to run TLS HookNet on my own data, which consists of Aperio pyramidal SVS files. However, when I try to resample the images to 0.5 mpp and 2.0 mpp and save them as pyramidal tiffs using pyvips, I run into issues. I can save the pyramidal tiff images successfully, but when I then check the levels in OpenSlide, they show up as single-level images. Running the HookNet model on them also gives me the same error as in issue 4: https://github.com/DIAGNijmegen/pathology-hooknet-tls/issues/4
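For reference, this is how I am checking the level count on the converted file (a minimal sketch; the path matches the output of my code below):

import openslide

converted = openslide.OpenSlide('./downsampled.tif')
print(converted.level_count)        # prints 1, but I expect multiple levels
print(converted.level_dimensions)   # (width, height) of each level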

Here is my current setup for converting an svs to a downsampled pyramidal tiff:

import openslide
import pyvips

output_path = './downsampled.tif'

# read the scan resolution (mpp) from the svs metadata
slide = openslide.OpenSlide('file.svs')
mpp_x = float(slide.properties['openslide.mpp-x'])
mpp_y = float(slide.properties['openslide.mpp-y'])
assert mpp_x == mpp_y

# resample the full-resolution image to 0.5 mpp and 2.0 mpp
svs_img = pyvips.Image.new_from_file('file.svs', access="sequential")
downsampled_05 = svs_img.resize(mpp_x / 0.5)
downsampled_2 = svs_img.resize(mpp_x / 2.0)

# arrayjoin concatenates the two images into a single tall image,
# which is then written as a tiled pyramidal tiff
image = pyvips.Image.arrayjoin([downsampled_05, downsampled_2], across=1)
image.write_to_file(
    output_path,
    pyramid=True,
    Q=95,
    tile=True,
    compression='lzw',
    bigtiff=True)

Please advise on how you created these downsampled images from TCGA svs files.

martvanrijthoven commented 5 months ago

Dear Mara Pleasure,

I think it is best to create a dense pyramidal tif. I have created a script for this: https://github.com/DIAGNijmegen/pathology-whole-slide-data/blob/main/scripts/save_image_at_spacing.py

Could you please try that script? If you have any questions or run into any issues with it, please let me know and I am happy to help or fix the problem.

Best wishes, Mart

mpl24 commented 5 months ago

Hi Mart,

Great! It looks like that worked for loading the image with ASAP as the backend. Thank you for the help there!

How are you creating the tissue masks, though? Do the masks also need to be in pyramidal format? I saw in prior issues that you use a tissue-segmentation algorithm from Grand Challenge. I already have tissue masks for my slides, but they are heavily downsampled and in a non-pyramidal format.

I am currently getting this error when running the code in the docker container.

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/hooknet_tls/hooknettls/__main__.py", line 8, in <module>
    objects = build_config(config_reader.read()["default"])
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 124, in build_config
    return _ObjectFactory(deepcopy(config)).build_config()
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 26, in build_config
    return self._build(self._configuration)
  File "/usr/lib/python3.8/functools.py", line 912, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 38, in _build_dict
    config[key] = self._build_object(value)
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 67, in _build_object
    return attribute(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/iterators/patchiterator.py", line 46, in create_patch_iterator
    commander = commander_class(
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/buffer/patchcommander.py", line 50, in __init__
    self.reset()
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/buffer/patchcommander.py", line 68, in reset
    messages = self.get_patch_messages()
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/buffer/patchcommander.py", line 91, in get_patch_messages
    self._mask = WholeSlideImage(self._mask_path, backend=self._backend, auto_resample=True)
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/image/wholeslideimage.py", line 35, in __init__
    self._backend = get_backend(backend)(path=self._path)
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/interoperability/asap/backend.py", line 15, in __init__
    raise ValueError(f"cant open image {path}")
ValueError: cant open image ./tiff_files/tiff_masks/image_1.tif

martvanrijthoven commented 5 months ago

Dear Mara Pleasure,

Ah yes, indeed, you need pyramidal mask files. If you already have downsampled masks, you can try to upscale them and save them to a pyramidal tiff with vips. But you will need to set the spacing/mpp information manually.

Best wishes, Mart

mpl24 commented 5 months ago

Does the mpp have to be 0.5 and 2.0 like the slide images?

martvanrijthoven commented 5 months ago

It should be a pyramidal tiff file, but the first level can start at a downsampled magnification, e.g., ~2.0 mpp. However, the dimensions should then match the dimensions of the image at the same mpp.
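For example, you can sanity check the dimensions with something like this (a sketch with placeholder paths, assuming OpenSlide exposes the mpp properties for your files):

import openslide

image = openslide.OpenSlide("image.tif")  # placeholder path
mask = openslide.OpenSlide("mask.tif")    # placeholder path

image_mpp = float(image.properties["openslide.mpp-x"])
mask_mpp = float(mask.properties["openslide.mpp-x"])

# the mask level-0 dimensions should equal the image level-0 dimensions
# scaled by the ratio of the two spacings
scale = image_mpp / mask_mpp
expected = (round(image.dimensions[0] * scale), round(image.dimensions[1] * scale))
print(expected, mask.dimensions)  # these should match up to rounding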

Please let me know if you have any other questions.

martvanrijthoven commented 5 months ago

E.g., if you have a numpy mask, you can save it with pyvips to a pyramidal mask tiff file with this code:

import pyvips

up_scale_factor = 4
output_path = "./output.tif"

mask = ...  # np.array, e.g. @ 8.065408928247 mpp

pv_img = pyvips.Image.new_from_array(mask)

# upscale to 2.01635223206175 mpp; nearest neighbour keeps the mask values intact
pv_img_upscaled = pv_img.resize(up_scale_factor, kernel=pyvips.enums.Kernel.NEAREST)

# write a tiled pyramidal tiff with the spacing set via the resolution tags
pv_img_upscaled.write_to_file(
    output_path,
    pyramid=True,
    tile=True,
    compression="lzw",
    xres=1000.0 / 2.01635223206175,
    yres=1000.0 / 2.01635223206175,
    bigtiff=True,
)
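Note that the tiff resolution tags (xres/yres) are specified in pixels per millimetre, which is why the value is 1000.0 divided by the target mpp; readers recover the spacing from these tags.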

mpl24 commented 5 months ago

Okay, that makes sense! I'll remake the upsampled masks in pyramidal format then.

mpl24 commented 5 months ago

Hi! Thank you for all the help, Mart. I am now able to get the code running in the docker container with my images, and I adjusted the shared memory size based on a previous issue where the code hung after starting. However, I am now running into a segmentation fault. Do you know if the model works on very large WSIs?

docker run command: docker run -u $(id -u):$(id -g) -it --shm-size=4G -v ./pathology-hooknet-tls:/hooknet_tls hooknet-tls

This is what the output looks like until it fails with a segmentation fault:

2024-06-11 15:53:22.634466: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (34)
2024-06-11 15:53:22.634520: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2024-06-11 15:53:22.634933: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
loading weights... ./weights.h5
Create output folder
Creating lock file: output/images/image_1hooknettls.lock
Run inference
Init writers...
write: /home/user/tmp/image_1_hooknettls.tif
Creating: /home/user/tmp/image_1_hooknettls.tif....
Spacing: 0.3955959172156676
Dimensions: (46171, 38250)
Tile_shape: (1024, 1024)
Failed to open TIFF file for writing
write: /home/user/tmp/image_1_hooknettls_heat1.tif
Creating: /home/user/tmp/image_1_hooknettls_heat1.tif....
Spacing: 0.3955959172156676
Dimensions: (46171, 38250)
Tile_shape: (1024, 1024)
Failed to open TIFF file for writing
write: /home/user/tmp/image_1_hooknettls_heat2.tif
Creating: /home/user/tmp/image_1_hooknettls_heat2.tif....
Spacing: 0.3955959172156676
Dimensions: (46171, 38250)
Tile_shape: (1024, 1024)
Failed to open TIFF file for writing
Applying...
  5%|████▌                                                                                              | 5/108 [00:14<04:48,  2.80s/it]Segmentation fault (core dumped)

martvanrijthoven commented 5 months ago

Dear Mara Pleasure,

You are very welcome!

Yes, the model/docker should work with very large WSIs.

The error that you get often happens when tiff writing fails because the output folder does not exist. However, the Dockerfile creates the output folder /home/user/tmp, so it should be fine.

Can you try to run the docker in interactive mode, check that the folder exists, and then run the following command inside the docker (please replace the image and mask paths with your own):

python3 -m hooknettls \
    hooknettls.default.image_path="your_image.tiff" \
    hooknettls.default.mask_path="your_mask.tiff"
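To get an interactive shell you could, for example, override the entrypoint (a sketch based on your run command, assuming bash is available in the image):

docker run -u $(id -u):$(id -g) -it --shm-size=4G \
    -v ./pathology-hooknet-tls:/hooknet_tls \
    --entrypoint /bin/bash hooknet-tls

ls -ld /home/user/tmp  # verify the output folder exists and is writable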

mpl24 commented 5 months ago

Great thank you for the quick reply!

The issue was a permissions problem with my docker setup; I had to run the container as root.

The code now runs without issue and completes all the iterations. However, I am now getting the error below. Does it just mean that no TLSs were found in the provided tiff?

Failed to open TIFF file for writing
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/hooknet_tls/hooknettls/__main__.py", line 8, in <module>
    objects = build_config(config_reader.read()["default"])
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 124, in build_config
    return _ObjectFactory(deepcopy(config)).build_config()
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 26, in build_config
    return self._build(self._configuration)
  File "/usr/lib/python3.8/functools.py", line 912, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 38, in _build_dict
    config[key] = self._build_object(value)
  File "/usr/local/lib/python3.8/dist-packages/dicfg/factory.py", line 67, in _build_object
    return attribute(*args, **kwargs)
  File "/hooknet_tls/hooknettls/gc.py", line 16, in save_to_gc_outputs
    write_mask(wsi, wsa, wsi.spacings[2], output_folder=Path('/output/images/tls-gc/'), suffix='_hooknet_tls.tif')
  File "/usr/local/lib/python3.8/dist-packages/wholeslidedata/interoperability/asap/imagewriter.py", line 295, in write_mask
    raise ValueError(f"No values have been written to {mask_output_path}")
ValueError: No values have been written to /output/images/tls-gc/image_1_hooknettls_post_processed_hooknet_tls.tif

martvanrijthoven commented 5 months ago

Dear Mara Pleasure,

Great to hear that it now completed all the iterations!

The error you get indeed occurs because no TLS/GC objects are found while the pipeline is trying to create the required outputs for the Grand Challenge platform. I should program this in a better way and will fix it soon.

For now you can comment out the following lines in the config file, which will disable the creation of the Grand Challenge outputs.

https://github.com/DIAGNijmegen/pathology-hooknet-tls/blob/2779f4b13eb93f8adac4fa72668198d894eb75be/hooknettls/configs/config.yml#L103-L106

I hope after this change you can successfully run the algorithm on all your images.

Best wishes, Mart

mpl24 commented 5 months ago

Hi Mart,

Thank you for all the help so far!

I wanted to check in: thanks to your help, I have been able to run the model on my images!

However, no TLSs were detected, and we know there are some in our images based on discussions with a clinician. As a sanity check, I reran 30 TCGA cases from the dataset you trained the model on (10 from each cancer type) to see if the problem was my preprocessing. It seems it is, since no TLSs were detected in any of the 30 TCGA cases either.

I am confused about where it could be going wrong, though; I am using the svs-to-tiff conversion and downsampling script you suggested above, and I am then making pyramidal tissue masks with this code based on what you provided:

import numpy as np
import openslide
import pyvips
from PIL import Image
from skimage.transform import resize

# slide_p, mask_p, slide_name, and args come from my pipeline
slide = openslide.OpenSlide(slide_p)
max_mag = int(slide.properties['openslide.objective-power'])
mask = np.array(Image.open(mask_p))

# slide.dimensions is (width, height); reverse to (height, width) to match
# mask.shape, and halve to get the 20x dimensions
orig_dimensions = ((np.array(slide.dimensions)[::-1]) / 2).astype(int)  # 20x dimensions
downsample = np.mean(orig_dimensions / mask.shape)

# 16x downsample, i.e. from 0.5 mpp to 8 mpp
downsample_goal = 8 / 0.5

print(f'Mask shape orig: {mask.shape}')

## upsample mask to the 8 mpp dimensions (order=0 keeps the mask binary)
mask = resize(
    mask,
    orig_dimensions / downsample_goal,
    mode='edge',
    anti_aliasing=False,
    anti_aliasing_sigma=None,
    preserve_range=None,
    order=0
)
print(f'Mask shape upscaled: {mask.shape}')

# upscale 4x with pyvips (8 mpp -> 2 mpp) and write a pyramidal tiff
# with the resolution tags declaring 0.5 mpp
upscale = 4
pv_img = pyvips.Image.new_from_array(mask)
pv_img_upscaled = pv_img.resize(upscale, kernel=pyvips.enums.Kernel.NEAREST)
pv_img_upscaled.write_to_file(
    f'{args.save_root}/{slide_name}.tif',
    pyramid=True,
    Q=95,
    tile=True,
    compression='lzw',
    xres=1000.0 / 0.5,
    yres=1000.0 / 0.5,
    bigtiff=True
)
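As a sanity check on the written file, I read the header back with pyvips (path is a placeholder):

import pyvips

mask_tif = pyvips.Image.new_from_file('mask.tif')  # placeholder path
print(mask_tif.width, mask_tif.height)  # level-0 pixel dimensions
print(1000.0 / mask_tif.xres)           # mpp implied by the resolution tag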

Do you know what could be going wrong with my preprocessing? Does the model expect anything besides a binary tissue mask? I created these tissue masks with my own thresholding code, and I cannot compare against the Grand Challenge tissue segmentation algorithm since my access request is still pending.

I also double-checked that the 30 TCGA slides I sampled all have TLSs according to the annotations you provided.

Thank you again!

Mara

martvanrijthoven commented 5 months ago

Dear Mara,

That is indeed weird; it sounds like the model should work and predict TLS.

Would it be possible to share a converted TCGA image + mask file with me (mart.vanrijthoven@gmail.com)? Then I will run it and try to figure out what is going wrong.

Best wishes, Mart

mpl24 commented 5 months ago

Yes, I can send you a couple of examples of converted TCGA and mask files.

Thank you for the help!

Best, Mara