DIAGNijmegen / pathology-whole-slide-data

A package for working with whole-slide data including a fast batch iterator that can be used to train deep learning models.
https://diagnijmegen.github.io/pathology-whole-slide-data/
Apache License 2.0

Tissue masking #26

Closed michelbotros closed 2 years ago

michelbotros commented 2 years ago

Hi Mart,

I was wondering if there is anything available yet for tissue masking. I think there are quite a few use cases for it. For example:

  1. When working with WSIs without annotations, it would be nice to still be able to sample patches from the tissue regions of the WSI.
  2. To exclude annotations that are outside of tissue regions. In my project I'm dealing with "lazy annotations", where some non-tissue area at the border of the biopsy is often included in the annotated region to save time during the annotation process (see example below).

It might be beneficial to perform tissue masking before patch extraction for the first use case. Let me know what you think and if there are already options for this!

Best,

Michel

michelbotros commented 2 years ago

[Image: example_lazy_annotations_3]

martvanrijthoven commented 2 years ago

Dear Michel,

There are two options available already.

The first one is to use the MaskAnnotationParser. This parser converts a mask into multiple rectangular regions whose size can be specified.

Another option, which I would recommend, is to create polygons from the tissue mask with this function. With the Shapely library, you can intersect the tissue mask polygons with the annotated polygons. From the intersection you can create a new annotation file, e.g., with write_asap_annotation, or use the internal JSON representation of wholeslidedata via convert_annotations_to_json. In this way you get a cleaned annotation file which you can use for training.
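A minimal sketch of the intersection step, using only Shapely. It assumes the tissue mask and the annotations are already available as Shapely polygons in the same coordinate system. The helper name `clip_annotations_to_tissue` and the toy coordinates are hypothetical and not part of wholeslidedata.

```python
from shapely.geometry import Polygon
from shapely.ops import unary_union


def clip_annotations_to_tissue(annotation_polygons, tissue_polygons):
    """Return the parts of each annotation polygon that lie inside tissue.

    Hypothetical helper for illustration; not a wholeslidedata function.
    """
    # Merge all tissue polygons into one geometry so that each annotation
    # only needs to be intersected once.
    tissue = unary_union(tissue_polygons)
    clipped = []
    for annotation in annotation_polygons:
        intersection = annotation.intersection(tissue)
        # Keep only real overlaps; polygons that merely touch the tissue
        # border produce a zero-area intersection and are skipped.
        if intersection.area > 0:
            clipped.append(intersection)
    return clipped


# Toy example: a "lazy" annotation that extends past the tissue border.
tissue = [Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])]
lazy_annotation = [Polygon([(5, 5), (15, 5), (15, 15), (5, 15)])]
cleaned = clip_annotations_to_tissue(lazy_annotation, tissue)
print(cleaned[0].area)
```

Note that an intersection can come back as a MultiPolygon or a GeometryCollection when an annotation overlaps several tissue fragments, so the result may need to be split into individual polygons before it is written to an annotation file.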

I hope this helps and please let me know if anything is unclear.

Best wishes, Mart

michelbotros commented 2 years ago

Dear Mart,

Thanks for the suggestions. I think the second option, where a clean annotation file is created, is indeed preferred. If I understand it correctly, we then find the intersection between the tissue region and the annotation region at the polygon level.

This function converts a tissue mask (NumPy) to polygon format and assumes that I have already obtained a tissue mask as well, right? What would you recommend to obtain the tissue masks themselves?

Best,

Michel

martvanrijthoven commented 2 years ago

Hi Michel,

Yes, indeed, the intersection can then be done at the polygon level. And yes, you can use that function to convert a NumPy array to polygons. When opening the mask, you can read it at a spacing of ~4.0 or ~8.0 (.get_slide) and then upscale the resulting polygons. Please use the asap backend, because the openslide backend does not work well with monochrome images.

For creating tissue masks I recommend using this algorithm.
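The upscaling step could look roughly like the sketch below. It assumes that spacing is expressed in micrometers per pixel, that the annotations live at a finer spacing of 0.5 (an illustrative value), and that both coordinate systems share the same origin. The helper `upscale_polygon` is hypothetical and not part of wholeslidedata.

```python
from shapely.affinity import scale
from shapely.geometry import Polygon


def upscale_polygon(polygon, mask_spacing, target_spacing):
    """Scale polygon coordinates from the mask spacing to a finer spacing.

    Hypothetical helper for illustration; not a wholeslidedata function.
    """
    # A coarser mask has a larger spacing, so its pixel coordinates must be
    # multiplied by mask_spacing / target_spacing to reach the finer level.
    factor = mask_spacing / target_spacing
    # Scale about the origin so that (0, 0) stays the top-left corner.
    return scale(polygon, xfact=factor, yfact=factor, origin=(0, 0))


# Toy example: a polygon found in a mask read at spacing 8.0,
# mapped to annotation coordinates at an assumed spacing of 0.5.
mask_polygon = Polygon([(10, 10), (20, 10), (20, 20), (10, 20)])
full_res = upscale_polygon(mask_polygon, mask_spacing=8.0, target_spacing=0.5)
print(full_res.bounds)
```

Passing `origin=(0, 0)` matters here: by default Shapely scales about the center of the geometry's bounding box, which would resize the polygon without moving it to the right place.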

Best wishes, Mart

michelbotros commented 2 years ago

Hi Mart,

Okay, all clear! Thank you very much for the explanation.

Best,

Michel

michelbotros commented 2 years ago

Hi Mart,

I requested access to the algorithm that you mentioned, but it's still pending. Do you perhaps know who I would have to contact to get permission?

Best,

Michel

martvanrijthoven commented 2 years ago

Hi Michel,

If it is still pending, maybe you can contact Peter Bandi, who created this algorithm.