bgilbert / anonymize-slide

Delete the label from a whole-slide image
GNU General Public License v2.0

NDPI WSIs remain identifiable by image metadata #12

Open ardalana opened 3 months ago

ardalana commented 3 months ago

While working with anonymize-slide.py, I noticed that the output WSI still carries the original name/label in the image metadata; it is visible, for example, in the “Reference” field of the “Image Info” window in NDP.view2. So even though the TIFF label image is removed, the slide is not actually anonymized, because it can still be identified from the metadata. This may be a newer issue than the initial release of the code, but is there a straightforward way to fix it so that current NDPI slide images can be completely anonymized?

tomi-lilja commented 3 months ago

Hi @ardalana, yes, that is correct, and the Reference tag actually lurks in every TIFF directory (IFD) of the NDPI file. I have used tifftools (https://pypi.org/project/tifftools/) to remove both the label image and all the Reference tags. I just put it all in a blog post, and there is an example Python script at the end of the article that performs the full anonymization for NDPI scans: https://scribesroom.wordpress.com/2024/03/15/anonymizing-ndpi-slide-scans/ tifftools might be a bit slower than writing directly to the existing file, because it creates a separate output file, but it does the trick.
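
For readers who want the gist without opening the article, here is a minimal sketch of the Reference-tag removal only. It is not the script from the blog post and it does not remove the label image. It assumes that the Reference field is stored under private tag 65427 and that `tifftools.read_tiff` returns a dict whose `"ifds"` list holds one `"tags"` dict per directory, keyed by tag number. The names `strip_tag` and `anonymize_reference` are made up for illustration.

```python
# Assumed tag number of the Hamamatsu NDPI "Reference" field; verify it
# against your own files (for example with `tifftools dump`) before relying on it.
NDPI_REFERENCE_TAG = 65427


def strip_tag(ifds, tag=NDPI_REFERENCE_TAG):
    """Remove `tag` from every IFD in `ifds`, in place.

    Returns the number of tag entries that were removed.
    """
    removed = 0
    for ifd in ifds:
        tags = ifd.get("tags", {})
        if tags.pop(tag, None) is not None:
            removed += 1
        # Descend into sub-IFDs, if a tag entry carries any.
        for entry in tags.values():
            if isinstance(entry, dict):
                for sub_ifds in entry.get("ifds", []):
                    removed += strip_tag(sub_ifds, tag)
    return removed


def anonymize_reference(src, dst):
    """Copy `src` to a new file `dst` with the Reference tag stripped."""
    import tifftools  # third-party: pip install tifftools

    info = tifftools.read_tiff(src)
    removed = strip_tag(info["ifds"])
    # Writes a separate output file; the original is left untouched.
    tifftools.write_tiff(info, dst)
    return removed
```

Calling `anonymize_reference("slide.ndpi", "slide_anon.ndpi")` (hypothetical file names) would write a new file and return how many directories had the tag removed. It is worth checking the result in NDP.view2 afterwards.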

ardalana commented 3 months ago

Hi @tomi-lilja, many thanks for pointing me to tifftools. It was very helpful and I was able to do everything that I meant to do using it. The good part is that it is possible to get the Reference field completely erased from all IFDs. That was great!