DigitalSlideArchive / HistomicsTK

A Python toolkit for pathology image analysis algorithms.
https://digitalslidearchive.github.io/HistomicsTK/
Apache License 2.0

Immune structure/folds/bubbles recognition #793

Closed asmagen closed 4 years ago

asmagen commented 4 years ago

What would be the most straightforward method to identify tissue areas with unique cell organization patterns, such as dense immune aggregates or vascular areas, as well as artifacts such as bubbles or folds? Would the tissue boundary recognition with more iterations solve this or would it require an independent classification model per pixel using a CNN for example?

Thanks

kheffah commented 4 years ago

@asmagen I would say you could start by trying to play around a bit with the thresholds for the cellularity detection module:

https://github.com/DigitalSlideArchive/HistomicsTK/blob/master/docs/examples/cellularity_detection_thresholding.ipynb

The default threshold values are fairly good at picking up blood and blue/green marker (Sharpie) or ink, at least in TCGA slides. They rely on color properties and seem to be fairly robust.
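To give a rough idea of what color thresholding means here, below is a minimal NumPy-only sketch that flags blood-like and ink-like pixels by hue and saturation. This is not the HistomicsTK cellularity detection module: the function names (`rgb_to_hsv`, `color_masks`) and all threshold values are assumptions chosen for illustration, and real slides would need values tuned on your own data.

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Convert an (H, W, 3) uint8 RGB image to float HSV, all channels in [0, 1]."""
    rgb = rgb.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    maxc = rgb.max(axis=-1)
    minc = rgb.min(axis=-1)
    delta = maxc - minc
    safe_delta = np.where(delta == 0, 1.0, delta)  # avoid division by zero on gray pixels
    hue = np.where(
        maxc == r, ((g - b) / safe_delta) % 6.0,
        np.where(maxc == g, (b - r) / safe_delta + 2.0, (r - g) / safe_delta + 4.0),
    )
    hue = np.where(delta == 0, 0.0, hue / 6.0)
    sat = np.where(maxc == 0, 0.0, delta / np.where(maxc == 0, 1.0, maxc))
    return np.stack([hue, sat, maxc], axis=-1)

def color_masks(rgb, blood_sat=0.5, ink_sat=0.4, ink_hue=(0.25, 0.70)):
    """Return boolean masks for blood-like and ink-like pixels.

    All thresholds are illustrative assumptions, not library defaults.
    """
    hsv = rgb_to_hsv(rgb)
    h, s = hsv[..., 0], hsv[..., 1]
    # Strongly saturated red hues (hue wraps around at 0/1).
    blood_like = ((h < 0.05) | (h > 0.95)) & (s > blood_sat)
    # Saturated green-to-blue hues, typical of marker or ink.
    ink_like = (h >= ink_hue[0]) & (h <= ink_hue[1]) & (s > ink_sat)
    return {"blood_like": blood_like, "ink_like": ink_like}
```

You would run this on a low-magnification thumbnail of the slide and inspect the masks before trusting any of the cutoffs.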

If this doesn't work, you may try to play around with, and perhaps edit, the superpixel-based approach, where the slide is decomposed into superpixels and their intensity and texture features are modeled as a mixture of Gaussians:

https://github.com/DigitalSlideArchive/HistomicsTK/blob/master/docs/examples/cellularity_detection_superpixels.ipynb

Although to be honest, the superpixel approach requires a lot of tuning and may not be that easy to get to work robustly. Color thresholding is more robust in my experience.
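For intuition about the mixture-of-Gaussians step, here is a toy sketch that is heavily simplified compared with the notebook: it uses square blocks as a stand-in for superpixels, a single feature (mean intensity) instead of intensity plus texture, and a hand-written two-component EM loop. The helper names (`block_means`, `responsibilities`, `fit_gmm_1d`) are made up for this example.

```python
import numpy as np

def block_means(gray, block=8):
    """Mean intensity of non-overlapping square blocks (a crude stand-in for superpixels)."""
    h, w = gray.shape
    h, w = h - h % block, w - w % block  # crop so the image tiles evenly
    tiles = gray[:h, :w].astype(np.float64).reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))

def responsibilities(x, weights, means, stds):
    """E-step: posterior probability of each component for every sample."""
    z = (x[:, None] - means) / stds
    dens = weights * np.exp(-0.5 * z ** 2) / (stds * np.sqrt(2.0 * np.pi))
    return dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

def fit_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    x = np.asarray(x, dtype=np.float64).ravel()
    means = np.percentile(x, [25, 75])       # simple initialization
    stds = np.full(2, x.std() + 1e-6)
    weights = np.full(2, 0.5)
    for _ in range(n_iter):
        resp = responsibilities(x, weights, means, stds)
        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk) + 1e-6
    return weights, means, stds
```

Taking the `argmax` of `responsibilities(...)` per block gives a label map. With real superpixels and several features you would need a multivariate mixture, which is where most of the tuning effort goes.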

If both of these fail, then perhaps you could try some edge detection, or use an existing framework for artifact detection such as HistoQC: https://github.com/choosehappy/HistoQC
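As a starting point for the edge-detection idea, a Sobel gradient magnitude can be computed with NumPy alone. This is only a sketch: `sobel_magnitude` is a hypothetical helper, and whether strong edges actually correspond to bubble rims or folds on your slides is an assumption you would have to check visually.

```python
import numpy as np

def sobel_magnitude(gray):
    """Sobel gradient magnitude of a 2-D grayscale image (edge-padded borders)."""
    g = np.pad(gray.astype(np.float64), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1 vertically.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[1:-1, :-2] + g[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1 horizontally.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]) - (g[:-2, :-2] + 2 * g[:-2, 1:-1] + g[:-2, 2:])
    return np.hypot(gx, gy)
```

Thresholding the result gives a candidate edge mask that can then be filtered by size or shape.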

If you have training data, then a CNN may be ideal for this kind of work as it doesn't require as much tuning and is usually quite robust. You may use the training dataset we described in our paper:

https://github.com/CancerDataScience/CrowdsourcingDataset-Amgadetal2019

which does contain training data for immune infiltrates, blood pools, and small blood vessels in breast cancer.

kheffah commented 4 years ago

I'm assuming this addresses the question and closing. Feel free to re-open.