This repository contains Pytorch implementation of training image representations and performance evaluation of the approach introduced in I. Zingman, B. Stierstorfer, C. Lempp, F. Heinemann. "Learning image representations for anomaly detection: application to discovery of histological alterations in drug development", Medical Image Analysis, 2024. It is also available on ArXiv and temporary has a free access on Elsevier.
The paper develops a method for anomaly detection in whole slide images of stained tissue samples in order to routinely screen histopathological data for abnormal alterations in tissue.
Figure above shows detection of adverse drug reactions by the Boehinger Ingelheim Histological Network (BIHN) based anomaly detection. A: The developed Anomaly Detection (AD) method detects induced tissue alterations in the liver of mouse after administration an experimental compound. The fraction of abnormal tiles increases with the the dosage of the compound. The compound was previously found to have toxic side effects in toxicological screening by pathologists. Each dot corresponds to a single Whole Slide Image (WSI). Three arrows correspond to three WSI examples given in B. Stars on the top of the graph show statistical significance of the change compared to the mean of control group. B: Examples of detected anomalies. In the control group (left image) blood and a few other not pathological structures result in a low level of false positives. Detections in compound treated groups (two right images) correspond to pathological alterations and were confirmed by a pathologist.
Requirements
PyTorch
, NumPy
, Pillow
, scikit-learn
The code in the repository was tested under Python 3.9
with GPU 11GB and packages' listed in the requirements.txt
.
It, however, should also run with earlier Python versions and smaller GPU memory.
Experiments (training image representations and performance evaluation)
Setting up dataset
The training dataset with normal tissue of different species, organs, and staining can be downloaded from data/train/
folder from https://osf.io/gqutd/.
This dataset was used for training image representations.
The evaluation dataset with normal mouse liver tissue and mouse tissue with Non-Alcoholic Fatty Liver Disease (NAFLD) can
be downloaded from data/test/
folder from https://osf.io/gqutd/
Due to large sizes of zip files it is recommended to download each zip file separately.
Create the folder structure shown below under the root folder of your repository with the cloned code or in any other location.
In the last case set _prj_root
variable to the chosen location in configs/cfg_training_cnn.py
and configs/cfg_anomaly_detector.py
configuration files.
We use *.py configuration files, not e.g. yaml, which allows more flexibility and is convenient for prototyping.
Unzip downloaded data files to the corresponding folders within the created folders structure
If you want to use pre-trained models (instead of training yourself)
trained models/
folder from https://osf.io/gqutd/.EfficientNet_B0_320_HE_Liver_Mouse_acc0.9762.pt
, EfficientNet_B0_320_Masson_Liver_Mouse_acc0.9755.pt
CNN models,
the corresponding EfficientNet_B0_320_Masson_Liver_Mouse_acc0.9755.pkl
and EfficientNet_B0_320_HE_Liver_Mouse_acc0.9762.pkl
anomaly detection models (One-cass SVM classifiers), and the corresponding EfficientNet_B0_320_HE_Liver_Mouse_acc0.9762_training_configuration.pkl
and EfficientNet_B0_320_Masson_Liver_Mouse_acc0.9755_training_configuration.pkl
configuration files into e.g. BIHN_models
folder under the project root.Folders structure for project's input
.
├── data
├── test
│ ├── NAFLD_anomaly_he_mouse_liver
│ ├── NAFLD_anomaly_mt_mouse_liver
│ ├── normal_he_mouse_liver
│ └── normal_mt_mouse_liver
└── train
├── he_mouse_brain
├── he_mouse_heart
├── he_mouse_kidney
├── he_mouse_liver
├── he_mouse_lung
├── he_mouse_pancreas
├── he_mouse_spleen
├── he_rat_liver
├── mt_mouse_brain
├── mt_mouse_heart
├── mt_mouse_kidney
├── mt_mouse_liver
├── mt_mouse_lung
├── mt_mouse_pancreas
├── mt_mouse_spleen
└── mt_rat_liver
Training
data_staining
in configs/cfg_training_cnn.py
to either Masson
(Massosn's Trichrome staining) or HE
(H&E staining) values, which will
adjust training image representations for anomaly detection in images of tissue stained correspondingly. If you store the training data in your own location, update
path_to_data
accordingly.python train_cnn.py --config configs/cfg_training_cnn.py
train_results/stamp
folder with trained models (models for each epoch and the best one), confusion matrix, configuration
and log files, where stamp is a unique number that is set for each run. You can redefine the output
folder in the configuration file configs/cfg_training_cnn.py
, if needed, by updating path_to_results
.Evaluation
cnn_model
variable in configs/cfg_anomaly_detector.py
to the relative to root path to the trained CNN model, which was
generated in folder train_results/stamp/model_name.pt
during the training step above. Alternatively, you can set an arbitrary path to the downloaded from https://osf.io/gqutd pre-trained CNN model *.pt
.*.pkl
from https://osf.io/gqutd/, set ad_model
to its location. Alternatively, if you want to train anomaly model on your own (once-class classifier), set ad_model
to empty string ""
or to "CNN_location"
.python anomaly_detector.py --config configs/cfg_anomaly_detector.py
. The code will output evaluation results to test_results
folder.
If anomaly model (once-class classfier) was trained, it will be saved to the folder where CNN model is. Expected performance of anomaly detection with BIHN models
Staining | Balanced accuracy | AU-ROC | F1 score |
---|---|---|---|
H&E | 94.20% | 97.33% | 94.09% |
Masson Trichrome | 97.51% | 99.03% | 97.51% |
anomalib_root/anomalib/models/method/config_file.yaml
.
The paths fields to be set in yaml are normal_dir
, abnormal_dir
, normal_test_dir
, which should point to ./data/train/*mouse_liver/
, ./data/test/NAFLD_anomaly_*_mouse_liver
, ./data/test/normal_*_mouse_liver
data paths correspondingly.
The star in paths refers to a particular staining type, mt
or he
you want to experiment with. The task
field should be set to "classification".Camalyon16Dataset
class, reading images from NAFLD dataset.
We obtained our best results for DPA using camelyon16
wo_pg_unsupervsed
default configuration with the following parameters tuned inner_dims: 16, latent_dim:16
(for both decoder and encoder, same values for all layers as in the default configuration), initial_image_res:256, max_image_res:256, crop_size: 256
.
Batch size was reduced to 64 to be able to run on 256x256 size images.Use of pretrained BIHN models in your own projects
In oder to use pretrained BIHN models (*.pt files that can be downloaded from https://osf.io/gqutd/) to generate
feature representations of histopathological images (Masson or H&E) for your own tasks, you can consult the code example in model_use_example.py
.
Citing
@article{zingman2022anomaly,
title={Learning image representations for anomaly detection: application to discovery of histological alterations in drug development},
author={Igor Zingman and Birgit Stierstorfer and Charlotte Lempp and Fabian Heinemann},
year={2022},
journal={CoRR},
volume={abs/2210.07675},
eprinttype = {arXiv},
url = {https://arxiv.org/abs/2210.07675}
}
@online{NAFLD_dataset,
author = {Igor Zingman and Birgit Stierstofer and Fabian Heinemann},
title = {{NAFLD} pathology and healthy tissue samples},
year = {2022},
url = {https://osf.io/gqutd/},
}