Biooptics2021 / PathFinder

GNU General Public License v3.0

PathFinder: AI-based pathological biomarker finder

Project Page | Paper | Research Highlight

Note: To run PathFinder with the pre-trained network, try Quick Discovery.

© This code is made available for non-commercial academic purposes.

Overview

Tissue biomarkers are crucial for cancer diagnosis, prognosis assessment and treatment planning. However, there are few known biomarkers that are robust enough to show true analytical and clinical value. Deep learning (DL)-based computational pathology can be used as a strategy to predict survival, but the limited interpretability and generalizability prevent acceptance in clinical practice. Here we present an interpretable human-centric DL-guided framework called PathFinder (Pathological-biomarker-finder) that can help pathologists to discover new tissue biomarkers from well-performing DL models. By combining sparse multi-class tissue spatial distribution information of whole slide images with attribution methods, PathFinder can achieve localization, characterization and verification of potential biomarkers, while guaranteeing state-of-the-art prognostic performance. Using PathFinder, we discovered that spatial distribution of necrosis in liver cancer, a long-neglected factor, has a strong relationship with patient prognosis. We therefore proposed two clinically independent indicators, including necrosis area fraction and tumour necrosis distribution, for practical prognosis, and verified their potential in clinical prognosis according to criteria derived from the Reporting Recommendations for Tumor Marker Prognostic Studies. Our work demonstrates a successful example of introducing DL into clinical practice in a knowledge discovery way, and the approach may be adopted in identifying biomarkers in various cancer types and modalities.

For more details, please see our paper: "Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer (2023)".

Framework of PathFinder

Directory Structure

PathFinder
    ├── WSI_decoupling
          ├── decoupling.py
          ├── inference.py
          ├── visualization.py
          └── PaSegNet
                ├── train.py
                └── test.py
    ├── Prognosis
          ├── data_loaders.py
          ├── train_TCGA_CV.py
          ├── train_TCGA_test_QHCG.py
          ├── train_test.py
          ├── utils.py
          ├── Data_prepare
                ├── cut_heatmap.py
                └── Generate_prognostic_patches.py
          └── Networks
                ├── M2M_network.py
                ├── Macro_networks.py
                └── Micro_networks.py
    ├── Discovery
          ├── data_loaders.py
          ├── networks.py
          ├── attribution.ipynb
          ├── verification.ipynb
          ├── ckpt
                └── trained_model.pth
          └── segmap
                └── segmap_example.npy
    └── Data
          └── WSIs and clinical information

Pre-requisites and Environment

Our Environment

  1. To try out the Python code, first activate the pathfinder environment and move into the repository:

    $ conda activate pathfinder
    $ cd PathFinder/

  2. To install the dependencies, run:

    $ pip install -r requirements.txt

Data Preparation

Data Format

Generate Macro Mode

Generate Micro Mode

Data Usage

Data Distribution

DATA_ROOT_DIR/
    └──DATASET_DIR/
         ├── clinical_information                       + + + 
                ├── Hospital_1.csv                          +
                ├── Hospital_2.csv                          +
                └── ...                                     +
         ├── WSI_data                                       +
                ├── Hospital_1                              +
                       ├── slide_1.svs                      +
                       ├── slide_2.svs                Source Data
                       └── ...                              +
                ├── Hospital_2                              +
                       ├── slide_1.svs                      +
                       ├── slide_2.svs                      +
                       └── ...                              +
                └── ...                                 + + +
         ├── macro_mode                                 + + +
                ├── Hospital_1                              +
                       ├── slide_1_heatmaps.npy             +
                       ├── slide_2_heatmaps.npy             +
                       └── ...                              +
                ├── Hospital_2                              +
                       ├── slide_1_heatmaps.npy             +
                       ├── slide_2_heatmaps.npy             +
                       └── ...                              +
                └── ...                                     +
         └── micro_mode                            Processed Data
                ├── Hospital_1                              +
                       ├── slide_1                          +
                              ├── patch_1.tif               +
                              ├── patch_2.tif               +
                              └── ...                       +
                       ├── slide_2                          +
                              ├── patch_1.tif               +
                              ├── patch_2.tif               +
                              └── ...                       +
                       └── ...                              +
                └── ...                                 + + +             

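DATA_ROOT_DIR is the base directory of all datasets (e.g. the path to your SSD). DATASET_DIR is the name of the folder containing the data for one experiment. Within it, clinical_information and WSI_data hold the source data, while macro_mode and micro_mode hold the processed data.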
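
As an illustration of this layout, the sketch below pairs each source slide with its processed files. It is not part of the repository: the function name index_dataset and the record keys are hypothetical, and the only naming rules it relies on are the ones visible in the tree (slide_X.svs, slide_X_heatmaps.npy, and a slide_X folder of .tif patches).

```python
from pathlib import Path


def index_dataset(data_root_dir, dataset_dir):
    """Pair every source slide with its processed macro/micro data, if present.

    Hypothetical helper: it only assumes the folder and file naming shown in
    the directory tree above.
    """
    base = Path(data_root_dir) / dataset_dir
    records = []
    # WSI_data/<hospital>/<slide>.svs
    for slide_path in sorted((base / "WSI_data").glob("*/*.svs")):
        hospital = slide_path.parent.name
        slide_id = slide_path.stem
        heatmap = base / "macro_mode" / hospital / f"{slide_id}_heatmaps.npy"
        patch_dir = base / "micro_mode" / hospital / slide_id
        records.append({
            "hospital": hospital,
            "slide": slide_id,
            "wsi": slide_path,
            "clinical_csv": base / "clinical_information" / f"{hospital}.csv",
            # None / empty list mean the slide has not been processed yet
            "heatmap": heatmap if heatmap.exists() else None,
            "patches": sorted(patch_dir.glob("*.tif")) if patch_dir.is_dir() else [],
        })
    return records
```

Calling index_dataset(DATA_ROOT_DIR, DATASET_DIR) before training is one way to spot slides whose macro-mode heatmaps or micro-mode patches have not been generated yet.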

Training and Evaluation

K-fold Cross Validation

After data preparation, MacroNet can be trained and tested on TCGA data with k-fold cross-validation by running:

$ cd ./Prognosis
$ python train_TCGA_CV.py
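
For readers unfamiliar with k-fold cross-validation, here is a minimal sketch of how the samples can be split so that each one is used for testing exactly once. It is an illustration only and may differ from what train_TCGA_CV.py actually does: the function name kfold_indices, the default k=5, and the fixed seed are all assumptions.

```python
import numpy as np


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) index arrays for k-fold cross-validation.

    Hypothetical helper; k and seed are illustrative defaults, not values
    taken from the repository.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)   # shuffle once, then cut into k folds
    folds = np.array_split(order, k)     # fold sizes differ by at most one
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


# Example: 5 folds over 23 samples
for fold, (train_idx, test_idx) in enumerate(kfold_indices(23, k=5)):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```

When one patient contributes several slides, the split is usually made over patients rather than slides, so that no patient appears in both the training and the test fold.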

Independent Hospital Test

The generalization ability of MacroNet can be tested on data from an independent hospital by running:

$ cd ./Prognosis
$ python train_TCGA_test_QHCG.py

Training and Evaluation of MicroNet and M2MNet

To train and evaluate MicroNet and M2MNet, import the corresponding data loader and network architecture in ./Prognosis/train_test.py. Data loaders are defined in ./Prognosis/data_loaders.py, and network architectures are in ./Prognosis/Networks.

Biomarker Discovery

Quick Discovery

Verification

Biomarker verification according to the REMARK criteria and the survival analyses are performed in ./Discovery/verification.ipynb.
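
As background on the kind of survival analysis involved, the sketch below computes a Kaplan-Meier survival estimate with NumPy. It is a generic textbook estimator written for illustration, not code taken from the notebook, and the function name kaplan_meier and its argument names are hypothetical.

```python
import numpy as np


def kaplan_meier(durations, events):
    """Return (event_times, survival_probabilities) for right-censored data.

    durations: follow-up time of each patient
    events:    1/True if the event was observed, 0/False if censored
    Hypothetical helper for illustration only.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations[events])   # distinct times with an observed event
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)              # still under observation at t
        deaths = np.sum((durations == t) & events)    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return times, np.array(survival)


# Example: four patients, the second one censored at t=2
times, surv = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(times, surv)
```

Computing one such curve per group, for example patients above and below a threshold on a candidate biomarker, gives the curves that a log-rank test would then compare.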

Acknowledgements

Reference

If you find our work useful in your research, or if you use parts of this code, please consider citing our paper.