djimenezsanchez / NaroNet

Trained only with subject-level labels, NaroNet discovers phenotypes, neighborhoods, and areas with the highest influence when classifying subject types.

NaroNet: discovery of tumor microenvironment elements from highly multiplexed images.

Summary: NaroNet is an end-to-end interpretable learning method that can be used for the discovery of elements from the tumor microenvironment (phenotypes, cellular neighborhoods, and tissue areas) that have the highest predictive ability to classify subjects into predefined types. NaroNet works without any ROI extraction or patch-level annotation, just needing multiplex images and their corresponding subject-level labels. See our paper for further description of NaroNet.

© Daniel Jiménez Sánchez. This code is made available under the GNU GPLv3 License for non-commercial academic purposes.

Index (the usage of this code is explained step by step)

1. Requirements and installation
2. Preparing datasets
3. Preparing parameter configuration
4. Preprocessing
5. Patch Contrastive Learning
6. NaroNet
7. BioInsights
8. Demo
9. Citation

Requirements and installation

To install NaroNet, we recommend creating a new Anaconda environment with PyTorch (v1.4.0 or newer). For GPU support, install a CUDA version that is compatible with your PyTorch version.

conda create --name NaroNet python=3.8

Once inside the created environment, install PyTorch and PyTorch Geometric:

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install pyg -c pyg

Now install the remaining dependencies, most of them with pip (OpenCV is installed through the system package manager):

pip install hyperopt
pip install xlsxwriter
pip install matplotlib
pip install seaborn
pip install imgaug
sudo apt-get install python3-opencv 
pip install tensorboard
pip install openTSNE
pip install openpyxl
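
After installation, a quick sanity check can catch environment problems early. The following is a minimal sketch; it only verifies that the core dependencies import and that CUDA is visible to PyTorch:

# Optional environment check: confirms the core dependencies import and reports CUDA status.
import torch
import torch_geometric

print('PyTorch version:', torch.__version__)
print('PyTorch Geometric version:', torch_geometric.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))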

Preparing datasets

Create the target folder (e.g., 'DATASET_DATA_DIR') with your image and subject-level information using the following folder structure:

DATASET_DATA_DIR/
    └── Raw_Data/
        ├── Images/
        │   ├── image_1.tiff
        │   ├── image_2.tiff
        │   └── ...
        ├── Masks/
        │   ├── image_1.tiff
        │   ├── image_2.tiff
        │   └── ...
        └── Experiment_Information/
            ├── Channels.txt
            ├── Image_Labels.xlsx
            └── Patient_to_Image.xlsx (Optional)

In the 'Raw_Data/Images' folder we expect the multiplex image data: multi-page '.tiff' files with one channel/marker per page. In the 'Raw_Data/Masks' folder, put masks with the same size and name as the corresponding images, with 1's for pixels that should be analyzed and 0's for pixels that should be ignored. In the 'Raw_Data/Experiment_Information' folder, two files are expected:

'Image_Labels.xlsx' assigns subject-level labels to each image:

| Image_Names | Control vs. Treatment | Survival |
| --- | --- | --- |
| image_1.tiff | Control | Poor |
| image_2.tiff | None | High |
| image_3.tiff | Treatment | High |
| ... | ... | ... |

'Patient_to_Image.xlsx' (optional) maps each image to its subject, so that several images can be grouped under the same subject:

| Image_Name | Subject_Name |
| --- | --- |
| image_1.tiff | subject_1 |
| image_2.tiff | subject_1 |
| image_3.tiff | subject_2 |
| ... | ... |
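
Before moving on, it may help to verify that the folder structure is consistent. The sketch below is illustrative (the dataset path and the assumption that image names appear in the first column of 'Image_Labels.xlsx' follow the example above; adjust them to your data): it checks that every image has a mask with the same name and a subject-level label.

import os
from openpyxl import load_workbook

# Example dataset root used throughout this README; change to your own path.
root = 'DATASET_DATA_DIR/Raw_Data'
images = set(os.listdir(os.path.join(root, 'Images')))
masks = set(os.listdir(os.path.join(root, 'Masks')))

# Image names from the first column of Image_Labels.xlsx (header row skipped).
wb = load_workbook(os.path.join(root, 'Experiment_Information', 'Image_Labels.xlsx'))
labeled = {row[0] for row in wb.active.iter_rows(min_row=2, values_only=True) if row[0]}

print('Images without a mask:', sorted(images - masks))
print('Images without a label:', sorted(images - labeled))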

Preparing parameter configuration

In the following sections (i.e., preprocessing, PCL, NaroNet, and BioInsights), several parameters need to be set. Although the parameters are explained in each section, all of them should be specified in the file named 'DatasetParameters.py', located in the folder 'NaroNet/src/utils'. Change it to your own configuration, where 'DATASET_DATA_DIR' is your target folder. See the example file, or the example below:

def parameters(path, debug):
    args = {}
    if 'DATASET_DATA_DIR' in path:
        args['param1'] = value1
        args['param2'] = value2
        ...
    return args
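
For reference, the configuration is then retrieved by passing the dataset path to this function. A hypothetical call (the exact import path is an assumption based on the file location given above; check your installation) would look like:

# Hypothetical usage sketch; the module path mirrors 'NaroNet/src/utils/DatasetParameters.py'.
from NaroNet.utils.DatasetParameters import parameters

args = parameters('DATASET_DATA_DIR/', debug=False)  # returns the args dictionary shown above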

Patch Contrastive Learning

The goal of PCL in our pipeline is to convert each high-dimensional multiplex image of the cohort into a list of low-dimensional embedding vectors. To this end, each image is divided into patches (our basic units of representation, each containing one or two cells of the tissue), and each patch is converted by the PCL module (a properly trained CNN) into a low-dimensional vector that embeds both the morphological and spectral information of the patch.
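
To make the patch notion concrete, the sketch below divides an image into non-overlapping square patches with NumPy. It is illustrative only: the patch size is a configurable parameter of the pipeline, and the value used here is arbitrary.

import numpy as np

# Toy multiplex image of shape (height, width, channels); real images are multi-page .tiff files.
image = np.random.rand(512, 512, 40)
patch_size = 16  # arbitrary example value; in practice this is set in the parameter configuration

h, w, c = image.shape
patches = (image[:h - h % patch_size, :w - w % patch_size]
           .reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
           .swapaxes(1, 2)
           .reshape(-1, patch_size, patch_size, c))
print(patches.shape)  # (number_of_patches, patch_size, patch_size, channels)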

To run PCL, the 'NaroNet.patch_contrastive_learning' function is used, configured with the PCL parameters specified in 'DatasetParameters.py'.

When executed, PCL checks whether a trained CNN already exists in a previously created folder named 'Model_Training_xxxx', where xxxx is a string of random letters. If the folder does not exist, PCL creates a new model, stores it in a new 'Model_Training_xxxx' folder, and trains it using the parameter configuration. To check whether the CNN has been trained successfully, open 'Contrast_accuracy_plot.png' in the 'Model_Training_xxxx' folder, where you should expect a final contrast accuracy value over 50%.

Once the CNN is trained, execute 'NaroNet.preprocess_images' again to infer patch representations for the whole dataset. Image patches are fed to the CNN sequentially, and representation vectors are obtained in return. For each image in the dataset, an '.npy' data structure is created consisting of a matrix, where rows are patches and columns are representation values. The first two columns specify the x and y position of the patch in the image, which will later be used to create a graph. If Patient_to_Image.xlsx exists, the '.npy' structure will contain patches from more than one image of the same subject. A sketch showing how to load these files is given after the folder tree below.

Once executed, you should expect the following folder structure, where Model_Training_xxxx is created during training and Image_Patch_Representation during inference (new folders are marked with '+'):

DATASET_DATA_DIR/
    ├── Raw_Data/
    │   └── ...
    └── Patch_Contrastive_Learning/
        ├── Preprocessed_Images/
        │   └── ...
+       ├── Model_Training_xxxx/
+       │   ├── model.ckpt-0.index
+       │   ├── model.ckpt-0.meta
+       │   ├── model.ckpt-0.data-00000-of-00001
+       │   ├── event.out.tfevents...
+       │   ├── checkpoint
+       │   └── ...
+       └── Image_Patch_Representation/
+           ├── image_1.npy
+           ├── image_2.npy
+           └── ...
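
As a sanity check, these representation files can be loaded directly with NumPy. The sketch below (using the example file name from the tree above) separates the patch coordinates from the embedding values:

import numpy as np

# Rows are patches; the first two columns are the (x, y) patch position,
# the remaining columns are the learned representation values.
data = np.load('DATASET_DATA_DIR/Patch_Contrastive_Learning/Image_Patch_Representation/image_1.npy')
xy = data[:, :2]          # patch positions, later used to build the graph
embeddings = data[:, 2:]  # low-dimensional patch representations
print(data.shape, xy.shape, embeddings.shape)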

NaroNet

NaroNet takes as input the graphs of patches (stored in 'DATASET_DATA_DIR/Patch_Contrastive_Learning/Image_Patch_Representation') and the subject-level labels (stored in 'DATASET_DATA_DIR/Raw_Data/Experiment_Information/Image_Labels.xlsx'), and outputs subject predictions based on the abundance of the learned phenotypes, neighborhoods, and areas. To this end, execute 'NaroNet.NaroNet.run_NaroNet'; the most relevant parameters, as well as additional ones, are explained in 'DatasetParameters.py'.
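
As an orientation, a typical call looks like the sketch below. The exact signature of 'run_NaroNet' and the import path of the parameter function are assumptions; check 'DatasetParameters.py' and the demo notebook for the form used in your version.

# Hypothetical invocation sketch; verify the exact signatures in the demo notebook.
from NaroNet.utils.DatasetParameters import parameters
from NaroNet.NaroNet import run_NaroNet

path = 'DATASET_DATA_DIR/'
args = parameters(path, debug=False)  # parameter configuration as defined above
run_NaroNet(path, args)               # trains NaroNet and runs the cross-validation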

When executed, NaroNet selects the images included in the experiment and creates a graph of patches in PyTorch's '.pt' format. Next, k-fold cross-validation is carried out, training the model with 90% of the data and testing on the remaining 10%. Newly created folders are marked with '+':

DATASET_DATA_DIR/
    ├── Raw_Data/
    │   └── ...
    ├── Patch_Contrastive_Learning/
    │   └── ...
+   └── NaroNet/
+       └── Survival/ (experiment name example)
+           ├── Subject_graphs/
+           │   ├── data_0_0.pt
+           │   ├── data_1_0.pt
+           │   └── ...
+           ├── Cell_type_assignment/
+           │   ├── cluster_assignment_Index_0_ClustLvl_10.npy (phenotypes)
+           │   ├── cluster_assignment_Index_0_ClustLvl_11.npy (neighborhoods)
+           │   ├── cluster_assignment_Index_0_ClustLvl_6.npy (areas)
+           │   ├── cluster_assignment_Index_1_ClustLvl_10.npy (phenotypes)
+           │   ├── cluster_assignment_Index_1_ClustLvl_11.npy (neighborhoods)
+           │   ├── cluster_assignment_Index_1_ClustLvl_6.npy (areas)
+           │   └── ...
+           └── Cross_validation_results/
+               ├── ROC_AUC_Survival.png
+               ├── ConfusionMatrix_Survival.png
+               └── ...
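
The cluster assignment files can be inspected directly with NumPy. The sketch below assumes, from the file naming above, that each file stores a per-patch assignment matrix for a given cross-validation fold (Index) and cluster level (ClustLvl):

import numpy as np

# Example file from the tree above: fold index 0, cluster level 10 (phenotypes).
assign = np.load('DATASET_DATA_DIR/NaroNet/Survival/Cell_type_assignment/'
                 'cluster_assignment_Index_0_ClustLvl_10.npy')
print(assign.shape)  # rows are expected to correspond to patches, columns to clusters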

BioInsights

NaroNet's learned phenotypes, neighborhoods, and areas (stored in 'Cell_type_assignment') can be analyzed a posteriori by the BioInsights module. Here, elements of the tumor microenvironment are extracted, visualized, and associated with subject types. Execute 'NaroNet.NaroNet_dataset.get_BioInsights' with the same parameters as in the NaroNet module (a hypothetical call sketch is shown after the folder tree below) to automatically generate the following folders:

DATASET_DATA_DIR/
    ├── Raw_Data/
    │   └── ...
    ├── Patch_Contrastive_Learning/
    │   └── ...
    ├── NaroNet/
    │   └── Survival/ (experiment name example)
    │       └── ...
+   └── BioInsights/
+       └── Survival/ (experiment name example)
+           ├── Cell_type_characterization/
+           │   └── ...
+           ├── Cell_type_abundance/
+           │   └── ...
+           ├── Differential_abundance_analysis/
+           │   └── ...
+           └── Locate_TME_in_image/
+               └── ...
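
Following the same pattern as the NaroNet call above, a hypothetical invocation (the signature is an assumption; check the demo notebook) would be:

# Hypothetical invocation sketch, mirroring the run_NaroNet call above.
from NaroNet.utils.DatasetParameters import parameters
from NaroNet.NaroNet_dataset import get_BioInsights

path = 'DATASET_DATA_DIR/'
args = parameters(path, debug=False)
get_BioInsights(path, args)  # extracts, visualizes, and associates TME elements with subject types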

Demo

We provide an example workflow via a Jupyter notebook that illustrates how this package can be used.

| Experiment name | Example image dataset link | Run in Google Colab |
| --- | --- | --- |
| Discover tumoral differences between patient types (POLE gene mutated vs. POLE gene non-mutated) | Endometrial cancer tissue example (download Example_POLE.zip) | Open In Colab |

Citation

Please cite this paper if our method, or parts of it, were helpful in your work.

@article{jimenez2021naronet,
  title={NaroNet: Discovery of tumor microenvironment elements from highly multiplexed images},
  author={Jiménez-Sánchez, Daniel and Ariz, Mikel and Chang, Hang and Matias-Guiu, Xavier and de Andrea, Carlos E and Ortiz-de-Solórzano, Carlos},
  journal={Medical Image Analysis},
  volume={78},
  pages={102384},
  year={2022}
}