fuscc-deep-path / sc_MTOP

sc-MTOP is an analysis framework based on deep learning and computational pathology. It aims to characterize tumor ecosystem diversity at the single-cell level. The code provides 1) HoVer-Net-based nuclear segmentation and classification; 2) nuclear morphological and texture feature extraction; 3) multi-level pairwise nuclear graph construction and spatial topological feature extraction.

Single Cell Morphological and Topological Profiling Based on Digital Pathology

Description

sc-MTOP is an analysis framework based on deep learning and computational pathology. It consists of two steps: 1) Nuclear segmentation and classification; 2) Feature extraction. This framework aims to characterize the tumor ecosystem diversity at the single-cell level. We have established a demo website to show the functions.

This is the official PyTorch implementation of sc-MTOP. Note that only the nuclear segmentation and classification step supports batch processing.

  1. F1_CellSegment.py for nuclear segmentation and classification:

    This step employs HoVer-Net for simultaneous nuclear segmentation and classification. The model is pre-trained on the PanNuke dataset and can be downloaded from url.

    Provide your WSI files as input. We use .ndpi WSI files in our work; in principle, the step supports every WSI file format that HoVer-Net accepts. For each sample, the step outputs a .json file containing all nuclear segmentation and classification information.

  2. F3_FeatureExtract.py for feature extraction:

    This step extracts morphological, texture and topological features for individual tumor, inflammatory and stromal cells, which are the main cellular components of the breast cancer ecosystem.

    Provide your WSI files and the corresponding .json files output by the segmentation step as input. You can optionally define a region of interest (ROI) using an .xml annotation file generated by the ImageScope software. For each sample, the feature extraction step outputs a folder containing four .csv data files. For each of the three cell types (tumor, inflammatory and stromal), one .csv file stores the features of all cells of that type, with each cell identified by a unique cell ID together with its centroid’s spatial coordinates. The fourth .csv file stores the edge information for the sample and characterizes each edge by the IDs of the cells it connects.

  3. F4_Visualization.py for visualization:

    We provide an additional function for the visualization of the nuclear segmentation results and nuclear graph.

    Provide the WSI files, the corresponding feature files output by the feature extraction step, and an .xml annotation file defining the ROI. The visualization results are written to the annotation file and can be viewed using the ImageScope software. Note that ImageScope may fail to open the annotation file if your ROI is too large.
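As a quick sanity check on the segmentation output before running feature extraction, the sketch below tallies nuclei per predicted type from one .json file. It assumes the file follows the upstream HoVer-Net layout (a top-level "nuc" dictionary keyed by nucleus ID, where each entry has an integer "type" field) and the PanNuke label order used by upstream HoVer-Net. Neither is stated in this README, so inspect one of your own files first. The function name and the label table are illustrative, not part of sc-MTOP.

```python
import json
from collections import Counter

# Assumed PanNuke label order, taken from upstream HoVer-Net (not from this README).
PANNUKE_LABELS = {
    0: "no label",
    1: "neoplastic",
    2: "inflammatory",
    3: "connective",
    4: "necrotic",
    5: "non-neoplastic epithelial",
}


def count_nuclei_by_type(json_path):
    """Return a Counter mapping a type name to the number of nuclei of that type."""
    with open(json_path, "r") as handle:
        result = json.load(handle)

    # Assumption: nuclei are stored under the "nuc" key, one entry per nucleus ID.
    nuclei = result.get("nuc", {})

    counts = Counter()
    for info in nuclei.values():
        label = info.get("type")
        # Labels outside the assumed table are reported rather than dropped.
        counts[PANNUKE_LABELS.get(label, f"unknown ({label})")] += 1
    return counts
```

For the demo slide this could be called as `count_nuclei_by_type("./output/json/Example001.json")`. A file dominated by "no label" or unknown entries would suggest that the assumed layout does not match your output.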

Requirements

Packages and version

The required packages and their versions are listed in the file requirements.txt.

Operating systems

The code has been tested on Windows and Ubuntu 16.04.7 LTS. Installation may differ between operating systems because of some packages.

Hardware

The code runs deep neural network inference, so it needs a GPU with more than 8 GB of video memory. HoVer-Net needs at least 100 GB of SSD space for its cache. The RAM requirement depends on the data size; we suggest at least 128 GB. The code has been tested on an NVIDIA GeForce RTX 2080 Ti GPU with 128 GB of RAM.
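The figures above can be checked before starting a long run. The following is a minimal sketch, not part of sc-MTOP: the helper names are hypothetical, the thresholds are the ones quoted above (in decimal gigabytes), and the GPU check simply returns None when PyTorch or CUDA is unavailable.

```python
import shutil

GB = 10 ** 9  # decimal gigabyte, matching the figures quoted in this section


def has_free_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    return shutil.disk_usage(path).free >= required_gb * GB


def gpu_memory_gb():
    """Return the total memory of CUDA device 0 in GB, or None if it cannot be queried."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / GB


def preflight(cache_path=".", cache_gb=100, min_gpu_gb=8):
    """Print a warning for each requirement that does not appear to be met."""
    if not has_free_space(cache_path, cache_gb):
        print(f"Warning: less than {cache_gb} GB free at {cache_path!r} for the cache")
    gpu = gpu_memory_gb()
    if gpu is None:
        print("Warning: no CUDA GPU detected")
    elif gpu <= min_gpu_gb:
        print(f"Warning: GPU has {gpu:.1f} GB of memory; more than {min_gpu_gb} GB is needed")
```

Calling `preflight("/path/to/cache")` prints warnings only. Which directory HoVer-Net actually uses for its cache depends on your configuration, so the path here is a placeholder.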

Installation

To install the dependencies, run the following command in a terminal:

pip install -r requirements.txt

The code requires the openslide-python package, whose installation differs between Linux and Windows. Please follow the official documentation to install it, then import it in Python to confirm that it works. The pre-trained HoVer-Net model is not included in the source code because of its file size. You can download it as described in the Description section or from our release.
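One way to confirm that openslide-python works is to import it and print its versions. This is a small sketch, and the helper name is illustrative. It catches any exception because a missing native OpenSlide library can surface as an error other than ImportError, for example on Windows when the DLLs are not on the search path.

```python
import importlib


def check_openslide():
    """Return (ok, message) describing whether openslide-python can be imported."""
    try:
        openslide = importlib.import_module("openslide")
    except Exception as exc:  # ImportError, or an OS-level error if the native library is missing
        return False, f"openslide import failed: {exc}"

    # Both attributes are read defensively in case a given build does not expose them.
    binding = getattr(openslide, "__version__", "unknown")
    library = getattr(openslide, "__library_version__", "unknown")
    return True, f"openslide-python {binding}, OpenSlide library {library}"


if __name__ == "__main__":
    ok, message = check_openslide()
    print(message)
```

If the import fails, revisit the platform-specific installation steps in the official documentation before running the segmentation step.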

Repository Structure

Hover: the implementation of HoVer-Net, which is cloned from the official implementation
main.py: main function
F1_CellSegment.py: nuclear segmentation and classification by calling Hover.
F3_FeatureExtract.py: feature extraction by calling WSIGraph.py.
F4_Visualization.py: visualization by calling utils_xml.py.
utils_xml.py: defines helper tools used for visualization.
WSIGraph.py: defines the feature extraction process.

Usage Demo

The demo below runs in a bash terminal on Ubuntu; some commands may not work in other terminals. To run the whole demo, first download the pre-trained network parameters and the demo data with the following commands.

Download the pre-trained network parameters

wget --no-check-certificate --content-disposition -P ./Hover https://github.com/fuscc-deep-path/sc_MTOP/releases/download/Demo/hovernet_fast_pannuke_type_tf2pytorch.tar

Download the demo data

mkdir -p {wsi,xml,fun_fig}
wget --no-check-certificate --content-disposition -P ./wsi https://github.com/fuscc-deep-path/sc_MTOP/releases/download/Demo/Example001.ndpi
wget --no-check-certificate --content-disposition -P ./xml https://github.com/fuscc-deep-path/sc_MTOP/releases/download/Demo/Example001.xml

Nuclear segmentation and classification -- this step takes almost 2 hours with a 2080 Ti GPU and an SSD.

python main.py segment --input_dir='./wsi' --output_dir='./output'

Feature extraction -- this step takes almost 40 minutes with 128 GB of RAM and 8 processes.

python main.py feature --json_path='./output/json/Example001.json' --wsi_path='./wsi/Example001.ndpi' --output_path='./feature'

Visualization -- the output will be in the 'fun_fig' directory.

python main.py visual --feature_path='./feature/Example001' --wsi_path='./wsi/Example001.ndpi' --xml_path='./xml/Example001.xml'
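Because only the segmentation step supports batch processing, feature extraction has to be launched once per slide. The sketch below builds one `main.py feature` command per .json file, using only the flags shown in the demo above. It assumes each .json file shares its base name with its WSI file (as Example001.json and Example001.ndpi do) and that it is run from the repository root. The helper names are hypothetical and not part of sc-MTOP.

```python
import subprocess
import sys
from pathlib import Path


def build_feature_commands(json_dir, wsi_dir, output_path, wsi_ext=".ndpi"):
    """Build one feature-extraction command per .json file that has a matching WSI."""
    commands = []
    for json_file in sorted(Path(json_dir).glob("*.json")):
        # Assumption: the WSI has the same base name as the .json file.
        wsi_file = Path(wsi_dir) / (json_file.stem + wsi_ext)
        if not wsi_file.exists():
            continue  # skip samples whose slide is missing
        commands.append([
            sys.executable, "main.py", "feature",
            f"--json_path={json_file}",
            f"--wsi_path={wsi_file}",
            f"--output_path={output_path}",
        ])
    return commands


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute the commands one by one."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(build_feature_commands("./output/json", "./wsi", "./feature"))
```

The default is a dry run, so the commands can be inspected before committing to roughly 40 minutes per slide. Slides are processed sequentially because each run already uses several processes and a large amount of RAM.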