The source code of the paper "A Semi-Supervised Multi-Task Learning Framework for Cancer Classification with Weak Annotation in Whole-Slide Images", published in Medical Image Analysis.
A semi-supervised multi-task learning (SSMTL) framework for cancer classification.
Our framework consists of a backbone feature extractor, two task-specific classifiers, and a weight control mechanism.
The backbone feature extractor is shared by the two task-specific classifiers so that the interaction between the cancer region detection (CRD) and subtyping tasks can be captured.
The weight control mechanism preserves the sequential relationship between these two tasks and ensures that errors from the subtyping task are back-propagated to the CRD task within the multi-task learning (MTL) framework.
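To make the shared-backbone idea concrete, here is a minimal pure-Python sketch of hard parameter sharing: one feature extractor feeds a CRD head and a subtyping head, and the two cross-entropy losses are combined. The class name TwoHeadModel, the layer sizes and the fixed loss weights w_crd and w_sub are hypothetical. The fixed weighting is only a stand-in and does not implement the paper's weight control mechanism, and the repository's actual model is not reproduced here.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(w, x):
    # Multiply a weight matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def cross_entropy(probs, label):
    return -math.log(probs[label])


def init(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TwoHeadModel:
    """Hard parameter sharing: one shared backbone, two task-specific heads (hypothetical)."""

    def __init__(self, in_dim, hidden, n_classes1, n_classes2, seed=0):
        rng = random.Random(seed)
        self.backbone = init(hidden, in_dim, rng)      # shared by both tasks
        self.head_crd = init(n_classes1, hidden, rng)  # CRD classifier
        self.head_sub = init(n_classes2, hidden, rng)  # subtyping classifier

    def forward(self, x):
        feat = relu(matvec(self.backbone, x))  # features shared by both heads
        p_crd = softmax(matvec(self.head_crd, feat))
        p_sub = softmax(matvec(self.head_sub, feat))
        return p_crd, p_sub


def multitask_loss(p_crd, p_sub, y_crd, y_sub, w_crd=1.0, w_sub=1.0):
    # Hypothetical fixed weights; NOT the paper's weight control mechanism.
    return w_crd * cross_entropy(p_crd, y_crd) + w_sub * cross_entropy(p_sub, y_sub)
```

For example, TwoHeadModel(4, 8, 2, 3) mirrors --n-classes1 2 and --n-classes2 3 from the training command. Because both heads read the same features, gradients from both losses would reach the shared backbone during training.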
We train the whole framework in a semi-supervised setting, where only a small quantity of annotations is available, produced by our minimal point-based (min-point) annotation strategy.
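The acknowledgment of YU1ut's MixMatch implementation at the end of this README suggests that the semi-supervised training builds on MixMatch. Assuming so, the sketch below shows two MixMatch ingredients in plain Python: guessing a label for an unlabeled sample by averaging the predictions over its augmentations and sharpening them with a temperature T, and the MixMatch variant of MixUp, which uses lambda' = max(lambda, 1 - lambda). The defaults T = 0.5 and alpha = 0.75 follow the original MixMatch paper; the values used in this repository may differ.

```python
import random


def average_predictions(pred_list):
    # Average the class distributions predicted for K augmentations of one unlabeled sample.
    k = len(pred_list)
    return [sum(p[c] for p in pred_list) / k for c in range(len(pred_list[0]))]


def sharpen(p, temperature=0.5):
    # MixMatch sharpening: p_i^(1/T) / sum_j p_j^(1/T); a lower T gives a lower-entropy guess.
    powered = [pi ** (1.0 / temperature) for pi in p]
    total = sum(powered)
    return [x / total for x in powered]


def mixup(x1, y1, x2, y2, alpha=0.75, rng=random):
    # MixMatch variant of MixUp: lambda' = max(lambda, 1 - lambda) keeps the
    # mixed sample closer to (x1, y1).
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

The sharpened average serves as the soft target for an unlabeled sample; MixUp is then applied across the combined labeled and unlabeled batches.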
Run ./preprocess/back_ground_filter.py to filter out the background.
Run ./preprocess/extract_patches.py to extract patches.
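As a rough illustration of what background filtering and patch extraction typically involve, the sketch below thresholds a grayscale image (near-white pixels are treated as background) and keeps the top-left coordinates of non-overlapping patches whose tissue fraction reaches min_tissue. The function names, the threshold of 220 and the tissue fraction of 0.5 are hypothetical and are not taken from back_ground_filter.py or extract_patches.py.

```python
def tissue_mask(gray, threshold=220):
    # Background in stained slides is near-white; mark darker pixels as tissue (1).
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def patch_coords(mask, patch_size, min_tissue=0.5):
    # Tile the mask with non-overlapping patches; keep those with enough tissue.
    h, w = len(mask), len(mask[0])
    keep = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            tissue = sum(
                mask[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            )
            if tissue / (patch_size * patch_size) >= min_tissue:
                keep.append((top, left))
    return keep
```

A real pipeline would read the whole-slide image at reduced magnification before computing the mask; this sketch works on an in-memory grayscale array only.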
Set up the paths of the labeled training data, unlabeled training data, and validation data in lines 78-80 of multitask_train.py.
For example:
labeled_data_files = "./data/RCC/labeled_2000_train.txt"
unlabeled_files = "./data/RCC/unlabeled_2000_train.txt"
test_files = "./data/RCC/all_2000_test.txt"
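The layout of these list files is not documented here. Purely as an assumption, the sketch below treats each line as a patch path optionally followed by whitespace-separated integer labels and parses a file into (path, labels) pairs. The functions parse_list_lines and read_list_file are hypothetical; inspect the actual files under ./data/RCC/ before relying on this format.

```python
from pathlib import Path


def parse_list_lines(lines):
    # HYPOTHETICAL format: "<patch_path> [label ...]" on each line.
    entries = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        entries.append((parts[0], [int(tok) for tok in parts[1:]]))
    return entries


def read_list_file(list_path):
    # Read one list file, e.g. "./data/RCC/labeled_2000_train.txt".
    return parse_list_lines(Path(list_path).read_text().splitlines())
```

Under this assumed format, paths containing spaces would be split incorrectly.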
Run python multitask_train.py --gpu 0,1 --epochs 200 --batch-size 128 --n-classes1 2 --n-classes2 3 --out your_path_to_save
where --n-classes1 and --n-classes2 are the numbers of classes for the CRD and subtyping classifiers, respectively.
To test a trained model, set up the paths of the test data, the model, and the results file you want to save.
For example:
test_files = "./data/RCC/all_2000_test.txt"
file_path_base = "./path_to_model/model_best.pth.tar"
results_file = "./data/results.csv"
Run python valid.py
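The training command in this README passes six flags. A minimal sketch of how such an interface can be declared with Python's standard argparse module is shown below; the flag names come from that command, while the defaults and help strings are assumptions rather than copies of multitask_train.py.

```python
import argparse


def build_parser():
    # Flag names from the README's training command; defaults and help text are assumptions.
    parser = argparse.ArgumentParser(description="SSMTL training (sketch)")
    parser.add_argument("--gpu", default="0", help="comma-separated GPU ids, e.g. 0,1")
    parser.add_argument("--epochs", type=int, default=200, help="number of training epochs")
    parser.add_argument("--batch-size", type=int, default=128, help="mini-batch size")
    parser.add_argument("--n-classes1", type=int, default=2, help="number of CRD classes")
    parser.add_argument("--n-classes2", type=int, default=3, help="number of subtyping classes")
    parser.add_argument("--out", default="result", help="directory to save outputs")
    return parser
```

argparse maps --batch-size, --n-classes1 and --n-classes2 to the attributes batch_size, n_classes1 and n_classes2. A common pattern in PyTorch training scripts is to expose the --gpu value through the CUDA_VISIBLE_DEVICES environment variable; whether multitask_train.py does this is not stated here.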
Use the corresponding Jupyter notebooks (Evaluation_SSMTL_CRD.ipynb and Evaluation_SSMTL_Subtyping.ipynb) to evaluate the cancer region detection and subtyping tasks, respectively.
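The notebooks themselves are not reproduced here. As a sketch of the kind of computation such an evaluation involves, the code below builds a confusion matrix and derives accuracy and per-class recall from true and predicted labels, which it can load from a CSV such as ./data/results.csv. The column names label and pred are hypothetical; check the header of the file written by valid.py. The same functions apply to CRD (n_classes=2) and subtyping (n_classes=3).

```python
import csv


def confusion_matrix(y_true, y_pred, n_classes):
    # cm[t][p] counts samples whose true class is t and predicted class is p.
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def per_class_recall(cm):
    # Recall of class i = correct predictions of i / all samples truly of class i.
    return [row[i] / sum(row) if sum(row) else float("nan") for i, row in enumerate(cm)]


def load_predictions(csv_path, true_col="label", pred_col="pred"):
    # HYPOTHETICAL column names; adapt them to the CSV actually written by valid.py.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return [int(r[true_col]) for r in rows], [int(r[pred_col]) for r in rows]
```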
If you use any part of this code, please cite our paper.
BioMedical Semantic Understanding Group, Xi'an Jiaotong University.
This project is licensed under the MIT License - see the LICENSE file for details.
The datasets used are in whole or part based upon data generated by the TCGA Research Network.
We thank YU1ut for their implementation of MixMatch.