1st place solution for RSNA Screening Mammography Breast Cancer Detection competition on Kaggle
Solution write up: https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/392449
Notes:
Please download the trained models and put them in `assets/trained/`:

```shell
# this assumes that the kaggle api is installed: https://github.com/Kaggle/kaggle-api
kaggle datasets download -d dangnh0611/rsna-breast-cancer-detection-best-ckpts -p assets/trained
unzip rsna-breast-cancer-detection-best-ckpts.zip -d assets/trained/
rm assets/trained/rsna-breast-cancer-detection-best-ckpts.zip
```
Project structure:

- `assets/`: necessary data files and trained models
  - `assets/data/`: CSV labels for the external datasets (BMCD and CMMD), and breast ROI box annotations in YOLOv5 format
  - `assets/public_pretrains/`: publicly available pretrained weights
  - `assets/trained/`: trained models, used for the winning submission
- `datasets/`: where datasets (competition + external) are stored; expected to contain both the raw and the cleaned versions
  - `datasets/raw/`: raw version of the competition data + all external datasets: BMCD, CDD-CESM, CMMD, MiniDDSM, VinDr. For how to correctly structure the datasets, please refer to `docs/DATASETS.md`
- `docker/`: Dockerfile
- `docs/`: documentation
- `src/`: almost all of the source code for this project
  - `src/roi_det`: training the breast ROI detection model (YOLOX)
  - `src/pytorch-image-models`: training the classification model (ConvNeXt-small)
  - `src/submit`: code to generate predictions (submission)
  - `src/tools`: Python and bash scripts to prepare datasets, train, and convert models
  - `src/utils`: utilities for DICOM processing, etc.
- `SETTINGS.json`: defines relative paths for IO

`SETTINGS.json` defines the base paths for IO:

- `RAW_DATA_DIR`: where to store the raw datasets, including both the competition dataset and the external datasets
- `PROCESSED_DATA_DIR`: where to store processed/cleaned datasets
- `MODEL_CHECKPOINT_DIR`: stores intermediate checkpoints during training
- `MODEL_FINAL_SELECTION_DIR`: where to store the final (best) models used for submission
- `SUBMISSION_DIR`: where to store final submission/inference results
- `ASSETS_DIR`: stores trained models and manually annotated datasets/files. This must not be changed; it is defined here for easier lookup only.
- `TEMP_DIR`: where to store intermediate results/files

The following machine was used to create the final solution: an NVIDIA DGX A100. Most of my experiments can be done using 1-3 A100 GPUs; however, the final results can easily be reproduced using a single A100 GPU (40 GB of GPU memory).
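To illustrate the expected shape of the file, here is a minimal sketch that writes an example `SETTINGS.json` with all seven keys. The path values are placeholders I chose for illustration, not the repository's actual defaults; the file is written as `SETTINGS.example.json` so it does not clobber the real one.

```shell
# Illustrative SETTINGS.json (placeholder paths -- adjust to your machine;
# per the note above, only ASSETS_DIR must stay as "assets").
cat > SETTINGS.example.json <<'EOF'
{
  "RAW_DATA_DIR": "datasets/raw",
  "PROCESSED_DATA_DIR": "datasets/processed",
  "MODEL_CHECKPOINT_DIR": "models/checkpoints",
  "MODEL_FINAL_SELECTION_DIR": "models/final",
  "SUBMISSION_DIR": "submissions",
  "ASSETS_DIR": "assets",
  "TEMP_DIR": "temp"
}
EOF
# Quick validation that the file parses and contains all 7 keys.
python3 -c "import json; d = json.load(open('SETTINGS.example.json')); print(len(d))"  # -> 7
```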
Refer to `docs/DATASETS.md` for details on how to correctly set up the datasets.
There are several stages to reproducing the entire solution; I will briefly describe them for easier understanding.
All the following instructions assume that the datasets (competition + external data) are all set up. There are three options to reproduce the solution:

1. Use the trained models in `assets/trained` to make predictions
2. Do not re-train YOLOX; fully reproduce the ConvNeXt-small classification models
3. Re-train all parts (reproduce from scratch)
### Option 1: Use trained models to make predictions

A YOLOX-nano 416 TensorRT engine optimized for an NVIDIA A100 is provided at `assets/trained/yolox_nano_416_roi_trt_a100.pth`. However, the recommended way is to convert the Torch checkpoint to a TensorRT engine optimized for your own environment/hardware:
```shell
PYTHONPATH=$(pwd)/src/roi_det/YOLOX:$PYTHONPATH python3 src/roi_det/YOLOX/tools/trt.py \
    -expn trained_yolox_nano_416_to_tensorrt \
    -f src/roi_det/YOLOX/exps/projects/rsna/yolox_nano_bre_416.py \
    -c assets/trained/yolox_nano_416_roi_torch.pth \
    --save-path assets/trained/yolox_nano_416_roi_trt.pth \
    -b 1
```
Behaviors:
- creates an experiment directory at `{MODEL_CHECKPOINT_DIR}/yolox_roi_det/trained_yolox_nano_416_to_tensorrt/`
- saves the converted TensorRT engine to `./assets/trained/yolox_nano_416_roi_trt.pth`
Next, build the classification TensorRT engine from the trained checkpoints:

```shell
PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/convert_convnext_tensorrt.py --mode trained
```
Behaviors: saves a 4-fold combined TensorRT engine to `./assets/trained/best_ensemble_convnext_small_batch2_fp32.engine`.
This takes 5-10 minutes to finish on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).
Finally, generate the submission:

```shell
PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/submit.py --mode trained --trt
```
Behaviors:
- temporary files are written to `{TEMP_DIR}/pngs/` and are expected to be removed once inference is done
- the final predictions are saved to `{SUBMISSION_DIR}/submission.csv`
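For reference, the submission follows the competition's two-column format. The sketch below fabricates a tiny `submission.csv` purely to illustrate that layout (the real file is produced by `submit.py`, and the `submissions/` directory stands in for whatever `SUBMISSION_DIR` points to):

```shell
# Fabricated example of the expected layout: one row per prediction_id
# ({patient_id}_{laterality}) with a predicted cancer probability in [0, 1].
mkdir -p submissions
printf 'prediction_id,cancer\n10008_L,0.021\n10008_R,0.003\n' > submissions/submission.csv
# The header must match the competition's sample submission.
head -1 submissions/submission.csv   # prediction_id,cancer
```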
### Option 2: Re-use the trained YOLOX, fully reproduce the ConvNeXt-small classification models

A YOLOX-nano 416 TensorRT engine optimized for an NVIDIA A100 is provided at `assets/trained/yolox_nano_416_roi_trt_a100.pth`. However, the recommended way is to convert the Torch checkpoint to a TensorRT engine optimized for your own environment/hardware:
```shell
PYTHONPATH=$(pwd)/src/roi_det/YOLOX:$PYTHONPATH python3 src/roi_det/YOLOX/tools/trt.py \
    -expn trained_yolox_nano_416_to_tensorrt \
    -f src/roi_det/YOLOX/exps/projects/rsna/yolox_nano_bre_416.py \
    -c assets/trained/yolox_nano_416_roi_torch.pth \
    --save-path assets/trained/yolox_nano_416_roi_trt.pth \
    -b 1
```
Behaviors:
- creates an experiment directory at `{MODEL_CHECKPOINT_DIR}/yolox_roi_det/trained_yolox_nano_416_to_tensorrt/`
- saves the converted TensorRT engine to `./assets/trained/yolox_nano_416_roi_trt.pth`
Prepare the classification dataset, using the trained YOLOX as the breast ROI extractor:

```shell
python3 src/tools/prepair_classification_dataset.py --num-workers 8 --roi-yolox-engine-path assets/trained/yolox_nano_416_roi_trt.pth
```
Behaviors:
- creates a `stage1_images` directory in each raw dataset directory (`{RAW_DATA_DIR}/{dataset_name}/stage1_images`) for the intermediate stage
- `{PROCESSED_DATA_DIR}/classification/` contains, for each dataset, 8-bit PNG images in `{PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_images/` and a cleaned label file `{PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_label.csv`

Then create the cross-validation splits:

```shell
python3 src/tools/cv_split.py
```
Behaviors: creates a new directory and saves CSV files in `{PROCESSED_DATA_DIR}/rsna-breast-cancer-detection/cv/v2/`.
Generate the training script:

```shell
python3 src/tools/make_train_bash_script.py --mode fully_reproduce
```

This saves a file named `_train_script_auto_generated.sh` in the current directory, which includes the commands and instructions to train the ConvNeXt-small classification models.
To reproduce using a single GPU, simply run:

```shell
sh ./_train_script_auto_generated.sh
```

This could take 8 days to finish training (around 2 days for each fold). If you have multiple GPUs and want to speed up training, follow the instructions in the generated script `_train_script_auto_generated.sh` and run each command in parallel on different GPUs. For more details on the training process, see part 4.3 (Training) of my write-up.
Behaviors:
- make sure `{MODEL_CHECKPOINT_DIR}/timm_classification/` is empty before starting any train commands
- after training, `{MODEL_CHECKPOINT_DIR}/timm_classification/` contains 6 sub-directories named:
  - `fully_reproduce_train_fold_2`
  - `fully_reproduce_train_fold_3`
  - `stage1_fully_reproduce_train_fold_0`
  - `stage1_fully_reproduce_train_fold_1`
  - `stage2_fully_reproduce_train_fold_0`
  - `stage2_fully_reproduce_train_fold_1`
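The multi-GPU variant amounts to pinning each fold's block of commands to its own GPU and running the blocks as background jobs. The `run_fold` helper below is a hypothetical placeholder, not something the repository generates; the `echo` stands in for a per-fold command block copied out of `_train_script_auto_generated.sh`:

```shell
# Hypothetical sketch of parallel per-fold training. Replace the echo
# placeholder with the per-fold commands from _train_script_auto_generated.sh.
run_fold() {  # $1 = GPU id, $2 = fold id
    CUDA_VISIBLE_DEVICES="$1" echo "training fold $2 on GPU $1"
}
run_fold 0 0 &   # fold 0 on GPU 0
run_fold 1 1 &   # fold 1 on GPU 1
wait             # block until all background folds finish
```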
Select the best checkpoints:

```shell
python3 src/tools/select_classification_best_ckpts.py --mode fully_reproduce
```
Behaviors: creates `{MODEL_FINAL_SELECTION_DIR}/` and copies the best checkpoint of each fold to it:
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_0.pth.tar`
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_1.pth.tar`
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_2.pth.tar`
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_3.pth.tar`
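Before building the engine, a small sanity check can confirm that all four per-fold checkpoints were selected. The `check_ckpts` helper is a hypothetical convenience, and the directory argument stands in for whatever `MODEL_FINAL_SELECTION_DIR` points to:

```shell
# Hypothetical helper: verify the 4 selected ConvNeXt fold checkpoints exist.
check_ckpts() {  # $1 = MODEL_FINAL_SELECTION_DIR
    for fold in 0 1 2 3; do
        [ -f "$1/best_convnext_fold_$fold.pth.tar" ] || { echo "missing fold $fold"; return 1; }
    done
    echo "all 4 fold checkpoints present"
}
# Usage (assumed path): check_ckpts models/final
```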
Build the classification TensorRT engine from the selected checkpoints:

```shell
PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/convert_convnext_tensorrt.py --mode reproduce
```
Behaviors: saves a 4-fold combined TensorRT engine to `{MODEL_FINAL_SELECTION_DIR}/best_ensemble_convnext_small_batch2_fp32.engine`.
This takes 5-10 minutes to finish on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).
Finally, generate the submission:

```shell
PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/submit.py --mode partial_reproduce --trt
```
Behaviors:
- temporary files are written to `{TEMP_DIR}/pngs/` and are expected to be removed once inference is done
- the final predictions are saved to `{SUBMISSION_DIR}/submission.csv`
### Option 3: Re-train all parts (reproduce from scratch)

First, prepare the breast ROI detection dataset:

```shell
python3 src/tools/prepair_roi_det_dataset.py --num-workers 4
```
Behaviors:
- copies the breast ROI box annotations from `./assets/data/roi_det_yolov5_format/` to `{PROCESSED_DATA_DIR}/roi_det_yolox/yolov5_format/`
- saves the corresponding images to `{PROCESSED_DATA_DIR}/roi_det_yolox/yolov5_format/images/`
- converts the dataset to COCO format in `{PROCESSED_DATA_DIR}/roi_det_yolox/coco_format/`
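For context, a YOLOv5-format label file contains one `class x_center y_center width height` line per box, with all coordinates normalized to [0, 1] relative to the image size. The class id and numbers below are illustrative, not taken from the repository's annotations:

```shell
# Illustrative YOLOv5-format label: class 0 (breast ROI), box centered at
# (0.48, 0.52), covering 61% x 83% of the image, all normalized to [0, 1].
printf '0 0.48 0.52 0.61 0.83\n' > example_roi_label.txt
cat example_roi_label.txt   # 0 0.48 0.52 0.61 0.83
```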
Train the YOLOX ROI detector and convert it to TensorRT:

```shell
sh src/tools/train_and_convert_yolox_trt.sh
```
Behaviors:
- training logs and checkpoints are saved in `{MODEL_CHECKPOINT_DIR}/yolox_roi_det/yolox_nano_416_reproduce/`
- the best checkpoint is copied to `{MODEL_FINAL_SELECTION_DIR}/yolox_nano_416_roi_torch.pth`
- the converted TensorRT engine is saved to `{MODEL_FINAL_SELECTION_DIR}/yolox_nano_416_roi_trt.pth`
Prepare the classification dataset. This will use the newly trained YOLOX from the previous step as the breast ROI extractor:

```shell
python3 src/tools/prepair_classification_dataset.py --num-workers 8
```
Behaviors:
- creates a `stage1_images` directory in each raw dataset directory (`{RAW_DATA_DIR}/{dataset_name}/stage1_images`) for the intermediate stage
- `{PROCESSED_DATA_DIR}/classification/` contains, for each dataset, 8-bit PNG images in `{PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_images/` and a cleaned label file `{PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_label.csv`

Then create the cross-validation splits:

```shell
python3 src/tools/cv_split.py
```
Behaviors: creates a new directory and saves CSV files in `{PROCESSED_DATA_DIR}/rsna-breast-cancer-detection/cv/v2/`.
Generate the training script:

```shell
python3 src/tools/make_train_bash_script.py --mode fully_reproduce
```

This saves a file named `_train_script_auto_generated.sh` in the current directory, which includes the commands and instructions to train the ConvNeXt-small classification models.
To reproduce using a single GPU, simply run:

```shell
sh ./_train_script_auto_generated.sh
```

This could take 8 days to finish training (around 2 days for each fold). If you have multiple GPUs and want to speed up training, follow the instructions in the generated script `_train_script_auto_generated.sh` and run each command in parallel on different GPUs. For more details on the training process, see part 4.3 (Training) of my write-up.
Behaviors:
- make sure `{MODEL_CHECKPOINT_DIR}/timm_classification/` is empty before starting any train commands
- after training, `{MODEL_CHECKPOINT_DIR}/timm_classification/` contains 6 sub-directories named:
  - `fully_reproduce_train_fold_2`
  - `fully_reproduce_train_fold_3`
  - `stage1_fully_reproduce_train_fold_0`
  - `stage1_fully_reproduce_train_fold_1`
  - `stage2_fully_reproduce_train_fold_0`
  - `stage2_fully_reproduce_train_fold_1`
Select the best checkpoints:

```shell
python3 src/tools/select_classification_best_ckpts.py --mode fully_reproduce
```
Behaviors: creates `{MODEL_FINAL_SELECTION_DIR}/` and copies the best checkpoint of each fold to it:
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_0.pth.tar`
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_1.pth.tar`
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_2.pth.tar`
- `{MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_3.pth.tar`
Build the classification TensorRT engine from the selected checkpoints:

```shell
PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/convert_convnext_tensorrt.py --mode reproduce
```
Behaviors: saves a 4-fold combined TensorRT engine to `{MODEL_FINAL_SELECTION_DIR}/best_ensemble_convnext_small_batch2_fp32.engine`.
This takes 5-10 minutes to finish on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).
Finally, generate the submission:

```shell
PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/submit.py --mode reproduce --trt
```
Behaviors:
- temporary files are written to `{TEMP_DIR}/pngs/` and are expected to be removed once inference is done
- the final predictions are saved to `{SUBMISSION_DIR}/submission.csv`