batmanlab / Mammo-CLIP

Official Pytorch implementation of MICCAI 2024 paper (early accept, top 11%) Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography
Creative Commons Attribution 4.0 International

Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography

[Paper] [Hugging Face] [Pre-training Checkpoints] [VinDr png data]

Shantanu Ghosh¹, Clare B. Poynton², Shyam Visweswaran³, Kayhan Batmanghelich¹
¹BU ECE, ²BUMC, ³Pitt DBMI


After going through the instructions, we recommend visiting this link for further clarification on pretraining. If we receive more queries, we may add a separate FAQ section in the future.

Table of Contents

  1. Environment Setup
  2. Data Download
  3. Pre-processing Images
  4. Data Preparation for Pretraining
  5. Data Preparation for Downstream Evaluation Tasks
  6. Mammo-CLIP checkpoints
  7. Pretraining Mammo-CLIP
  8. Creating classifiers and detectors
  9. Evaluation
  10. Additional Scripts
  11. Citation
  12. License and Copyright
  13. Contact

Environment Setup

Use environment.yml to set up the environment.

conda env create --name Mammo-CLIP -f environment.yml
conda activate Mammo-CLIP

Mammo-CLIP is implemented with the following specification:

Data Download

Download the original versions of the VinDr and RSNA datasets from the links for downstream evaluations:

For the PNG images converted from the original DICOM images, as described in the preprocessing steps of the paper, refer to the following links:

To preprocess the DICOM images directly, follow the instructions in the next section. If you downloaded the PNG images, skip the preprocessing steps.

Pre-processing images

Convert to png: RSNA

python ./src/preprocessing/ \
  --phase="test" \

Convert to png: VinDr

python ./src/preprocessing/ \
  --phase="test" \

Data preparation for pretraining

Image-text dataset

  1. Our image-text dataset is an in-house dataset from UPMC. The sample csv: upmc_dicom_consolidated_final_folds_BIRADS_num_1_report.csv

  2. Note the HISTORY, FINDINGS, and IMPRESSION columns in the csv file. The FINDINGS and IMPRESSION columns are used to generate the text for the image. In the sample csv, the HISTORY, FINDINGS, and IMPRESSION columns contain templated text for privacy reasons.

  3. Next, run the following command to augment the text in the upmc_dicom_consolidated_final_folds_BIRADS_num_1_report.csv file:

# input: upmc_dicom_consolidated_final_folds_BIRADS_num_1_report.csv
# output: clip_pretrain_100.csv

python ./src/codebase/ \
  --dataset-path="/ocean/projects/asc170022p/shg121/PhD/Mammo-CLIP/src/codebase/data_csv" \
  --csv-path="upmc_dicom_consolidated_final_folds_BIRADS_num_1_report.csv" \
  4. The csv file of the final image-text dataset should have the following format:

     Column         Content of a row
     index          0
     patient_id     patient_id
     laterality     laterality ('R' or 'L')
     image          List of all image_paths for patient_id-laterality combo
     view           List of views for patient_id-laterality combo (only 'CC' and 'MLO' are used)
     CC             List of image paths for CC view for patient_id-laterality combo
     MLO            List of image paths for MLO view for patient_id-laterality combo
     text           List of [findings, impression]
     text_augment   List of [augmented findings, augmented impression]

  5. The final sample csv file as the output of step 3 is here: clip_pretrain_100.csv
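As a rough illustration of the format above, the following sketch reads one row and decodes its list-valued columns. It assumes the list cells are stored as Python-literal strings (which is what pandas produces when a DataFrame holding lists is written to csv); that storage format, the helper name parse_row, and the sample values are assumptions for illustration and are not taken from the repository.

```python
import ast
import csv
import io

# Columns that hold lists, per the format described above.
LIST_COLUMNS = ["image", "view", "CC", "MLO", "text", "text_augment"]
ALLOWED_VIEWS = {"CC", "MLO"}


def parse_row(row):
    """Decode the list-valued cells of one csv row and run basic checks."""
    parsed = dict(row)
    for col in LIST_COLUMNS:
        # Assumption: each list cell is a Python-literal string such as "['a', 'b']".
        parsed[col] = ast.literal_eval(row[col])
    if parsed["laterality"] not in ("R", "L"):
        raise ValueError(f"unexpected laterality: {parsed['laterality']!r}")
    if not set(parsed["view"]) <= ALLOWED_VIEWS:
        raise ValueError(f"unexpected views: {parsed['view']}")
    return parsed


# Hypothetical sample row, written and read back in memory.
fieldnames = ["index", "patient_id", "laterality", *LIST_COLUMNS]
sample = {
    "index": 0,
    "patient_id": "P001",
    "laterality": "R",
    "image": str(["P001/r_cc.png", "P001/r_mlo.png"]),
    "view": str(["CC", "MLO"]),
    "CC": str(["P001/r_cc.png"]),
    "MLO": str(["P001/r_mlo.png"]),
    "text": str(["findings text", "impression text"]),
    "text_augment": str(["augmented findings", "augmented impression"]),
}
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(sample)
buffer.seek(0)

rows = [parse_row(r) for r in csv.DictReader(buffer)]
print(rows[0]["view"])
```

Running a check like this over clip_pretrain_100.csv before pretraining can surface malformed rows early, provided the file really uses this cell encoding.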

Image-label dataset

We use the VinDr dataset as the image-label dataset. If you plan to use it in the pre-training setup, preprocess the VinDr dataset with the following notebook:


When you download the VinDr dataset, you will get two csv files: breast-level_annotations.csv and finding_annotations.csv. We preprocess the finding_annotations.csv file to get vindr_detection_v1_folds.csv. The VinDr.ipynb notebook requires the vindr_detection_v1_folds.csv file as input and generates the clip_vindr_final.csv file.

Data preparation for downstream evaluation tasks

Use the following csv files as metadata for the downstream tasks (classification, detection, zero-shot):

Dataset CSV
VinDr vindr_detection_v1_folds.csv
RSNA train_folds.csv

For detection/localization tasks, we have included the coordinates of the resized bounding boxes of VinDr in the above csv file. If you want to resize the bounding boxes yourself, refer to this code.
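For orientation, resizing a bounding box usually means scaling its coordinates by the same factors applied to the image. The sketch below shows that arithmetic under the assumption of a plain resize with no cropping or padding; the function name resize_bbox and the example sizes are hypothetical, and the repository's own code may apply additional steps.

```python
def resize_bbox(box, orig_size, new_size):
    """Scale an (xmin, ymin, xmax, ymax) box from orig_size to new_size.

    Both sizes are (width, height) tuples in pixels.
    """
    xmin, ymin, xmax, ymax = box
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    scale_x = new_w / orig_w  # horizontal scaling factor
    scale_y = new_h / orig_h  # vertical scaling factor
    return (xmin * scale_x, ymin * scale_y, xmax * scale_x, ymax * scale_y)


# Hypothetical example: an image is halved in both dimensions.
scaled = resize_bbox((100, 200, 300, 600), orig_size=(2000, 4000), new_size=(1000, 2000))
print(scaled)  # (50.0, 100.0, 150.0, 300.0)
```

If the image is cropped before resizing, the crop offset has to be subtracted from the coordinates first; that case is not covered here.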

Mammo-CLIP checkpoints

The following are the pre-training checkpoints of Mammo-CLIP:

Model architecture   Checkpoints (Google drive)   Checkpoints (Hugging Face)
Best performance     Efficient-Net B5             Efficient-Net B5
Lightweight          Efficient-Net B2             Efficient-Net B2

We have also uploaded the downstream checkpoints for classification and localization (both linear probe and finetuning) with the image encoder of Efficient-Net B5 Mammo-CLIP for fold 0 here.


Look for /ocean/projects/asc170022p/shg121/PhD and replace it with your own path.
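One way to do this replacement in bulk is sketched below. It walks a directory, reports the files that contain the hard-coded prefix, and rewrites them only when dry_run is disabled. The function name replace_prefix and the list of file suffixes are assumptions for illustration; review the dry-run output before writing any changes.

```python
from pathlib import Path

# Prefix named in the instruction above.
OLD_PREFIX = "/ocean/projects/asc170022p/shg121/PhD"


def replace_prefix(root, new_prefix,
                   suffixes=(".py", ".sh", ".yaml", ".yml", ".ipynb"),
                   dry_run=True):
    """Return the files under root that contain OLD_PREFIX.

    When dry_run is False, also rewrite those files in place with
    OLD_PREFIX replaced by new_prefix.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        if OLD_PREFIX in text:
            changed.append(path)
            if not dry_run:
                path.write_text(text.replace(OLD_PREFIX, new_prefix), encoding="utf-8")
    return changed
```

For example, replace_prefix("./src", "/home/me/projects") lists the affected files, and the same call with dry_run=False applies the change.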

Pretraining Mammo-CLIP

python ./Mammo-CLIP/src/codebase/ --config-name pre_train_b5_clip.yaml

Creating classifiers and detectors


Evaluation

Zero-shot evaluation of Mammo-CLIP


python ./src/codebase/ \
  --config-name zs_clip.yaml$DIR model.clip_check_point=$FULL_CKPT

Adjust the CKPT and DIR variables according to your setup.
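A minimal sketch of how the full checkpoint path might be assembled from these variables follows. The checkpoints/fold_N layout is inferred from the checkpoint path used in the classification commands in this README; the function name full_checkpoint_path and the example DIR value are hypothetical.

```python
from pathlib import PurePosixPath


def full_checkpoint_path(run_dir, ckpt_name, fold=0):
    """Join the run directory, fold folder, and checkpoint file name."""
    # Assumed layout: <run_dir>/checkpoints/fold_<fold>/<ckpt_name>
    return PurePosixPath(run_dir) / "checkpoints" / f"fold_{fold}" / ckpt_name


DIR = "outputs/upmc_clip/b5_detector_period_n"  # hypothetical run directory
CKPT = "b5-model-best-epoch-7.tar"
FULL_CKPT = full_checkpoint_path(DIR, CKPT, fold=0)
print(FULL_CKPT)
```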

Linear probe vision encoder Mammo-CLIP on target classification task

python ./src/codebase/ \
  --data-dir '/ocean/projects/asc170022p/shg121/PhD/RSNA_Breast_Imaging/Dataset' \
  --img-dir 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/images_png' \
  --csv-file 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/vindr_detection_v1_folds.csv' \
  --clip_chk_pt_path "/ocean/projects/asc170022p/shg121/PhD/Mammo-CLIP/src/codebase/outputs/upmc_clip/b5_detector_period_n/checkpoints/fold_0/b5-model-best-epoch-7.tar" \
  --data_frac 1.0 \
  --dataset 'ViNDr' \
  --arch 'upmc_breast_clip_det_b5_period_n_lp' \
  --label "Mass" \
  --epochs 30 \
  --batch-size 8 \
  --num-workers 0 \
  --print-freq 10000 \
  --log-freq 500 \
  --running-interactive 'n' \
  --n_folds 1 \
  --lr 5.0e-5 \
  --weighted-BCE 'y' \
  --balanced-dataloader 'n' 

Finetune vision encoder Mammo-CLIP on target classification task

python ./src/codebase/ \
  --data-dir '/ocean/projects/asc170022p/shg121/PhD/RSNA_Breast_Imaging/Dataset' \
  --img-dir 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/images_png' \
  --csv-file 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/vindr_detection_v1_folds.csv' \
  --clip_chk_pt_path "/ocean/projects/asc170022p/shg121/PhD/Mammo-CLIP/src/codebase/outputs/upmc_clip/b5_detector_period_n/checkpoints/fold_0/b5-model-best-epoch-7.tar" \
  --data_frac 1.0 \
  --dataset 'ViNDr' \
  --arch 'upmc_breast_clip_det_b5_period_n_ft' \
  --label "Mass" \
  --epochs 30 \
  --batch-size 8 \
  --num-workers 0 \
  --print-freq 10000 \
  --log-freq 500 \
  --running-interactive 'n' \
  --n_folds 1 \
  --lr 5.0e-5 \
  --weighted-BCE 'y' \
  --balanced-dataloader 'n'
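Both classification commands pass --weighted-BCE 'y'. A common way to weight binary cross-entropy for an imbalanced label such as Mass is to scale the positive term by the ratio of negative to positive examples. The sketch below shows that calculation in plain Python; whether the repository computes its weight exactly this way is an assumption, and pos_weight and weighted_bce are illustrative names.

```python
import math


def pos_weight(labels):
    """Ratio of negative to positive examples in a list of 0/1 labels."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples")
    return n_neg / n_pos


def weighted_bce(prob, label, weight):
    """Binary cross-entropy for one example, with the positive term scaled by weight."""
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    return -(weight * label * math.log(prob) + (1 - label) * math.log(1 - prob))


# Hypothetical label distribution: 1 positive out of 10 examples.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
w = pos_weight(labels)
print(w)  # 9.0
print(weighted_bce(0.5, 1, w) / weighted_bce(0.5, 0, w))
```

With this weighting, a missed positive costs as much as nine missed negatives in the example, which offsets the class imbalance.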

Linear probe vision encoder Mammo-CLIP on target detection task

python ./src/codebase/ \
  --data-dir '/ocean/projects/asc170022p/shg121/PhD/RSNA_Breast_Imaging/Dataset' \
  --img-dir 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/images_png' \
  --csv-file 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/vindr_detection_v1_folds.csv' \
  --clip_chk_pt_path "/ocean/projects/asc170022p/shg121/PhD/Mammo-CLIP/src/codebase/outputs/upmc_clip/b5_detector_period_n/checkpoints/fold_0/b5-model-best-epoch-7.tar" \
  --dataset 'ViNDr' \
  --arch 'clip_b5_upmc' \
  --epochs 120 \
  --batch-size 7 \
  --freeze_backbone "y" \
  --data_frac 1.0 \
  --concepts 'Mass' \
  --print-freq 5000 \
  --log-freq 300 \
  --running-interactive 'n' \
  --focal-alpha 0.25 \
  --focal-gamma 2.0 \
  --score-threshold 0.2

Finetune vision encoder Mammo-CLIP on target detection task

python ./src/codebase/ \
  --data-dir '/ocean/projects/asc170022p/shg121/PhD/RSNA_Breast_Imaging/Dataset' \
  --img-dir 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/images_png' \
  --csv-file 'External/Vindr/vindr-mammo-a-large-scale-benchmark-dataset-for-computer-aided-detection-and-diagnosis-in-full-field-digital-mammography-1.0.0/vindr_detection_v1_folds.csv' \
  --clip_chk_pt_path "/ocean/projects/asc170022p/shg121/PhD/Mammo-CLIP/src/codebase/outputs/upmc_clip/b5_detector_period_n/checkpoints/fold_0/b5-model-best-epoch-7.tar" \
  --dataset 'ViNDr' \
  --arch 'clip_b5_upmc' \
  --epochs 120 \
  --batch-size 7 \
  --freeze_backbone "n" \
  --data_frac 1.0 \
  --concepts 'Mass' \
  --print-freq 5000 \
  --log-freq 300 \
  --running-interactive 'n' \
  --focal-alpha 0.25 \
  --focal-gamma 2.0 \
  --score-threshold 0.2
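The detection commands set --focal-alpha 0.25 and --focal-gamma 2.0. Assuming these flags parameterize the standard focal loss (an assumption about the repository, not confirmed here), the per-example loss can be sketched in plain Python as follows. The function name focal_loss is illustrative.

```python
import math


def focal_loss(prob, label, alpha=0.25, gamma=2.0):
    """Standard focal loss for one binary prediction.

    prob is the predicted probability of the positive class and label is 0 or 1.
    """
    eps = 1e-12
    prob = min(max(prob, eps), 1 - eps)  # avoid log(0)
    p_t = prob if label == 1 else 1 - prob  # probability assigned to the true class
    alpha_t = alpha if label == 1 else 1 - alpha  # class balancing factor
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # confident and correct prediction
hard = focal_loss(0.1, 1)  # confident and wrong prediction
print(easy < hard)  # True
```

The (1 - p_t) ** gamma factor shrinks the loss of well-classified examples, so training focuses on the hard ones; gamma = 0 with alpha_t = 1 reduces to ordinary cross-entropy.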

Additional scripts

All the training scripts are available in the scripts directory:

Scripts Purpose

  1. Pretrain Mammo-CLIP b5
  2. Pretrain Mammo-CLIP b2
  3. Evaluate Mammo-CLIP b5 on fine tuning tasks for classification
  4. Evaluate Mammo-CLIP b2 on fine tuning tasks for classification
  5. Evaluate Mammo-CLIP b5 on linear probing tasks for classification
  6. Evaluate Mammo-CLIP b2 on linear probing tasks for classification
  7. Evaluate Mammo-CLIP b5 on fine tuning tasks for detection
  8. Evaluate Mammo-CLIP b2 on fine tuning tasks for detection
  9. Evaluate Mammo-CLIP b5 on linear probing tasks for detection
  10. Evaluate Mammo-CLIP b2 on linear probing tasks for detection


Citation

  title={Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography},
  author={Ghosh, Shantanu and Poynton, Clare B and Visweswaran, Shyam and Batmanghelich, Kayhan},
  journal={arXiv preprint arXiv:2405.12255},

License and copyright

Licensed under the Creative Commons Attribution 4.0 International license.

Copyright © Batman Lab, 2024


Contact

For any queries, contact: