Adamouization / Breast-Cancer-Detection-Mammogram-Deep-Learning

Master's dissertation for breast cancer detection in mammograms using deep learning techniques in Tensorflow. Contains the final report and source code.
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280841
BSD 2-Clause "Simplified" License
breast-cancer-detection cbis-ddsm-dataset convolutional-neural-networks deep-learning mammogram mini-mias-dataset transfer-learning

Breast Cancer Detection in Mammograms Using Deep Learning Techniques


Publication Updates

I have been working since the end of my Master's in 2020 to publish this dissertation in a journal. As of May 2023, this research is published in PLOS ONE and can be read and cited here: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280841. Here are the latest updates on this project:

2020

2021

2022

2023

What can I find in this repository?

You can find the full dissertation project (code + report) for the MSc Artificial Intelligence at the University of St Andrews (2020).

The publication of this project can be found here:

The original dissertation report can be read here: Breast Cancer Detection in Mammograms using Deep Learning Techniques, Adam Jaamour (2020)

If you have any questions, issues or messages, please either:

Abstract

The objective of this dissertation is to explore various deep learning techniques that can be used to implement a system which learns how to detect instances of breast cancer in mammograms. Nowadays, breast cancer claims 11,400 lives on average every year in the UK, making it one of the deadliest diseases. Mammography is the gold standard for detecting early signs of breast cancer, which can help cure the disease during its early stages. However, incorrect mammography diagnoses are common and may harm patients through unnecessary treatments and operations (or a lack of treatments). Therefore, systems that can learn to detect breast cancer on their own could help reduce the number of incorrect interpretations and missed cases.

Convolutional Neural Networks (CNNs) are used as part of a deep learning pipeline initially developed in a group and further extended individually. A bag-of-tricks approach is followed to analyse the effects on performance and efficiency of diverse deep learning techniques, such as different architectures (VGG19, ResNet50, InceptionV3, DenseNet121, MobileNetV2), class weights, input sizes, amounts of transfer learning, and types of mammograms.

Figure: CNN model

Ultimately, 67.08% accuracy is achieved on the CBIS-DDSM dataset by transferring pre-trained ImageNet weights to a MobileNetV2 architecture, and pre-trained weights from a binary version of the mini-MIAS dataset to the fully connected layers of the model. Furthermore, using class weights to counter class imbalance in the datasets and splitting CBIS-DDSM samples between masses and calcifications also increase the overall accuracy. Other techniques tested, such as data augmentation and larger image sizes, do not yield increased accuracies, while the mini-MIAS dataset proves to be too small for any meaningful results using deep learning techniques. These results are compared with other papers using the CBIS-DDSM and mini-MIAS datasets, and with the baseline set during the implementation of a deep learning pipeline developed as a group.
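As a rough illustration of the class-weighting idea, the sketch below computes "balanced" weights, where each class is weighted by n_samples / (n_classes * count). This is the common heuristic also used by scikit-learn's compute_class_weight("balanced"); it is an assumption that the dissertation code uses the same formula, and the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Hypothetical helper: weight(c) = n_samples / (n_classes * count(c)),
    so rarer classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Toy, imbalanced labels: three benign (0) samples for one malignant (1) sample.
weights = balanced_class_weights([0, 0, 0, 1])
print(weights[1])  # 2.0
```

In Keras, a dictionary like this can be passed to model.fit(..., class_weight=weights), so that errors on the rarer class contribute more to the training loss.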

Usage on a GPU lab machine

Clone the repository:

cd ~/Projects
git clone https://github.com/Adamouization/Breast-Cancer-Detection-Code

Set up the virtual environment used to install TensorFlow 2 with CUDA 10 for Python (required for GPU usage):

cd libraries/tf2
tar xvzf tensorflow2-cuda-10-1-e5bd53b3b5e6.tar.gz
sh build.sh

Activate the virtual environment:

source /cs/scratch/<username>/tf2/venv/bin/activate

Create the output and saved_models directories to store the results:

mkdir output
mkdir saved_models

cd into the src directory and run the code:

main.py [-h] -d DATASET [-mt MAMMOGRAMTYPE] -m MODEL [-r RUNMODE] [-lr LEARNING_RATE] [-b BATCHSIZE] [-e1 MAX_EPOCH_FROZEN] [-e2 MAX_EPOCH_UNFROZEN] [-roi] [-v] [-n NAME]

where:
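A minimal sketch of how the synopsis above could be parsed with Python's standard argparse module. Only the option letters come from the usage line; the dest names, value types and help strings are assumptions, and defaults are left as None rather than guessing the repository's real values. Refer to main.py for the actual definitions.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the main.py command-line interface.

    Option strings follow the documented synopsis; dest names, types and
    help text are illustrative assumptions.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-d", dest="dataset", required=True, help="dataset to use")
    parser.add_argument("-mt", dest="mammogram_type", help="type of mammograms")
    parser.add_argument("-m", dest="model", required=True, help="CNN architecture")
    parser.add_argument("-r", dest="runmode", help="run mode")
    parser.add_argument("-lr", dest="learning_rate", type=float, help="learning rate")
    parser.add_argument("-b", dest="batch_size", type=int, help="batch size")
    parser.add_argument("-e1", dest="max_epoch_frozen", type=int,
                        help="maximum epochs with frozen pre-trained layers")
    parser.add_argument("-e2", dest="max_epoch_unfrozen", type=int,
                        help="maximum epochs with unfrozen layers")
    parser.add_argument("-roi", dest="roi", action="store_true",
                        help="use regions of interest")
    parser.add_argument("-v", dest="verbose", action="store_true", help="verbose output")
    parser.add_argument("-n", dest="name", help="experiment name")
    return parser


# Illustrative invocation; the argument values are examples, not documented settings.
args = build_parser().parse_args(
    ["-d", "CBIS-DDSM", "-m", "MobileNetV2", "-lr", "0.0001", "-e1", "10", "-roi"]
)
print(args.dataset, args.model, args.learning_rate, args.max_epoch_frozen, args.roi)
```

Declaring -roi and -v as store_true flags matches the synopsis, where they take no value.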

Dataset installation

mini-MIAS dataset

cd data/mini-MIAS/
mkdir images_original
mkdir images_processed
cd images_original
wget http://peipa.essex.ac.uk/pix/mias/all-mias.tar.gz
tar xvzf all-mias.tar.gz
rm -rf *.txt 
rm -rf README 
cd ../images_processed
mkdir benign_cases
mkdir malignant_cases
mkdir normal_cases
python3 ../../../src/dataset_processing_scripts/mini-MIAS-initial-pre-processing.py
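The final step sorts the mini-MIAS images into the three class directories. The sketch below illustrates that idea under stated assumptions and does not reproduce mini-MIAS-initial-pre-processing.py. It assumes ground-truth lines in the standard MIAS format (reference number, background tissue, abnormality class, severity, then optional x, y and radius), where NORM marks normal cases and B/M mark benign/malignant severity. The helper target_directory and the SEVERITY_DIRS mapping are hypothetical.

```python
# Hypothetical mapping from MIAS severity codes to the directories created above.
SEVERITY_DIRS = {"B": "benign_cases", "M": "malignant_cases"}


def target_directory(info_line):
    """Return (reference number, class directory) for one MIAS ground-truth line.

    Assumes the layout: reference, background tissue, abnormality class,
    severity, [x, y, radius].
    """
    fields = info_line.split()
    reference, abnormality = fields[0], fields[2]
    if abnormality == "NORM":  # normal mammograms carry no severity field
        return reference, "normal_cases"
    return reference, SEVERITY_DIRS[fields[3]]


# A line in the assumed format: circumscribed mass, benign severity.
reference, folder = target_directory("mdb001 G CIRC B 535 425 197")
print(reference, folder)  # mdb001 benign_cases
```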

DDSM and CBIS-DDSM datasets

These datasets are very large (exceeding 160GB) and more complex to use than the mini-MIAS dataset. They were downloaded by the University of St Andrews School of Computer Science computing officers onto BigTMP, a 15TB filesystem mounted on the CentOS 7 computer lab clients with NVIDIA GPUs, which is usually used for storing large working datasets. Therefore, the download process for these datasets is not covered in these instructions.

The generated CSV files needed to use these datasets can be found in the /data/CBIS-DDSM directory, but the mammograms will have to be downloaded separately. The DDSM dataset can be downloaded here, while the CBIS-DDSM dataset can be downloaded here.
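The abstract notes that splitting CBIS-DDSM samples between masses and calcifications improves accuracy. One way to perform that split from a CSV listing is sketched below. It relies on CBIS-DDSM case directories being named with a "Mass-" or "Calc-" prefix; the img_path and label column names and the toy rows are hypothetical, so adapt them to the columns of the generated files in /data/CBIS-DDSM.

```python
import csv
import io


def split_masses_calcifications(csv_file):
    """Split CSV rows into mass and calcification samples.

    Assumes a hypothetical "img_path" column whose value contains the
    CBIS-DDSM case directory name ("Mass-..." or "Calc-...").
    """
    masses, calcifications = [], []
    for row in csv.DictReader(csv_file):
        path = row["img_path"]
        if "Mass-" in path:
            masses.append(row)
        elif "Calc-" in path:
            calcifications.append(row)
    return masses, calcifications


# Toy rows for illustration only, not real dataset entries.
sample = io.StringIO(
    "img_path,label\n"
    "CBIS-DDSM/Mass-Training_P_00000_LCC/1.dcm,1\n"
    "CBIS-DDSM/Calc-Training_P_00000_RCC/1.dcm,0\n"
)
masses, calcifications = split_masses_calcifications(sample)
print(len(masses), len(calcifications))  # 1 1
```

Each subset can then be used to train a separate model, in line with the mass/calcification split described in the abstract.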

Citation

Published article citation

@article{10.1371/journal.pone.0280841,
    doi = {10.1371/journal.pone.0280841},
    author = {Jaamour, Adam AND Myles, Craig AND Patel, Ashay AND Chen, Shuen-Jen AND McMillan, Lewis AND Harris-Birtill, David},
    journal = {PLOS ONE},
    publisher = {Public Library of Science},
    title = {A divide and conquer approach to maximise deep learning mammography classification accuracies},
    year = {2023},
    month = {05},
    volume = {18},
    url = {https://doi.org/10.1371/journal.pone.0280841},
    pages = {1-24},
    number = {5},
}

Code citation

@software{adam_jaamour_2020_3985051,
  author       = {Adam Jaamour and
                  Ashay Patel and
                  Shuen-Jen Chen},
  title        = {{Breast Cancer Detection in Mammograms using Deep 
                   Learning Techniques: Source Code}},
  month        = aug,
  year         = 2020,
  publisher    = {Zenodo},
  version      = {v1.0},
  doi          = {10.5281/zenodo.3985051},
  url          = {https://doi.org/10.5281/zenodo.3985051}
}

License

Code Authors

The common pipeline can be found at DOI 10.5281/zenodo.3975092


Contact