UCLA_BIOENG_596_FALL_21/Winter 2022/Spring2022

Ultrasound Breast Cancer Detection

Keane Gonzalez

Code and script for Ultrasound Cancer tumor detection using Faster R-CNN with a ResNet50 pre-trained backbone

-- Pre-Processing and Results scripts are also in this area.

Data:

BUSI ultrasound images for first fine-tuning (~700 images) https://scholar.cu.edu.eg/?q=afahmy/pages/dataset
UCLA Ultrasound set for final classification & Region detection (~8800 images)

Environment: Jupyter Notebook in Google Colab (Pro for the extended training time and memory)

Code Flow:

Pre-Process
Data Ingestion
Load Faster-RCNN (Resnet50) weights
Train on BUSI
Load BUSI saved checkpoint with highest IOU results
Train with UCLA data
Save model with highest IOU value (after penalizing for false positives/negatives)
Results plots and verification
TBD

Output Example Images with Predictions and Truth Regions:

Currently being retrained/retested without ImageNet normalization

Steps for usage

The main options page starts at the first cell. It will import needed libraries for colab and the data directories can be changed here. The code options for choosing training or testing are set here too.

This is the main set of options in cell 1. There are two main modes: training and testing. Set run_testing_only to 1 to load the necessary functions and stop right before the training starts. This allows the user to run the code segment that loads particular saved models for testing. Setting run_testing_only to 0 will cause the code to go through the RESNET training sequence

                    ################################################################################
                    #                    MODEL SPECIFIC OPTIONS
                    #
                    # Choose training sets to use
                    training_set = 0 #0=UCLA US, 1 = UCLA + BUSI, 2 = BUSI

                    #choose the normalization path to use
                    #0 is the 0 to 1 Normalization based upon the ultrasound data
                    #1 is the 0 to 1 Normalization based upon the ImageNet data
                    #
                    #
                    model_branch = 1 #0=0to1, 1 = -2to2
                    transfer_train = 0 #we will load the model states from a checkpoint epoch instead

                    #!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                    #               TURN TRAINING ON OR OFF
                    # Choose to do training or to load models for use in testing only
                    # run_testing_only = 0 is for training, =1 for a stop after loading models. 1 is
                    # used for running a model on the Test dataset
                    run_testing_only = 1

                    # Set use_last_epoch to 1 to start from a pre-trained checkpoint. Setting to 0
                    #starts training from the beginning
                    use_last_epoch = 1 #
                    ################################################################################

TBD
TESTING Once the Training has completed or if staring from a pre-generated model, set run_testin_only to 1. This will kick you out of the main training cell and allow you to load the model separately from all the previous data. The function tab below is where the user can choose which model to load:

Load Saved model dictionary for testing

These are the most important sections needed to load in the model. Data selected for testing is very important as it must have been regenerated with this model or saved previously.

The checkpoint method is reused to load the model, with state dict being necessary when saved. Epoch is not necessary, but was automatically included from the training saves. For the UCLA runs, the same training/validation/test data was reused throughout the whole training sequences to reduce any chance of data being mixed between them. For normal runs, data info was saved along with any other info needed for testing.

            checkpoint = torch.load(model_dict)
            modelr.load_state_dict(checkpoint['model_state_dict'])
            #optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
            start_epoch = checkpoint['epoch']

            #
            # Load relevant saved train/validation/test file sets
            #
            print('Loading previous set of train/val/test files')
            #subdir = '0TO1_UCLA_BEST'
            last_data_list = os.path.join(model_dir, subdir,'last_data_set.pickle') #0TO1
            #'/content/gdrive/My Drive/BreastUS/MODEL_SAVE/UCLA_FINAL/last_data_set.pickle'

            archived_data = pickle.load( open( last_data_list, "rb" ) )
            training_data = archived_data[0][0]
            validation_data = archived_data[0][1]
            test_data =archived_data[0][2]
            bounding_box = archived_data[0][3]
            first50 =archived_data[0][4]

Test the model with available data

To test out the model, jump to the section below:

FUNCTION: TEST UCLA FINAL MODELS

This function will take in a cutoff and data for the model. The cell below, Run UCLA METRICS, will call this function for cutoffs from 0 - 90% confidence scores.

            for loop in range(0,10,1):
                score_cutoff =  float(loop/10) #to get floating point step sizes
                test_metrics = test_ucla_classification(modelr,input_dataset=test_data, 
                                                        batch_size=batch_size, 
                                                        score_cutoff=score_cutoff, 
                                                        iou_cutoff = iou_cutoff)
                saved_metrics[loop] = test_metrics

The output will be saved for use in plotting in the next cell. All of the useful metrics (TP/FP/TN/FN, etc...) will be saved for each confidence cutoff

Plot Metrics

A generic set of plots produced by this cell, taken from the confidence score calculations saved as pickle files:

NOTES:

The Ultrasound system (Phillips) burns the workspace information onto the images (or the US image is embedded into the workspace and saved). Cropping removes the breast map, but some of the scale markers (right side slider-type) and orientation text may remain in the final image.
BUSI training can complete within a normal Google Colab sequence (8-12 hours). Fine-tuning with the larger UCLA dataset requires saving checkpoints and restarting. The initial restarts have loss that bounces at the start each time, but quickly get back to the loss/IOU of the previous checkpoint (might still be some tweaks/fixes to be made to loading from checkpoints).
GradCam on Colab has some issues with the version Colab seems to support (or I'm not using it correctly). Attaching hooks and manually getting heatmaps is in there as debug code,but a prepared library would be better (many of them didn't work out of the box with Faster-RCNN architectures).
Final model is saved based upon IOU criteria instead of classification validation criteria (best IOU gets saved off). Checking validation against class metrics might produce increased classifier results.
US images that are mostly black or very low intensity may throw off the simple cropping (looks for distinct window edge thresholds to differentiate ultrasound embedded image and GUI stuff.

References:

Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data in Brief. 2020 Feb;28:104863. DOI: 10.1016/j.dib.2019.104863.

einsteinxx / Ultrasound_Breast_Cancer_Detection_UCLA_BIOENG_596_FALL_21

readme

UCLA_BIOENG_596_FALL_21/Winter 2022/Spring2022

Ultrasound Breast Cancer Detection

Load Saved model dictionary for testing

FUNCTION: TEST UCLA FINAL MODELS