@Galvanize Data Science Immersive Program
The purpose of this project is to create a tool that, given an image of a mole, calculates the probability that the mole is malignant.
Skin cancer is a common disease that affects a large number of people. Some facts about skin cancer:
The idea of this project is to build a CNN model that predicts the probability that a specific mole is malignant.
The model is trained on a set of images from the International Skin Imaging Collaboration (ISIC) Melanoma Project: https://isic-archive.com.
The specific datasets used are:
ISIC_UDA-2_1: Moles and melanomas. Biopsy-confirmed melanocytic lesions. Both malignant and benign lesions are included.
ISIC_UDA-1_1: Moles and melanomas. Biopsy-confirmed melanocytic lesions. Both malignant and benign lesions are included.
ISIC_MSK-2_1: Benign and malignant skin lesions. Biopsy-confirmed melanocytic and non-melanocytic lesions.
ISIC_MSK-1_2: Both malignant and benign melanocytic and non-melanocytic lesions. Almost all images confirmed by histopathology. Images not taken with modern digital cameras.
ISIC_MSK-1_1: Moles and melanomas. Biopsy-confirmed melanocytic lesions, both malignant and benign.
In summary, the total number of images used is:
Benign Images | Malignant Images |
---|---|
1208 | 849 |
Some sample images are shown below:
Sample images of benign moles:
Sample images of malignant moles:
The following preprocessing tasks are applied to each image:
The idea is to develop a simple CNN model from scratch and evaluate its performance to set a baseline. The following steps are then taken to improve the model:
To compare the different models we will use ROC curves and the AUC score. To choose the final model we will evaluate precision and accuracy, and set the decision threshold at a level that represents a good tradeoff between the true positive rate (TPR) and the false positive rate (FPR).
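As a rough illustration of this evaluation, the sketch below computes ROC points and the AUC in plain Python and then picks a threshold. The labels and scores are made-up values, not results from this project, and the threshold rule (maximizing Youden's J, that is TPR minus FPR) is only one possible choice that I am assuming here. The project itself may use a different criterion.

```python
def roc_curve_points(y_true, y_score):
    """Return (fpr, tpr, thresholds) by sweeping the threshold over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    # Rank samples from the highest predicted probability to the lowest.
    ranked = sorted(zip(y_score, y_true), reverse=True)
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample that shares this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
            thresholds.append(score)
    return fpr, tpr, thresholds


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


def best_threshold(fpr, tpr, thresholds):
    """Threshold that maximizes Youden's J = TPR - FPR (an assumed rule)."""
    j_scores = [t - f for f, t in zip(fpr, tpr)]
    return thresholds[j_scores.index(max(j_scores))]


# Hypothetical labels (1 = malignant) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]

fpr, tpr, thresholds = roc_curve_points(y_true, y_score)
roc_auc = auc(fpr, tpr)
chosen = best_threshold(fpr, tpr, thresholds)
print(f"AUC = {roc_auc:.3f}, chosen threshold = {chosen:.2f}")
```

A sample is predicted malignant when its score is greater than or equal to the chosen threshold. For a medical screening tool, a lower threshold that favors recall on the malignant class may be preferable to the J-optimal one.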
As mentioned before, the idea is to build a tool that predicts the probability that a mole is malignant. To do so, I plan to provide the following resources:
1. Web App: The web app will let a user upload a high-quality image of a specific mole. The result will be a prediction of the probability that the given mole is malignant, expressed as a percentage. The backend that hosts the web app and the loaded model will run on Amazon Web Services.
2. iPhone App: Our CNN model will be loaded onto the iPhone to make local predictions. Advantage: the image data does not need to be uploaded to any server, because predictions are made by the pre-trained model loaded on the iPhone.
3. Android App: (Optional, if time allows)
Activity | Days | Status | Prog |
---|---|---|---|
1. Data Acquisition | 1 | Done | ++++ |
2. Initial Preprocessing and visualizations | 1 | Done | ++++ |
3. First Model Construction and tuning | 2 | Done | ++++ |
4. Model Optimization I (Data augmentation) | 1 | Done | ++++ |
5. Model Optimization II (Transfer learning) | 2 | Done | ++++ |
6. Model Optimization III (Fine Tuning) | 2 | Done | ++++ |
7. Web App Development + Backend Service | 2 | Done | ++++ |
8. iOS App Development | 2 | Done | ++++ |
9. Android App Development | 2 | Pending | ---- |
10. Presentation preparation | 1 | Done | ++++ |
Simple convolutional neural network with 3 layers. The results obtained so far are shown in the ROC curve below:
class | precision | recall | f1-score | support |
---|---|---|---|---|
0.0 | 0.86 | 0.88 | 0.87 | 50 |
1.0 | 0.88 | 0.86 | 0.87 | 50 |
avg / total | 0.87 | 0.87 | 0.87 | 100 |
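To make the columns of the report above concrete, the following sketch recomputes them from confusion-matrix counts. The counts are not stated in the source. I inferred them from the rounded recall and support values (0.88 × 50 = 44 benign images classified correctly, 0.86 × 50 = 43 malignant images classified correctly), so treat them as an assumption.

```python
def class_report(tp, fp, fn):
    """Precision, recall and F1 for one class, rounded to two decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f1, 2)


# Counts inferred from the report (assumption), with malignant as the positive class.
tn, fp, fn, tp = 44, 6, 7, 43

# Class 1.0 (malignant): positives are malignant moles.
malignant = class_report(tp=tp, fp=fp, fn=fn)
# Class 0.0 (benign): swap roles so that benign is treated as the positive class.
benign = class_report(tp=tn, fp=fn, fn=fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("benign   ", benign)     # (0.86, 0.88, 0.87)
print("malignant", malignant)  # (0.88, 0.86, 0.87)
print("accuracy ", accuracy)   # 0.87
```

Under these inferred counts, 7 of the 50 malignant test images are classified as benign, which is the error type that matters most for a screening tool.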
class | precision | recall | f1-score | support |
---|---|---|---|---|
0.0 | 0.87 | 0.92 | 0.89 | 50 |
1.0 | 0.91 | 0.86 | 0.89 | 50 |
avg / total | 0.89 | 0.89 | 0.89 | 100 |
class | precision | recall | f1-score | support |
---|---|---|---|---|
0.0 | 0.82 | 0.94 | 0.88 | 50 |
1.0 | 0.93 | 0.80 | 0.86 | 50 |
avg / total | 0.88 | 0.87 | 0.87 | 100 |
class | precision | recall | f1-score | support |
---|---|---|---|---|
0.0 | 0.81 | 0.96 | 0.88 | 50 |
1.0 | 0.95 | 0.78 | 0.86 | 50 |
avg / total | 0.88 | 0.87 | 0.87 | 100 |
class | precision | recall | f1-score | support |
---|---|---|---|---|
0.0 | 0.88 | 0.88 | 0.88 | 50 |
1.0 | 0.88 | 0.88 | 0.88 | 50 |
avg / total | 0.88 | 0.88 | 0.88 | 100 |
All layers use a ReLU activation function, except the last one, which uses a sigmoid to output the probability that a mole is malignant.
As part of this project I developed an iOS app using the Core ML framework released by Apple. The advantage of this framework is that the model and the image are stored locally on the phone, so no internet connection is needed. The Keras model trained earlier is converted into a Core ML model and loaded onto the phone to make the predictions. Below is a picture of the app and two examples of results.
Example of a low-risk mole result:
Example of a high-risk mole result:
To take users of other platforms into consideration, I also created a web app that can be accessed at http://skinmolesrisk.ddns.net:7000. This app is responsive, so it can be used directly from any mobile phone or web browser.
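The role of the two activations can be shown with a toy forward pass in plain Python. This is not the project's CNN: the feature values, weights, and layer sizes are hypothetical, and the convolutional layers are left out. It only shows that ReLU clips negative values in the hidden units while the final sigmoid squashes the output into a value between 0 and 1 that can be read as a probability.

```python
import math


def relu(x):
    """Rectified linear unit: pass positive values, clip negatives to zero."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic function: map any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_unit(inputs, weights, bias, activation):
    """One fully connected unit: weighted sum plus bias, then the activation."""
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical feature vector standing in for the flattened convolutional output.
features = [0.5, -1.2, 2.0]

# Hidden layer with two units, both using ReLU (weights are made up).
hidden = [
    dense_unit(features, [0.4, 0.1, 0.3], 0.0, relu),
    dense_unit(features, [-0.6, 0.8, -0.2], 0.1, relu),
]

# Output layer with a single sigmoid unit: probability that the mole is malignant.
probability = dense_unit(hidden, [1.5, -0.7], -0.2, sigmoid)
risk_percent = round(100 * probability, 1)
print(f"Estimated risk: {risk_percent}%")
```

The second hidden unit receives a negative weighted sum, so ReLU sets it to zero. The sigmoid output is what an app could report to the user as a percentage.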
This tool has been designed only for educational purposes to demonstrate the use of Machine Learning tools in the medical field. This tool does not replace advice or evaluation by a medical professional. Nothing on this site should be construed as an attempt to offer a medical opinion or practice medicine.