Closed GemmaTuron closed 2 years ago
Thanks Gemma.
Sorry for the lateness.
Skin Reaction dataset overview: I'm working on the Skin Reaction dataset. Exposure to chemical agents can induce an immune reaction in susceptible individuals that leads to skin sensitization. Given a drug's SMILES string, can we predict whether it causes a skin reaction (1) or not (0)? The dataset contains 404 drugs.
Importing the dataset: I installed the TDC package and imported the Skin Reaction dataset from the Toxicity group of TDC's single-instance prediction datasets.
Splitting the dataset: I split the dataset into three sets (train, validation, and test).
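For readers following along, here is a minimal sketch of what a three-way random split does. It is not the TDC implementation: the function name `random_split`, the 70/10/20 proportions, and the seed are assumptions for illustration, and the integer IDs stand in for the 404 drugs.

```python
import random


def random_split(items, fractions=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_valid = int(fractions[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes whatever is left, so no item is dropped.
        "test": shuffled[n_train + n_valid:],
    }


# 404 placeholder IDs stand in for the 404 drugs in the dataset.
split = random_split(range(404))
sizes = {name: len(part) for name, part in split.items()}
print(sizes)
```

The three parts are disjoint and together cover every molecule, which is what lets the validation and test metrics be read as performance on unseen data.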
Data visualization: I used matplotlib to plot the number of actives (1) and inactives (0) in the dataset. As the plot shows, the labels take only two values, so this is a binary classification problem.
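The counts behind such a bar chart can be sketched as follows. The labels here are toy values made up for illustration; the real ones come from the dataset's label column. Knowing the share of actives is useful later, because a model that predicts "active" for everything already reaches a precision equal to that share.

```python
from collections import Counter

# Toy labels for illustration only; they are NOT the real dataset values.
labels = [1, 0, 1, 1, 0, 1, 0, 1]

counts = Counter(labels)
n_total = len(labels)
for cls in sorted(counts):
    share = counts[cls] / n_total
    print(f"class {cls}: {counts[cls]} molecules ({share:.0%})")
```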
Using RDKit, we can visualize the molecular structure of each SMILES string. I imported and drew one active and one inactive molecule, respectively.
@GemmaTuron, can you review what I have done? Here is my Colab.
Hi @alaminumar !
Good start, but can you provide an explanation of the model performances?
Okay, Gemma. First, let me explain how we obtained our models.
Model training: We train the model with each drug's SMILES string as input (X) and its bioactivity label as output (Y). We use the MorganBinaryClassifier from Lazy-QSAR for training, so we don't need to convert the SMILES into signatures ourselves; that is done automatically.
Model evaluation: To evaluate the model, we use precision, recall, a confusion matrix, and the ROC curve with its AUROC value.
To answer your question, Gemma: my model's performance in the first iteration was average to poor, so I decided to double the training time to 3600 seconds. The first iteration had an AUROC of 0.61128 on the validation set and 0.7708 on the test set. As we can see, that is not very good. Here are the corresponding graphs and data for the second iteration.
Validation: Precision 0.7368421052631579, Recall 0.9655172413793104
Contingency matrix: as we can see from the confusion matrix, 38/41 were accurately predicted. This is very good.
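As a sanity check, the reported precision and recall can be turned back into confusion-matrix counts, because both are ratios of small integers. The sketch below is an illustration, not part of the Colab: the helper name `counts_from_metrics` and the cap of 100 on the denominators (based on the evaluation sets being smaller than 100 molecules) are assumptions, and it returns the smallest counts consistent with the two metrics.

```python
from fractions import Fraction
from math import gcd


def counts_from_metrics(precision, recall, max_size=100):
    """Return the smallest (tp, fp, fn) consistent with precision and recall."""
    p = Fraction(precision).limit_denominator(max_size)
    r = Fraction(recall).limit_denominator(max_size)
    if p == 0 or r == 0:
        raise ValueError("precision and recall must be non-zero")
    # tp must be a multiple of both numerators, so take their least common multiple.
    tp = p.numerator * r.numerator // gcd(p.numerator, r.numerator)
    predicted_pos = tp * p.denominator // p.numerator  # tp + fp
    actual_pos = tp * r.denominator // r.numerator     # tp + fn
    return tp, predicted_pos - tp, actual_pos - tp


# Validation metrics as reported in this thread.
tp, fp, fn = counts_from_metrics(0.7368421052631579, 0.9655172413793104)
print(tp, fp, fn)
```

For the validation metrics this gives 28 true positives, 10 false positives, and 1 false negative, that is, 38 molecules predicted active and 29 actually active. If the validation set holds 41 molecules, that leaves 2 true negatives and 30 correct predictions, so the 38 above may be the number of predicted actives rather than the number of correct predictions; it is worth checking against the matrix in the Colab.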
ROC curve: AUROC value 0.6332288401253919
Test: Precision on the test set: 0.6125, Recall on the test set: 1.0
AUROC: 0.7822066326530612
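To help interpret these numbers: precision and recall describe the predictions at a single decision threshold, while AUROC measures how well the predicted scores rank actives above inactives across all thresholds. A minimal pure-Python sketch of that ranking definition follows; the function name `auroc` and the toy scores are illustrative assumptions, not values from the Colab.

```python
def auroc(y_true, y_score):
    """AUROC as the chance that a random active outscores a random inactive.

    Ties between an active and an inactive count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: one active (0.4) is ranked below one inactive (0.6).
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2]
print(round(auroc(y_true, y_score), 4))
```

Under this definition, a recall close to 1.0 combined with a modest AUROC is possible when the model labels almost every molecule as active: few actives are missed, but the scores do not separate the two classes well.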
Sorry for the lateness, @GemmaTuron. I had to deal with an emergency.
Updated Colab
Hi @alaminumar
I hope everything is solved. Good job on the modelling! I'll mark this as completed and you can move on to finalising your Outreachy application!
Model Title
Skin Reaction (TDC dataset)
Publication
Hello @alaminumar!
As part of your Outreachy contribution, we have assigned you the dataset "Skin Reaction" from the Therapeutics Data Commons to try and build a binary classification ML model. Please copy the provided Google Colab template and use this issue to provide updates on the progress. We'll value not only being able to build the model but also interpreting its results.
Code
No response