Closed HellenNamulinda closed 1 year ago
I'm using Ubuntu 22.04.2 and I was able to successfully install and run Ersilia from the CLI.
Hello, I am writing to express my interest in contributing to the Extension of the Ersilia Model Hub project during the Outreachy internship program for summer 2023. As an African who grew up in a low-income family in Uganda, I have seen firsthand the devastating impact that diseases can have on communities that lack access to healthcare and medical resources.
Africa has always faced significant challenges when it comes to diseases, with people dying from treatable illnesses. However, I am optimistic about the potential of technology, specifically AI/ML, to help address these challenges and improve health outcomes for people across the continent and across the world. I am excited about the prospect of working with the Ersilia team on this project because I believe that it has the potential to make a significant impact on the lives of people in my community and around the world through the mission to equip laboratories in low and middle-income countries with state-of-the-art AI/ML tools for infectious and neglected disease research.
The Ersilia Model Hub project presents a unique opportunity to make a meaningful impact on global health by accelerating the drug development process and improving the success rate of clinical trials.
With a degree in Computer Science, I have extensive programming experience in Python and JavaScript, and I have also gained research experience in developing machine learning models using deep learning frameworks such as TensorFlow and Keras. I believe that I am well-suited to contribute to this project.
During my time at Ersilia, I plan to contribute effectively by reviewing, identifying, testing and incorporating models into the Hub and by developing new AI/ML models for biomedicine. Contributing to this project will not only help me improve my technical skills but also provide me with valuable insights into effective teamwork and project management.
After the internship, I plan to continue using technology and my skills to help address societal challenges. I am confident that the experience gained during the Outreachy program will help me achieve my career goals and make a positive impact on the world.
Thank you!
Hi @HellenNamulinda Welcome to the contribution period!
Hello @GemmaTuron, @miquelduranfrigola, @leoank and @DhanshreeA, The model I selected is STOUT(SMILES-TO-IUPAC-name translator): https://github.com/Kohulan/Smiles-TO-iUpac-Translator
STOUT is a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string as well as the reverse translation, i.e. IUPAC names back to a valid SMILES string.
Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. The International Union of Pure and Applied Chemistry (IUPAC) established a universally accepted naming scheme for chemistry based on a set of rules. Correct chemical name assignment is still difficult for humans because of the complexity of this ruleset, and only a few rule-based cheminformatics toolkits can help automate the process.
STOUT was developed based on language translation and language understanding: the two chemical representations are treated as two different languages, with each SMILES string and its corresponding IUPAC name treated as two sentences that have the same meaning.
Hello @GemmaTuron, @miquelduranfrigola, @leoank and @DhanshreeA, I use a Linux machine with Ubuntu 22.04. To install STOUT, I created a conda environment with Python 3.8 and then installed STOUT-pypi together with its dependencies in that environment. The packages in my STOUT environment are listed below:
Package Version
---------------------------- ---------
absl-py 1.4.0
astunparse 1.6.3
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 3.1.0
click 8.1.3
flatbuffers 23.3.3
gast 0.4.0
google-auth 2.16.2
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.51.3
h5py 3.8.0
idna 3.4
importlib-metadata 6.0.0
JPype1 1.4.1
keras 2.10.0
Keras-Preprocessing 1.1.2
libclang 15.0.6.1
Markdown 3.4.1
MarkupSafe 2.1.2
numpy 1.24.2
oauthlib 3.2.2
opt-einsum 3.3.0
packaging 23.0
pip 23.0.1
protobuf 3.19.6
pyasn1 0.4.8
pyasn1-modules 0.2.8
pystow 0.5.0
requests 2.28.2
requests-oauthlib 1.3.1
rsa 4.9
setuptools 65.6.3
six 1.16.0
STOUT-pypi 2.0.5
tensorboard 2.10.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.10.1
tensorflow-estimator 2.10.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor 2.2.0
tqdm 4.65.0
typing_extensions 4.5.0
unicodedata2 15.0.0
urllib3 1.26.15
Werkzeug 2.2.3
wheel 0.38.4
wrapt 1.15.0
zipp 3.15.0
My installation was successful. I verified it using a simple STOUT example.
translate_forward:
IUPAC name of CN1C=NC2=C1C(=O)N(C(=O)N2C)C is: 1,3,7-trimethylpurine-2,6-dione
And translate_reverse:
SMILES of 1,3,7-trimethylpurine-2,6-dione is: CN1C=NC2=C1C(=O)N(C)C(=O)N2C
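For reference, a minimal snippet along the lines of what I ran for this check (it only assumes the STOUT package installed above; the SMILES is the caffeine example shown):
from STOUT import translate_forward, translate_reverse

# SMILES -> IUPAC name (forward translation)
smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"  # caffeine
print("IUPAC name of {} is: {}".format(smiles, translate_forward(smiles)))

# IUPAC name -> SMILES (reverse translation)
name = "1,3,7-trimethylpurine-2,6-dione"
print("SMILES of {} is: {}".format(name, translate_reverse(name)))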
N.B.: While running the model, I got an error related to the JVM DLL not being found. However, installing Java (the JRE and JDK) with the command sudo apt-get install default-jre default-jdk fixed the problem for me.
Hello @HellenNamulinda ! Wonderful, can you try it out from the Ersilia Model Hub and check that it also works?
Thanks
Respected @GemmaTuron, Apologies for the delay, I have been having power issues. But I'm on it today and I will be sharing the next updates soon.
I sampled a few examples (10) from the Essential Medicines List and ran predictions on them using the installed model.
I used a notebook, and pandas made it easy to slice the data, giving me lists of both IUPAC names (drug names) and SMILES.
I got translations from SMILES to IUPAC names using translate_forward
and reverse translations from IUPAC names to SMILES using translate_reverse.
SMILES: CC(O)=O
Predicted IUPAC name: aceticacid
Correct IUPAC name: acetic acid
SMILES: CC(=O)N[C@@H](CS)C(O)=O
Predicted IUPAC name: (2R)-2-acetamido-3-sulfanylpropanoicacid
Correct IUPAC name: acetylcysteine
SMILES: CC(=O)Oc1ccccc1C(O)=O
Predicted IUPAC name: 2-acetyloxybenzoicacid
Correct IUPAC name: acetylsalicylic acid
SMILES: NC1=NC(=O)c2ncn(COCCO)c2N1
Predicted IUPAC name: 2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one
Correct IUPAC name: aciclovir
SMILES: OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1
Predicted IUPAC name: [(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate
Correct IUPAC name: aclidinium
SMILES: CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1
Predicted IUPAC name: (E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide
Correct IUPAC name: afatinib
SMILES: CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1
Predicted IUPAC name: methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate
Correct IUPAC name: albendazole
SMILES: O=C1N=CN=C2NNC=C12
Predicted IUPAC name: 1,2-dihydropyrazolo[3,4-d]pyrimidin-4-one
Correct IUPAC name: allopurinol
SMILES: CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(O)=O)c1I
Predicted IUPAC name: 3,5-diacetamido-2,4,6-triiodobenzoicacid
Correct IUPAC name: amidotrizoate
IUPAC name: acetic acid
Predicted SMILES: CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC
Correct SMILES: CC(O)=O
IUPAC name: acetylcysteine
Predicted SMILES: CC(=O)[Sb]=CC=C1
Correct SMILES: CC(=O)N[C@@H](CS)C(O)=O
IUPAC name: acetylsalicylic acid
Predicted SMILES: CC(=O)[Se](=O)(=O)O
Correct SMILES: CC(=O)Oc1ccccc1C(O)=O
IUPAC name: aciclovir
Predicted SMILES: [Cu]
Correct SMILES: NC1=NC(=O)c2ncn(COCCO)c2N1
IUPAC name: aclidinium
Predicted SMILES: [Ac]
Correct SMILES: OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1
IUPAC name: afatinib
Predicted SMILES: [At]
Correct SMILES: CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1
IUPAC name: albendazole
Predicted SMILES: C1=CC2=CN=NC2=CC=C1
Correct SMILES: CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1
IUPAC name: allopurinol
Predicted SMILES: C1=C2C(=NC=N2)N=CN1O
Correct SMILES: O=C1N=CN=C2NNC=C12
IUPAC name: amidotrizoate
Predicted SMILES: [N-]=[N+]=NON
Correct SMILES: CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(O)=O)c1I
From these translations, the forward SMILES-to-IUPAC predictions largely match the expected names (apart from missing spaces), but the model struggles with the reverse translation of the EML entries back to SMILES, likely because the EML lists common drug names rather than formal IUPAC names.
For the same examples sampled from the Essential Medicines List, I ran predictions with the version of the same model available on the Ersilia Model Hub, using the Ersilia CLI.
I fetched the model in my ersilia conda environment using the command ersilia fetch eos4se9.
I served the model using the command ersilia serve smiles2iupac.
And for each of the examples, I ran ersilia api -i <SMILES>:
{
"input": {
"key": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
"input": "CC(O)=O",
"text": "CC(O)=O"
},
"output": {
"outcome": [
"aceticacid"
]
}
}
{
"input": {
"key": "PWKSKIMOESPYIA-BYPYZUCNSA-N",
"input": "CC(=O)N[C@@H](CS)C(O)=O",
"text": "CC(=O)N[C@@H](CS)C(O)=O"
},
"output": {
"outcome": [
"(2R)-2-acetamido-3-sulfanylpropanoicacid"
]
}
}
{
"input": {
"key": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
"input": "CC(=O)Oc1ccccc1C(O)=O",
"text": "CC(=O)Oc1ccccc1C(O)=O"
},
"output": {
"outcome": [
"2-acetyloxybenzoicacid"
]
}
}
{
"input": {
"key": "MKUXAQIIEYXACX-UHFFFAOYSA-N",
"input": "NC1=NC(=O)c2ncn(COCCO)c2N1",
"text": "NC1=NC(=O)c2ncn(COCCO)c2N1"
},
"output": {
"outcome": [
"2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one"
]
}
}
Based on these examples, the model on the Ersilia Model Hub appears to perform the same as the original model.
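For completeness, the per-molecule CLI calls above could also be batched with a small helper script. This is only a sketch: it assumes the model is already served and simply shells out to the same ersilia api -i command used above, printing the JSON records as returned.
import subprocess

# SMILES sampled from the Essential Medicines List (same examples as above)
smiles_list = [
    "CC(O)=O",
    "CC(=O)N[C@@H](CS)C(O)=O",
    "CC(=O)Oc1ccccc1C(O)=O",
    "NC1=NC(=O)c2ncn(COCCO)c2N1",
]

for smi in smiles_list:
    # Same command as above; run `ersilia serve ...` beforehand
    result = subprocess.run(["ersilia", "api", "-i", smi],
                            capture_output=True, text=True)
    print(result.stdout)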
Hi @HellenNamulinda !
Thanks for this good work! Can you have a look at Adedeji's issue, who also ran the same model, and let us know which version of STOUT you are using? And then move forward with week 3!
Thanks
Greetings @GemmaTuron , I appreciate the compliments. I am using STOUT version 2.0.5
>>> STOUT.__version__
'2.0.5'
Let me look at Adedeji's issue and see how I can contribute.
Hey @HellenNamulinda ! I have successfully installed the model. Can you tell me how you ran those examples?
Hello @Med16-11, first download the Essential Medicines List: just right-click and choose "Save as" to save the .csv file.
Since there are so many examples, you might just sample a few of them. You can use a Jupyter notebook. This code can help:
# Import libraries
import pandas as pd
from STOUT import translate_forward, translate_reverse

# Load the Essential Medicines List dataset
eml_data = pd.read_csv('./eml_canonical.csv')
eml_data.head()

# Take a small slice of the data (10 examples)
test_data_smiles = eml_data['smiles'].values.tolist()[3:13]
print('SMILES: {}'.format(test_data_smiles))
test_data_drugs = eml_data['drugs'].values.tolist()[3:13]
print('IUPAC names: {}'.format(test_data_drugs))

# Forward translation: SMILES to IUPAC name
for x in range(len(test_data_smiles)):
    SMILES = test_data_smiles[x]
    IUPAC_name = translate_forward(SMILES)
    print("SMILES: {} \n".format(SMILES),
          "Predicted IUPAC name: {} \n".format(IUPAC_name),
          "Correct IUPAC name: {} \n".format(test_data_drugs[x]))

# Back translation: IUPAC name to SMILES
for x in range(len(test_data_drugs)):
    IUPAC_name = test_data_drugs[x]
    SMILES = translate_reverse(IUPAC_name)
    print("IUPAC name: {} \n".format(IUPAC_name),
          "Predicted SMILES: {} \n".format(SMILES),
          "Correct SMILES: {} \n".format(test_data_smiles[x]))
Let me know if this helps
Thanks for the quick response @HellenNamulinda. I am actually working on Ubuntu and tried running this file by saving it as a .py, but it's giving lots of errors. Here are the screenshots; I have tried running it both ways:
- when I was in base directory
- conda STOUT env
Hi @Med16-11, this error shows that your environment doesn't have STOUT installed.
Run this command pip install STOUT-pypi to install it in the virtual environment.
Hi @Med16-11
Indeed it does seem you have not installed stout in the right conda environment, or when you did there was an error. Can you try again and let us know? Let's use your specific issue for getting this right.
@HellenNamulinda I think you are ready to move on to the week 3 tasks! Thanks
sure
Model Name HydrAMP: a deep generative model for antimicrobial peptide discovery
Model Description HydrAMP is a model for the generation of novel peptide sequences satisfying given antimicrobial activity conditions. It was trained as a conditional variational autoencoder that captures the antimicrobial properties of peptides by learning their lower-dimensional, continuous space. The model disentangles the peptide's representation from its antimicrobial conditions. HydrAMP can generate diverse and potent peptides, which is a step towards resolving the antimicrobial resistance crisis.
Slug antimicrobial-peptides
Tag antimicrobial-resistance, drug-discovery, antimicrobial-peptides
Publication https://doi.org/10.1101/2022.01.27.478054
Source code https://github.com/szczurek-lab/hydramp Checkpoints of HydrAMP are provided, as well as notebooks for retraining.
License MIT license
This model is important as it shows progress towards a new generation of antibiotics.
Hi @HellenNamulinda !
I really like this model, good catch! The only current limitation is that the Ersilia Model Hub is not yet ready to accept peptides as input (instead of small molecules), but this is in the pipeline to implement, after text inputs. Can I ask you to fill in the model suggestion form so we have that model in our list of to-dos? Looking forward to your next model suggestions!
Hello @GemmaTuron, thank you so much for giving more insight into the models currently accepted on the Ersilia Model Hub; I will focus on those for now. Let me add this model to the list of to-dos and continue my research.
Model Name rxnmapper: Extraction of organic chemistry grammar from unsupervised learning of chemical reactions
Model Description
Given a reaction SMILES, the model returns the atom-mapped reaction and a confidence score. The rxnmapper model leverages unsupervised learning techniques to extract organic chemistry grammar and enable robust atom mapping on valid reaction SMILES. It utilizes an ALBERT architecture trained in an unsupervised manner on a vast dataset of chemical reactions.
The authors reported remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping.
To implement this model, Python 3.6 is recommended, along with the RDKit dependency.
Slug atomic-mapping
Tag Chemistry
Publication https://doi.org/10.1126/sciadv.abe4166
Source code https://github.com/rxn4chemistry/rxnmapper The data that was used to train this model was made available at: https://ibm.box.com/v/RXNMapperData
License MIT license: https://github.com/rxn4chemistry/rxnmapper/blob/main/LICENSE
Understanding how atoms rearrange during a chemical transformation is fundamental to numerous applications aiming to accelerate organic synthesis and molecular discovery. This model automates atom-mapping, which is a laborious experimental task. Since annotation isn’t necessary for unsupervised learning, which is the approach used by this model, the model has the potential to revolutionize the field of organic chemistry and pave the way for new discoveries.
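For reference, a minimal usage sketch based on the rxnmapper repository's README (it assumes the package is installed, e.g. with pip install rxnmapper, and reuses the README's example reaction):
from rxnmapper import RXNMapper

# Atom-map a reaction SMILES (example reaction from the rxnmapper README)
rxn_mapper = RXNMapper()
rxns = ["CC(C)S.CN(C)C=O.Fc1cccnc1F.O=C([O-])[O-].[K+].[K+]>>CC(C)Sc1ncccc1F"]
results = rxn_mapper.get_attention_guided_atom_maps(rxns)

# Each result holds the mapped reaction SMILES and a confidence score
for res in results:
    print(res["mapped_rxn"], res["confidence"])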
Hey @HellenNamulinda !
Very good catch, this model seems straightforward to implement and relevant to our work! Can you add it to our list of model suggestions? https://airtable.com/shroQLlkcmDcC0xzm
Sure, let me add this model to the list and continue reviewing the other models I want to suggest.
Model Name HobPre: accurate prediction of human oral bioavailability for small molecules
Model Description
Given a SMILES, HobPre predicts human oral bioavailability (HOB), using two cutoffs (20% and 50%) to classify molecules. The prediction is either high or low HOB.
Using this model requires Python 3.6. The other dependencies provided are:
Mordred==1.2.0
scikit-learn==0.23.2
pandas==1.1.1
numpy==1.19.2
matplotlib==3.3.4
To successfully test this model locally on my machine, I had to install RDKit using pip install rdkit-pypi and also lower the version of networkx from 2.5 to 2.3 using the command pip install networkx==2.3.
Slug Oral-bioavailability
Tag HOB
Publication https://doi.org/10.1186/s13321-021-00580-6
Source code https://github.com/whymin/HOB Predictions can also be made directly on their online server at http://www.icdrug.com/ICDrug/A, where you can type the SMILES or upload a .txt file.
License Creative Commons
Human oral bioavailability (HOB) is a key factor in determining the fate of new drugs in clinical trials. Bioavailability reflects the fraction of drug absorbed into the systemic circulation when the drug is administered orally. Experimental measurements of drug HOB are not only costly but also a lengthy process. Therefore, a predictive model like HobPre that can evaluate the HOB of a candidate compound before synthesis is of great help to drug discovery.
Hi, this model is very interesting! Please add it to the list, for sure! Could you check whether you can run it on newer versions of Python? Ideally 3.10, but at least 3.8.
Greetings @GemmaTuron, I have added it to the list. Let me experiment with Python 3.10 and share the modifications needed to run it successfully.
Week 3
Task 3: a third model suggestion
Model Name HobPre: accurate prediction of human oral bioavailability for small molecules
Model Description Given a SMILES, HobPre predicts human oral bioavailability (HOB), using two cutoffs (20% and 50%) to classify molecules as high or low HOB. Using this model requires Python 3.6. The other dependencies provided are: Mordred==1.2.0, scikit-learn==0.23.2, pandas==1.1.1, numpy==1.19.2, matplotlib==3.3.4.
To successfully test this model locally on my machine, I had to install RDKit using pip install rdkit-pypi and also lower the version of networkx from 2.5 to 2.3 using the command pip install networkx==2.3.
Slug Oral-bioavailability
Tag HOB
Publication https://doi.org/10.1186/s13321-021-00580-6
Source code https://github.com/whymin/HOB Predictions can also be made directly on their online server at http://www.icdrug.com/ICDrug/A, where you can type the SMILES or upload a .txt file.
License Creative Commons
Human oral bioavailability (HOB) is a key factor in determining the fate of new drugs in clinical trials. Experimental measurements of drug HOB are not only costly but also lengthy. Therefore, a predictive model like HobPre that can evaluate the HOB of a candidate compound before synthesis is of great help to drug discovery.
Testing with Python 3.8
- Setting up the environment: I use conda to manage virtual environments.
- I created one for Python 3.8 using the command conda create -n HobPre38 python=3.8
- To activate this environment, use: conda activate HobPre38
- Installing the packages: pip install rdkit-pypi Mordred scikit-learn pandas numpy matplotlib
- Fixing prediction errors by rolling back the versions of numpy and networkx: pip install "numpy<1.24" networkx==2.3
- Fixing the prediction warning by using the recommended version of scikit-learn: pip install scikit-learn==0.23.2
Making Predictions
Ensure the model is downloaded by cloning the repo:
- git clone https://github.com/whymin/HOB.git
- cd HOB
- Inference usage: python HOB_predict.py your_model_path your_smiles.txt cutoff
python HOB_predict.py model smiles.txt 20
Result:
num smiles prediction HOB Class probability(-) probability(+) inside the applicability domain
0 0.0 Nc1nc(Cl)nc2c1ncn2C1OC(CO)C(O)C1F 1.0 High 22.64 77.36 True
1 1.0 NCCc1c[nH]c2ccc(O)cc12 1.0 High 32.01 67.99 True
2 2.0 CC(N)(Cc1ccc(O)c(O)c1)C(=O)O 1.0 High 10.13 89.87 True
python HOB_predict.py model smiles.txt 50
Result:
num smiles prediction HOB Class probability(-) probability(+) inside the applicability domain
0 0.0 Nc1nc(Cl)nc2c1ncn2C1OC(CO)C(O)C1F 1.0 High 36.68 63.32 True
1 1.0 NCCc1c[nH]c2ccc(O)cc12 0.0 Low 62.79 37.21 True
2 2.0 CC(N)(Cc1ccc(O)c(O)c1)C(=O)O 0.0 Low 52.06 47.94 True
Errors and Warnings
While running predictions:
- I got module errors: module 'numpy' has no attribute 'float' and module 'networkx' has no attribute 'biconnected_component_subgraphs'. While I was still able to get predictions, I wanted to fix these errors, and a quick workaround was to roll back the versions of these two packages, i.e. numpy from 1.24.2 to 1.23.5 and networkx from 2.8.8 to 2.3. The other possible workaround would be to rewrite the code (a possible compatibility sketch is included below).
- I got a UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.2 when using version 1.2.2. Trying to unpickle estimator RandomForestClassifier from version 0.23.2 when using version 1.2.2. Trying to unpickle estimator PCA from version 0.23.2 when using version 1.2.2. This might lead to breaking code or invalid results. Use at your own risk. The workaround was to install the recommended version of scikit-learn (0.23.2). The other possible workaround would be to retrain the model.
Recommended steps to set up the environment
- Create an environment for Python 3.8: conda create -n HobPre38 python=3.8
- Activate the environment: conda activate HobPre38
- Install the packages: pip install rdkit-pypi Mordred pandas matplotlib scikit-learn==0.23.2 "numpy<1.24" networkx==2.3
Note: Python 3.10
I tried with Python 3.10, but it raises many dependency conflicts; fixing one package was breaking another.
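As a sketch of the "rewrite the code" alternative mentioned above, a small compatibility shim could restore the removed aliases instead of downgrading numpy and networkx. This is only an illustration under those assumptions (the module name compat.py is hypothetical), not part of HobPre itself:
# compat.py (hypothetical) - import this before HOB_predict.py's code runs
import numpy as np
import networkx as nx

# numpy >= 1.24 removed the deprecated np.float alias; restore it as the builtin float
if not hasattr(np, "float"):
    np.float = float

# networkx >= 2.4 removed biconnected_component_subgraphs; rebuild it from
# nx.biconnected_components, which is still available
if not hasattr(nx, "biconnected_component_subgraphs"):
    def biconnected_component_subgraphs(G, copy=True):
        for component in nx.biconnected_components(G):
            subgraph = G.subgraph(component)
            yield subgraph.copy() if copy else subgraph
    nx.biconnected_component_subgraphs = biconnected_component_subgraphs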
The predicted HOB class (1.0 for High, 0.0 for Low) depends on probability(+) and probability(-): if probability(+) is greater than probability(-), the predicted class is High; otherwise, it is Low.
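In code, this decision rule amounts to something like the following (illustrative only, not HobPre's actual implementation; the values come from the 50% cutoff results above):
# Illustrative decision rule for the HOB class
def hob_class(prob_pos, prob_neg):
    # 1.0 (High) when probability(+) exceeds probability(-), else 0.0 (Low)
    return 1.0 if prob_pos > prob_neg else 0.0

print(hob_class(63.32, 36.68))  # 1.0 -> High
print(hob_class(37.21, 62.79))  # 0.0 -> Low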
Hi @HellenNamulinda !
Good job! Do you want to give it a try and incorporate it into the Hub? To do so, please open a model request issue and fill in all the fields; also have a look at the examples on model contribution in our documentation.
Sure, thank you! Let me follow the documentation and incorporate it into the Ersilia Model Hub.
This model's incorporation is being tracked in issue #659.
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application