ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Hellen Namulinda #634

Closed. HellenNamulinda closed this issue 1 year ago.

HellenNamulinda commented 1 year ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

HellenNamulinda commented 1 year ago

I'm using Ubuntu 22.04.2 and I was able to successfully install and run Ersilia from the CLI (screenshot attached).

HellenNamulinda commented 1 year ago

My Motivation

Hello, I am writing to express my interest in contributing to the Extension of the Ersilia Model Hub project during the Outreachy internship program for summer 2023.  As an African who grew up in a low-income family in Uganda, I have seen firsthand the devastating impact that diseases can have on communities that lack access to healthcare and medical resources. 

Africa has always faced significant challenges when it comes to diseases, with people dying from treatable illnesses. However, I am optimistic about the potential of technology, specifically AI/ML, to help address these challenges and improve health outcomes for people across the continent and across the world. I am excited about the prospect of working with the Ersilia team on this project because I believe that it has the potential to make a significant impact on the lives of people in my community and around the world through the mission to equip laboratories in low and middle-income countries with state-of-the-art AI/ML tools for infectious and neglected disease research. 

The Ersilia Model Hub project presents a unique opportunity to make a meaningful impact on global health by accelerating the drug development process and improving the success rate of clinical trials.

With a degree in Computer Science, I have extensive programming experience in Python and JavaScript, and I have also gained experience in research and in developing machine learning models using deep learning frameworks such as TensorFlow and Keras. I believe that I am well-suited to contribute to this project.

During my time at Ersilia, I plan to contribute effectively by reviewing, identifying, testing and incorporating models into the Hub and by developing new AI/ML models for biomedicine. Contributing to this project will not only help me improve my technical skills but also provide me with valuable insights into effective teamwork and project management.

After the internship, I plan to continue using technology and my skills to help address societal challenges. I am confident that the experience gained during the Outreachy program will help me achieve my career goals and make a positive impact on the world. 

Thank you!

GemmaTuron commented 1 year ago

Hi @HellenNamulinda, welcome to the contribution period!

HellenNamulinda commented 1 year ago

Week 2:

Task 1: select a model

Hello @GemmaTuron, @miquelduranfrigola, @leoank and @DhanshreeA, the model I selected is STOUT (SMILES-TO-IUPAC-name translator): https://github.com/Kohulan/Smiles-TO-iUpac-Translator

STOUT is a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string as well as the reverse translation, i.e. IUPAC names back to a valid SMILES string.

Chemical compounds can be identified through a graphical depiction, a suitable string representation, or a chemical name. The International Union of Pure and Applied Chemistry (IUPAC) established a universally accepted naming scheme for chemistry based on a set of rules. Correct chemical name assignment is still difficult for humans to perform because of the complexity of this ruleset, and only a few rule-based cheminformatics toolkits can help with this process automatically.

STOUT was developed based on language translation and language understanding. The two chemical representations were treated as two different languages: each SMILES string and its corresponding IUPAC name were treated as two different sentences that have the same meaning.

HellenNamulinda commented 1 year ago

Week 2:

Task 2: install

Hello @GemmaTuron, @miquelduranfrigola, @leoank and @DhanshreeA, I use a Linux machine with Ubuntu 22.04. To install STOUT, I created a conda environment with Python 3.8, then installed STOUT-pypi together with its dependencies in that environment. The packages in my STOUT environment are listed below:

Package                      Version
---------------------------- ---------
absl-py                      1.4.0
astunparse                   1.6.3
cachetools                   5.3.0
certifi                      2022.12.7
charset-normalizer           3.1.0
click                        8.1.3
flatbuffers                  23.3.3
gast                         0.4.0
google-auth                  2.16.2
google-auth-oauthlib         0.4.6
google-pasta                 0.2.0
grpcio                       1.51.3
h5py                         3.8.0
idna                         3.4
importlib-metadata           6.0.0
JPype1                       1.4.1
keras                        2.10.0
Keras-Preprocessing          1.1.2
libclang                     15.0.6.1
Markdown                     3.4.1
MarkupSafe                   2.1.2
numpy                        1.24.2
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    23.0
pip                          23.0.1
protobuf                     3.19.6
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pystow                       0.5.0
requests                     2.28.2
requests-oauthlib            1.3.1
rsa                          4.9
setuptools                   65.6.3
six                          1.16.0
STOUT-pypi                   2.0.5
tensorboard                  2.10.1
tensorboard-data-server      0.6.1
tensorboard-plugin-wit       1.8.1
tensorflow                   2.10.1
tensorflow-estimator         2.10.0
tensorflow-io-gcs-filesystem 0.31.0
termcolor                    2.2.0
tqdm                         4.65.0
typing_extensions            4.5.0
unicodedata2                 15.0.0
urllib3                      1.26.15
Werkzeug                     2.2.3
wheel                        0.38.4
wrapt                        1.15.0
zipp                         3.15.0

My installation was successful. I verified it with a simple STOUT example: translate_forward gives the IUPAC name of CN1C=NC2=C1C(=O)N(C(=O)N2C)C as 1,3,7-trimethylpurine-2,6-dione, and translate_reverse gives the SMILES of 1,3,7-trimethylpurine-2,6-dione as CN1C=NC2=C1C(=O)N(C)C(=O)N2C.

N.B. While running the model, I got an error related to the JVM DLL not being found. However, installing Java (JRE and JDK) with sudo apt-get install default-jre default-jdk fixed the problem for me.
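
For reference, a minimal sketch of that verification, using the same caffeine example and the translate_forward/translate_reverse functions from the STOUT API used later in this thread:

# Quick check that the STOUT installation works, using caffeine as the example
from STOUT import translate_forward, translate_reverse

caffeine_smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
iupac_name = translate_forward(caffeine_smiles)
print("IUPAC name of " + caffeine_smiles + " is: " + iupac_name)

back_translated = translate_reverse("1,3,7-trimethylpurine-2,6-dione")
print("SMILES of 1,3,7-trimethylpurine-2,6-dione is: " + back_translated)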

GemmaTuron commented 1 year ago

Hello @HellenNamulinda! Wonderful, can you try it out from the Ersilia Model Hub and check it also works?

Thanks

HellenNamulinda commented 1 year ago

> Hello @HellenNamulinda! Wonderful, can you try it out from the Ersilia Model Hub and check it also works?
>
> Thanks

Respected @GemmaTuron, apologies for the delay; I have been having power issues. But I'm on it today and will share the next updates soon.

HellenNamulinda commented 1 year ago

Week 2:

Task 3: run predictions

I sampled a few examples (10) from the Essential Medicines List and ran predictions on them using the installed model. I used a notebook, and pandas made it easy to slice the data into lists of both IUPAC names (drugs) and SMILES. I got translations in both directions: SMILES to IUPAC names using translate_forward, and back/reverse translations of IUPAC names to SMILES using translate_reverse.

Examples: SMILES to IUPAC names

SMILES: CC(O)=O 
 Predicted IUPAC name: aceticacid 
 Correct IUPAC name: acetic acid 

SMILES: CC(=O)N[C@@H](CS)C(O)=O 
 Predicted IUPAC name: (2R)-2-acetamido-3-sulfanylpropanoicacid 
 Correct IUPAC name: acetylcysteine 

SMILES: CC(=O)Oc1ccccc1C(O)=O 
 Predicted IUPAC name: 2-acetyloxybenzoicacid 
 Correct IUPAC name: acetylsalicylic acid 

SMILES: NC1=NC(=O)c2ncn(COCCO)c2N1 
 Predicted IUPAC name: 2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one 
 Correct IUPAC name: aciclovir 

SMILES: OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1 
 Predicted IUPAC name: [(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate 
 Correct IUPAC name: aclidinium 

SMILES: CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1 
 Predicted IUPAC name: (E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide 
 Correct IUPAC name: afatinib 

SMILES: CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 
 Predicted IUPAC name: methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate 
 Correct IUPAC name: albendazole 

SMILES: O=C1N=CN=C2NNC=C12 
 Predicted IUPAC name: 1,2-dihydropyrazolo[3,4-d]pyrimidin-4-one 
 Correct IUPAC name: allopurinol 

SMILES: CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(O)=O)c1I 
 Predicted IUPAC name: 3,5-diacetamido-2,4,6-triiodobenzoicacid 
 Correct IUPAC name: amidotrizoate 

Examples: IUPAC names to SMILES

IUPAC name: acetic acid 
 Predicted SMILES: CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC(=O)O.CC 
 Correct SMILES: CC(O)=O 

IUPAC name: acetylcysteine 
 Predicted SMILES: CC(=O)[Sb]=CC=C1 
 Correct SMILES: CC(=O)N[C@@H](CS)C(O)=O 

IUPAC name: acetylsalicylic acid 
 Predicted SMILES: CC(=O)[Se](=O)(=O)O 
 Correct SMILES: CC(=O)Oc1ccccc1C(O)=O 

IUPAC name: aciclovir 
 Predicted SMILES: [Cu] 
 Correct SMILES: NC1=NC(=O)c2ncn(COCCO)c2N1 

IUPAC name: aclidinium 
 Predicted SMILES: [Ac] 
 Correct SMILES: OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1 

IUPAC name: afatinib 
 Predicted SMILES: [At] 
 Correct SMILES: CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1 

IUPAC name: albendazole 
 Predicted SMILES: C1=CC2=CN=NC2=CC=C1 
 Correct SMILES: CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 

IUPAC name: allopurinol 
 Predicted SMILES: C1=C2C(=NC=N2)N=CN1O 
 Correct SMILES: O=C1N=CN=C2NNC=C12 

IUPAC name: amidotrizoate 
 Predicted SMILES: [N-]=[N+]=NON 
 Correct SMILES: CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(O)=O)c1I 

From these translations, the model still struggles on the EML dataset: the forward SMILES-to-IUPAC translations drop spaces in the names, and the reverse translations from the listed drug names (which are common names rather than strict IUPAC names) are often far from the correct SMILES.
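
One way to check the reverse translations programmatically rather than by eye is to canonicalize both the predicted and the reference SMILES with RDKit and compare them. A minimal sketch, assuming RDKit is installed (the helper name is mine):

# Compare a predicted SMILES against the reference by canonical SMILES equality
from rdkit import Chem

def same_molecule(predicted_smiles, correct_smiles):
    pred_mol = Chem.MolFromSmiles(predicted_smiles)
    ref_mol = Chem.MolFromSmiles(correct_smiles)
    if pred_mol is None or ref_mol is None:
        return False  # an unparseable SMILES cannot match
    return Chem.MolToSmiles(pred_mol) == Chem.MolToSmiles(ref_mol)

# Aclidinium example from above: a single actinium atom clearly does not match
print(same_molecule("[Ac]", "OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1"))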

HellenNamulinda commented 1 year ago

Week 2:

Task 4: Compare results with the Ersilia Model Hub implementation

For the same examples sampled from the Essential Medicines List, I ran predictions on the version of the same model available on the Ersilia Model Hub using the Ersilia CLI. I fetched the model in my ersilia conda environment with ersilia fetch eos4se9, served it with ersilia serve smiles2iupac, and for each example ran ersilia api -i <SMILES>.
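
As a side note, a small sketch of how these per-molecule CLI calls could be looped from Python with subprocess (assuming the ersilia CLI is on the PATH and the model is already being served with ersilia serve smiles2iupac; the SMILES below are from the EML sample above):

# Run the served Ersilia model on a few SMILES by shelling out to the CLI
import subprocess

smiles_list = ["CC(O)=O", "CC(=O)N[C@@H](CS)C(O)=O", "CC(=O)Oc1ccccc1C(O)=O"]

for smiles in smiles_list:
    # Equivalent to typing: ersilia api -i <SMILES>
    result = subprocess.run(
        ["ersilia", "api", "-i", smiles],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)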

Examples of translations: SMILES to IUPAC names

{
    "input": {
        "key": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
        "input": "CC(O)=O",
        "text": "CC(O)=O"
    },
    "output": {
        "outcome": [
            "aceticacid"
        ]
    }
}

{
    "input": {
        "key": "PWKSKIMOESPYIA-BYPYZUCNSA-N",
        "input": "CC(=O)N[C@@H](CS)C(O)=O",
        "text": "CC(=O)N[C@@H](CS)C(O)=O"
    },
    "output": {
        "outcome": [
            "(2R)-2-acetamido-3-sulfanylpropanoicacid" 
        ]
    }
}

{
    "input": {
        "key": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
        "input": "CC(=O)Oc1ccccc1C(O)=O",
        "text": "CC(=O)Oc1ccccc1C(O)=O"
    },
    "output": {
        "outcome": [
            "2-acetyloxybenzoicacid" 
        ]
    }
}

{
    "input": {
        "key": "MKUXAQIIEYXACX-UHFFFAOYSA-N",
        "input": "NC1=NC(=O)c2ncn(COCCO)c2N1",
        "text": "NC1=NC(=O)c2ncn(COCCO)c2N1"
    },
    "output": {
        "outcome": [
            "2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one" 
        ]
    }
}

Based on these examples, the model on the Ersilia Model Hub performs the same as the original model.

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

Thanks for this good work! Can you have a look at Adedeji's issue, who also ran the same model, and let us know which version of STOUT you are using? And then move forward with week 3!

Thanks

HellenNamulinda commented 1 year ago

> Hi @HellenNamulinda!
>
> Thanks for this good work! Can you have a look at Adedeji's issue, who also ran the same model, and let us know which version of STOUT you are using? And then move forward with week 3!
>
> Thanks

Greetings @GemmaTuron, I appreciate the compliments. I am using STOUT version 2.0.5:

>>> STOUT.__version__
'2.0.5'

Let me look at Adedeji's issue and see how I can contribute.

Med16-11 commented 1 year ago

Hey @HellenNamulinda! I have successfully installed the model. Can you tell me how you ran those examples?

HellenNamulinda commented 1 year ago

> Hey @HellenNamulinda! I have successfully installed the model. Can you tell me how you ran those examples?

Hello @Med16-11, first download the Essential Medicines List: just right-click and choose "Save as", and you will be able to save the .csv file.

Since there are so many examples, you might just sample a few of them. You can use a Jupyter notebook; this code can help:

# Import libraries
import pandas as pd
from STOUT import translate_forward, translate_reverse

# Load the Essential Medicines List dataset
eml_data = pd.read_csv('./eml_canonical.csv')
eml_data.head()

# Take a small slice of the data (10 examples)
test_data_smiles = eml_data['smiles'].values.tolist()[3:13]
print('SMILES: {}'.format(test_data_smiles))
test_data_drugs = eml_data['drugs'].values.tolist()[3:13]
print('IUPAC names: {}'.format(test_data_drugs))

# Forward translation: SMILES to IUPAC name
for x in range(len(test_data_smiles)):
    SMILES = test_data_smiles[x]
    IUPAC_name = translate_forward(SMILES)
    print("SMILES: {} \n".format(SMILES), "Predicted IUPAC name: {} \n".format(IUPAC_name), "Correct IUPAC name: {} \n".format(test_data_drugs[x]))

# Back translation: IUPAC name to SMILES
for x in range(len(test_data_drugs)):
    IUPAC_name = test_data_drugs[x]
    SMILES = translate_reverse(IUPAC_name)
    print("IUPAC name: {} \n".format(IUPAC_name), "Predicted SMILES: {} \n".format(SMILES), "Correct SMILES: {} \n".format(test_data_smiles[x]))

Let me know if this helps

Med16-11 commented 1 year ago

Thanks for the quick response @HellenNamulinda. I am actually working on Ubuntu and tried running this file by saving it as a .py, but it's giving lots of errors. Here are the screenshots; I have tried running it both ways:

  1. when I was in the base directory
  2. in the conda STOUT env (screenshot attached)

HellenNamulinda commented 1 year ago

> Thanks for the quick response @HellenNamulinda. I am actually working on Ubuntu and tried running this file by saving it as a .py, but it's giving lots of errors. Here are the screenshots; I have tried running it both ways:
>
> 1. when I was in the base directory
> 2. in the conda STOUT env (screenshot attached)

Hi @Med16-11, this error shows that your environment doesn't have STOUT installed. Run pip install STOUT-pypi to install it in the virtual environment.

GemmaTuron commented 1 year ago

Hi @Med16-11

Indeed it does seem you have not installed stout in the right conda environment, or when you did there was an error. Can you try again and let us know? Let's use your specific issue for getting this right.

@HellenNamulinda I think you are ready to go onto week 3 tasks! Thanks

Med16-11 commented 1 year ago

sure

HellenNamulinda commented 1 year ago

Week 3

Task 1: a first model suggestion

Model Name HydrAMP: a deep generative model for antimicrobial peptide discovery

Model Description HydrAMP is a model for generating novel peptide sequences that satisfy given antimicrobial activity conditions. It was trained as a conditional variational autoencoder that captures the antimicrobial properties of peptides by learning their lower-dimensional, continuous space. The model disentangles the peptide's representation from its antimicrobial conditions. HydrAMP can generate diverse and potent peptides, which is a step towards resolving the antimicrobial resistance crisis.

Slug antimicrobial-peptides

Tag antimicrobial-resistance, drug-discovery, antimicrobial-peptides

Publication https://doi.org/10.1101/2022.01.27.478054

Source code https://github.com/szczurek-lab/hydramp (checkpoints of HydrAMP are provided, as well as the notebooks for retraining)

License MIT license

This model is important as it shows progress towards a new generation of antibiotics.

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

I really like this model, good catch! The only current limitation is that the Ersilia Model Hub is not yet ready to accept peptides as input (instead of small molecules), but this is in the pipeline to implement, after text inputs. Can I ask you to fill in the model suggestion form so we have that model in our list of to-dos? Looking forward to your next model suggestions!

HellenNamulinda commented 1 year ago

> Hi @HellenNamulinda!
>
> I really like this model, good catch! The only current limitation is that the Ersilia Model Hub is not yet ready to accept peptides as input (instead of small molecules), but this is in the pipeline to implement, after text inputs. Can I ask you to fill in the model suggestion form so we have that model in our list of to-dos? Looking forward to your next model suggestions!

Hello @GemmaTuron, thank you so much for giving more insight into the models currently accepted on the Ersilia Model Hub; I will focus on those for now. Let me add this model to the list of to-dos and continue my research.

HellenNamulinda commented 1 year ago

Week 3

Task 2: a second model suggestion

Model Name rxnmapper: Extraction of organic chemistry grammar from unsupervised learning of chemical reactions

Model Description Given a reaction SMILES, the model returns the mapped reaction and a confidence score. The rxnmapper model leverages unsupervised learning techniques to extract organic chemistry grammar and enable robust atom mapping on valid reaction SMILES. It uses an ALBERT architecture trained in an unsupervised manner on a vast dataset of chemical reactions. The authors report remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom mapping. To implement this model, Python 3.6 is recommended, along with the RDKit dependency.

Slug atomic-mapping

Tag Chemistry

Publication https://doi.org/10.1126/sciadv.abe4166

Source code https://github.com/rxn4chemistry/rxnmapper (the data used to train this model is available at https://ibm.box.com/v/RXNMapperData)

License MIT license: https://github.com/rxn4chemistry/rxnmapper/blob/main/LICENSE

Understanding how atoms rearrange during a chemical transformation is fundamental to numerous applications aiming to accelerate organic synthesis and molecular discovery. This model automates atom mapping, which is otherwise a laborious manual annotation task. Since the unsupervised approach used by this model does not require annotated data, it has the potential to revolutionize the field of organic chemistry and pave the way for new discoveries.
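
For context, a minimal usage sketch, assuming the API advertised in the rxnmapper README (the RXNMapper class and its get_attention_guided_atom_maps method); the example reaction SMILES is illustrative:

# Atom-map a reaction SMILES with rxnmapper
from rxnmapper import RXNMapper

rxn_mapper = RXNMapper()
reactions = ["CC(C)S.CN(C)C=O.Fc1cccnc1F.O=C([O-])[O-].[K+].[K+]>>CC(C)Sc1ncccc1F"]

# Each result is expected to contain the mapped reaction SMILES and a confidence score
results = rxn_mapper.get_attention_guided_atom_maps(reactions)
print(results)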

GemmaTuron commented 1 year ago

Hey @HellenNamulinda !

Very good catch, this model seems straightforward to implement and relevant to our work! Can you add it in our list of model suggestions? https://airtable.com/shroQLlkcmDcC0xzm

HellenNamulinda commented 1 year ago

> Hey @HellenNamulinda!
>
> Very good catch, this model seems straightforward to implement and relevant to our work! Can you add it in our list of model suggestions? https://airtable.com/shroQLlkcmDcC0xzm

Sure, let me add this model to the list and continue reviewing the other models I want to suggest.

HellenNamulinda commented 1 year ago

Week 3

Task 3: a third model suggestion

Model Name HobPre: accurate prediction of human oral bioavailability for small molecules

Model Description Given a SMILES, HobPre predicts human oral bioavailability (HOB), using two cutoffs (20% and 50%) to classify molecules; the prediction is either high or low HOB. Using this model requires Python 3.6, and the other dependencies are provided:

Mordred==1.2.0
scikit-learn==0.23.2
pandas==1.1.1
numpy==1.19.2
matplotlib==3.3.4

To successfully test this model locally on my machine, I had to install rdkit using pip install rdkit-pypi and also lower the version of networkx from 2.5 to 2.3 using the command pip install networkx==2.3

Slug Oral-bioavailability

Tag HOB

Publication https://doi.org/10.1186/s13321-021-00580-6

Source code https://github.com/whymin/HOB. Predictions can also be made directly on their online server at http://www.icdrug.com/ICDrug/A (type the SMILES or upload a .txt file).

License Creative Commons

Human oral bioavailability (HOB) is a key factor in determining the fate of new drugs in clinical trials. Bioavailability reflects the fraction of drug absorbed into the systemic circulation when the drug is administered orally. Experimental measurements of drug HOB are not only costly but also a lengthy process. Therefore, a predictive model like HobPre that can evaluate the HOB of a candidate compound before synthesis is of great help to drug discovery.
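
Judging from the dependency list (Mordred, RDKit, scikit-learn), HobPre presumably featurizes molecules with Mordred descriptors before classification. A minimal sketch of that featurization step in isolation, using the standard mordred API (this is not the authors' exact pipeline):

# Compute Mordred 2D descriptors for a SMILES string
from mordred import Calculator, descriptors
from rdkit import Chem

calc = Calculator(descriptors, ignore_3D=True)  # 2D descriptors only
mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(O)=O")  # aspirin, as an example input
descriptor_values = calc.pandas([mol])  # one-row pandas DataFrame of descriptor values
print(descriptor_values.shape)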

GemmaTuron commented 1 year ago

Hi, this model is very interesting! Please add it to the list, for sure! Could you check whether you can run it on newer versions of Python? Ideally 3.10, but at least 3.8.

HellenNamulinda commented 1 year ago

> Hi, this model is very interesting! Please add it to the list, for sure! Could you check whether you can run it on newer versions of Python? Ideally 3.10, but at least 3.8.

Greetings @GemmaTuron, I have added it to the list. Let me experiment with Python 3.10 and share the modifications we need to run it successfully.

HellenNamulinda commented 1 year ago

Week 3

Task 3: a third model suggestion

Model Name HobPre: accurate prediction of human oral bioavailability for small molecules

Model Description Given a SMILES, HobPre predicts human oral bioavailability (HOB), using two cutoffs (20% and 50%) to classify molecules; the prediction is either high or low HOB. Using this model requires Python 3.6, and the other dependencies are provided:

Mordred==1.2.0
scikit-learn==0.23.2
pandas==1.1.1
numpy==1.19.2
matplotlib==3.3.4

To successfully test this model locally on my machine, I had to install rdkit using pip install rdkit-pypi and also lower the version of networkx from 2.5 to 2.3 using the command pip install networkx==2.3

Slug Oral-bioavailability

Tag HOB

Publication https://doi.org/10.1186/s13321-021-00580-6

Source code https://github.com/whymin/HOB. Predictions can also be made directly on their online server at http://www.icdrug.com/ICDrug/A (type the SMILES or upload a .txt file).

License Creative Commons

Human oral bioavailability (HOB) is a key factor in determining the fate of new drugs in clinical trials. Experimental measurements of drug HOB are not only costly but also a lengthy process. Therefore, a predictive model like HobPre that can evaluate the HOB of a candidate compound before synthesis is of great help to drug discovery.

Testing with Python 3.8

Errors and warnings while running predictions:

  1. I got module errors: module 'numpy' has no attribute 'float' and module 'networkx' has no attribute 'biconnected_component_subgraphs'. While I was still able to get predictions, I wanted to fix these errors, and a quick workaround was to roll back the versions of these two packages, i.e. from numpy 1.24.2 to 1.23.5 and from networkx 2.8.8 to 2.3. The other possible workaround would be to re-write the code (see the sketch after this list).
  2. I got UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.2 when using version 1.2.2 (and likewise for RandomForestClassifier and PCA). This might lead to breaking code or invalid results. Use at your own risk. The workaround was to install the recommended version of scikit-learn (0.23.2). The other possible workaround would be to re-train the model.
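
For reference, the "re-write the code" route mentioned in point 1 would amount to edits like the following in the offending code (illustrative only; pinning the package versions as above is the simpler fix):

# Illustrative replacements for the removed numpy and networkx APIs
import networkx as nx
import numpy as np

# numpy >= 1.24 removed the deprecated alias np.float; use the builtin float (or np.float64)
value = float(np.sum([1, 2, 3]))  # instead of: np.float(np.sum([1, 2, 3]))
print(value)

# networkx >= 2.4 removed biconnected_component_subgraphs; the documented replacement is:
G = nx.cycle_graph(4)
subgraphs = [G.subgraph(c).copy() for c in nx.biconnected_components(G)]  # instead of: list(nx.biconnected_component_subgraphs(G))
print(len(subgraphs))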

Recommended steps to set up the environment:

  1. Create an environment for Python 3.8: conda create -n HobPre38 python=3.8
  2. Activate the environment: conda activate HobPre38
  3. Install the packages: pip install rdkit-pypi Mordred pandas matplotlib scikit-learn==0.23.2 "numpy<1.24" networkx==2.3

Note: Python 3.10

I tried with Python 3.10, but it raised many dependency conflicts; fixing one package broke another.

HellenNamulinda commented 1 year ago

Week 3

Task 3: a third model suggestion

Model Name HobPre: accurate prediction of human oral bioavailability for small molecules

Model Description Given a SMILES, HobPre predicts human oral bioavailability (HOB), using two cutoffs (20% and 50%) to classify molecules; the prediction is either high or low HOB. Using this model requires Python 3.6, and the other dependencies are provided:

Mordred==1.2.0
scikit-learn==0.23.2
pandas==1.1.1
numpy==1.19.2
matplotlib==3.3.4

To successfully test this model locally on my machine, I had to install RDKit using pip install rdkit-pypi and also lower the version of networkx from 2.5 to 2.3 using pip install networkx==2.3

Slug Oral-bioavailability

Tag HOB

Publication https://doi.org/10.1186/s13321-021-00580-6

Source code https://github.com/whymin/HOB. Predictions can also be made directly on their online server at http://www.icdrug.com/ICDrug/A (type the SMILES or upload a .txt file).

License Creative Commons

Human oral bioavailability (HOB) is a key factor in determining the fate of new drugs in clinical trials. Experimental measurements of drug HOB are not only costly but also a lengthy process. Therefore, a predictive model like HobPre that can evaluate the HOB of a candidate compound before synthesis is of great help to drug discovery.

Testing with Python 3.8

  • Setting up the environment: I use conda to manage the virtual environments.
  1. I created one for Python 3.8 using the command conda create -n HobPre38 python=3.8
  2. To activate this environment, use: conda activate HobPre38
  3. Installing the packages: pip install rdkit-pypi Mordred scikit-learn pandas numpy matplotlib
  4. Fixing prediction errors by rolling back the versions of numpy and networkx: pip install "numpy<1.24" networkx==2.3
  5. Fixing the prediction warning by using the recommended version of scikit-learn: pip install scikit-learn==0.23.2
  • Making predictions
  1. Ensure the model is downloaded by cloning the repo: git clone https://github.com/whymin/HOB
  2. Inference: python HOB_predict.py your_model_path your_smiles.txt cutoff
  • python HOB_predict.py model smiles.txt 20
  • Result:
   num                             smiles  prediction HOB Class  probability(-)  probability(+)  inside the applicability domain
0  0.0  Nc1nc(Cl)nc2c1ncn2C1OC(CO)C(O)C1F         1.0      High           22.64           77.36                             True
1  1.0             NCCc1c[nH]c2ccc(O)cc12         1.0      High           32.01           67.99                             True
2  2.0       CC(N)(Cc1ccc(O)c(O)c1)C(=O)O         1.0      High           10.13           89.87                             True
  • python HOB_predict.py model smiles.txt 50
  • Result:
   num                             smiles  prediction HOB Class  probability(-)  probability(+)  inside the applicability domain
0  0.0  Nc1nc(Cl)nc2c1ncn2C1OC(CO)C(O)C1F         1.0      High           36.68           63.32                             True
1  1.0             NCCc1c[nH]c2ccc(O)cc12         0.0       Low           62.79           37.21                             True
2  2.0       CC(N)(Cc1ccc(O)c(O)c1)C(=O)O         0.0       Low           52.06           47.94                             True

Errors and warnings while running predictions:

  1. I got module errors: module 'numpy' has no attribute 'float' and module 'networkx' has no attribute 'biconnected_component_subgraphs'. While I was still able to get predictions, I wanted to fix these errors, and a quick workaround was to roll back the versions of these two packages, i.e. from numpy 1.24.2 to 1.23.5 and from networkx 2.8.8 to 2.3. The other possible workaround would be to re-write the code, as sketched in my earlier comment.
  2. I got UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.2 when using version 1.2.2 (and likewise for RandomForestClassifier and PCA). This might lead to breaking code or invalid results. Use at your own risk. The workaround was to install the recommended version of scikit-learn (0.23.2). The other possible workaround would be to re-train the model.

Recommended steps to set up the environment:

  1. Create an environment for Python 3.8: conda create -n HobPre38 python=3.8
  2. Activate the environment: conda activate HobPre38
  3. Install the packages: pip install rdkit-pypi Mordred pandas matplotlib scikit-learn==0.23.2 "numpy<1.24" networkx==2.3

Note: Python 3.10

I tried with Python 3.10, but it raised many dependency conflicts; fixing one package broke another.

Interpretation of results

The predicted HOB class (1.0 for High, 0.0 for Low) depends on probability(+) and probability(-): if probability(+) is greater than probability(-), the predicted class is High; otherwise, it is Low.
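
As a small illustration of that rule (the helper below is mine, not part of HobPre):

# Map the predicted probabilities to the HOB class as described above
def hob_class(probability_plus, probability_minus):
    """Return 'High' if probability(+) exceeds probability(-), otherwise 'Low'."""
    return "High" if probability_plus > probability_minus else "Low"

# Values taken from the cutoff-50 results above
print(hob_class(63.32, 36.68))  # High
print(hob_class(37.21, 62.79))  # Low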

GemmaTuron commented 1 year ago

Hi @HellenNamulinda !

Good job, do you want to give it a try and incorporate it into the Hub? To do so, please open a model request issue and fill in all the fields; also have a look at the examples on model contribution in our documentation.

HellenNamulinda commented 1 year ago

> Hi @HellenNamulinda!
>
> Good job, do you want to give it a try and incorporate it into the Hub? To do so, please open a model request issue and fill in all the fields; also have a look at the examples on model contribution in our documentation.

Sure, thank you! Let me follow the documentation and incorporate it into the Ersilia Model Hub.

HellenNamulinda commented 1 year ago

> Hi @HellenNamulinda!
>
> Good job, do you want to give it a try and incorporate it into the Hub? To do so, please open a model request issue and fill in all the fields; also have a look at the examples on model contribution in our documentation.

This model is being incorporated in issue #659.