Hi @pauline-banye, how long does it take to create the environment for the ADME-NCATS model? It seems to take a long time. Sorry for the inconvenience! Thank you
Hi @masroor07, which command are you using when it gets stuck? The models must be downloaded manually, by the way.
It finally finished downloading all the dependencies, but `app.py` didn't work! I received the following error: `ImportError: cannot import name 'escape' from 'jinja2'`. I simply uninstalled and reinstalled Flask to solve the problem. I am now downloading the models for testing. Thank you
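This `ImportError` is most likely the known incompatibility between older Flask releases, which import `escape` from `jinja2`, and Jinja2 3.1, which removed that re-export (it now lives in MarkupSafe). Reinstalling Flask pulls in a release that no longer does this. Below is a minimal sketch for checking the installed Jinja2 version before launching `app.py`; the helper names are hypothetical and not part of the NCATS code.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_major_minor(ver: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '3.1.2'."""
    parts = ver.split(".")
    major = int(parts[0])
    # Keep only the leading digits of the minor part (e.g. '1rc1' -> 1).
    minor_text = parts[1] if len(parts) > 1 else "0"
    digits = ""
    for ch in minor_text:
        if not ch.isdigit():
            break
        digits += ch
    return major, int(digits or "0")


def jinja2_exports_escape(ver: str) -> bool:
    """Jinja2 dropped the top-level `escape` re-export in 3.1."""
    return parse_major_minor(ver) < (3, 1)


def check_jinja2() -> str:
    """Describe whether `from jinja2 import escape` can work in this environment."""
    try:
        installed = version("Jinja2")
    except PackageNotFoundError:
        return "Jinja2 is not installed"
    if jinja2_exports_escape(installed):
        return f"Jinja2 {installed}: 'from jinja2 import escape' still works"
    return (f"Jinja2 {installed}: 'escape' was removed; "
            "upgrade Flask or import escape from markupsafe")


if __name__ == "__main__":
    print(check_jinja2())
```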
I was able to run the PAMPA Permeability (pH 5.0) model. I followed the simple instructions to install the model on my Ubuntu subsystem.
MODEL INTERPRETATION: The model provides predicted class (1 or 0) for a given compound. If the predicted class is '1', it means the compound is predicted to have 'low permeability' (i.e., log Peff < 1.0) and if the predicted class is '0', the compound is predicted to have 'moderate to high permeability' (i.e., log Peff > 1.0). The models also provide a probability score (between 0 and 1), shown in parentheses next to the predicted class.
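The interpretation above maps onto a small parser for a single prediction cell. This is a sketch that assumes the cell text looks like "1 (0.87)", i.e. the class followed by the probability in parentheses; the exact column layout of the NCATS CSV is not shown in this thread, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed cell format: "<class> (<probability>)", e.g. "1 (0.87)".
_PRED_RE = re.compile(r"^\s*([01])\s*\(\s*([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)\s*\)\s*$")

LABELS = {
    1: "low permeability (log Peff < 1.0)",
    0: "moderate to high permeability (log Peff > 1.0)",
}


def parse_prediction(cell: str) -> Tuple[int, float]:
    """Split a prediction cell into (predicted class, probability)."""
    match = _PRED_RE.match(cell)
    if match is None:
        raise ValueError(f"unexpected prediction format: {cell!r}")
    predicted_class = int(match.group(1))
    probability = float(match.group(2))
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    return predicted_class, probability


def interpret(cell: str) -> str:
    """Human-readable label for one prediction cell."""
    predicted_class, probability = parse_prediction(cell)
    return f"{LABELS[predicted_class]}, probability {probability:.2f}"
```

For example, `interpret("1 (0.87)")` gives "low permeability (log Peff < 1.0), probability 0.87".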
Input to the model: I used 10 SMILES to test the model (10_SMILES.csv).
Output: ADME_Predictions_2023-03-15-070819.csv
I will now compare the results of the ADME-NCATS model with the model implemented in Ersilia (eos81ew).
Perfect, many thanks @masroor07! We need to check that we get the same values from the original model and from the one we implemented at Ersilia. Please note that in the Ersilia Model Hub we applied a transformation to the results so that they always give the probability of class 1; let's see if everything is working correctly :)
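The transformation mentioned above can be sketched as follows, under the assumption (not confirmed in this thread) that the probability NCATS prints next to the class is the probability of the predicted class. The function name is hypothetical and this is not the Ersilia Model Hub code.

```python
def probability_of_class_one(predicted_class: int, probability: float) -> float:
    """Convert (predicted class, probability of that class) into P(class == 1).

    Assumes the reported probability belongs to the predicted class, as with
    a binary classifier that reports its most likely class.
    """
    if predicted_class not in (0, 1):
        raise ValueError("predicted_class must be 0 or 1")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    # Class 1 keeps its probability; class 0 is flipped to the complement.
    return probability if predicted_class == 1 else 1.0 - probability
```

Under this assumption, a compound reported as class 0 with probability 0.99 would appear in the hub with a value of about 0.01.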
Alright, thank you for the update. I will compare the results in a while.
Great job on debugging @masroor07 😊. Glad you were able to get the NCATS ADME model working 👍
Thanks @pauline-banye!
I have been trying to fetch PAMPA5 using Ersilia, but I get the following error:
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '../../checkpoints'
I tried giving elevated privileges to the user but that doesn't seem to solve the problem.
@masroor07, I'm getting the same error while fetching the Ersilia NCATS solubility model. If you find out how to resolve it, let me know too. Thanks
@ZakiaYahya @masroor07 Someone had this same issue and it was resolved by granting the user privileges.
The permission error was due to the model trying to create a directory while the user had insufficient permissions, so I had to grant the user privileges.
https://github.com/ersilia-os/ersilia/issues/615#issuecomment-1470361591
Still facing the same issue. Tried giving the user privileges but that doesn't seem to solve my problem.
Hi @masroor07 and @ZakiaYahya
Are your users admins of the computers you are using? You need to grant yourselves superuser privileges (this differs between Windows, Linux and macOS). In Linux you can check this, for example. As a workaround, @masroor07, can you use the Colab implementation to run the model through Ersilia? @pauline-banye, can you share more information on where the mkdir command is used? Perhaps it isn't necessary.
Hi @GemmaTuron, the creation of the directory is not necessary anymore.
It was necessary when the models were downloaded directly. It checks if the directory exists and creates it if it doesn't.
The current setup with the models already downloaded and within the repository makes it unnecessary.
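The failing path, `../../checkpoints`, is relative, so it is resolved against whatever directory the code is launched from; that is why the `mkdir` can land somewhere the user cannot write. The cleanest fix, as discussed here, is to drop the call. If a directory check were still needed, a more defensive version could look like this minimal sketch, which assumes the checkpoints belong next to the calling script. `ensure_checkpoint_dir` is a hypothetical helper, not NCATS code.

```python
from pathlib import Path
from typing import Optional


def ensure_checkpoint_dir(base_dir: Path, relative: str = "checkpoints") -> Optional[Path]:
    """Return the checkpoint directory, creating it only when it is missing.

    The path is resolved against `base_dir` (e.g. the directory of the calling
    script) rather than the current working directory. Returns None instead of
    raising when the directory cannot be created for lack of permissions.
    """
    target = (base_dir / relative).resolve()
    if target.is_dir():
        return target  # models already bundled: nothing to create
    try:
        target.mkdir(parents=True, exist_ok=True)
    except PermissionError:
        return None
    return target


if __name__ == "__main__":
    here = Path(__file__).resolve().parent
    path = ensure_checkpoint_dir(here)
    print(path if path is not None else "no write permission; skipping checkpoint directory")
```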
@pauline-banye, then this should be removed from the code. Legacy code, especially code that requires special user permissions, can cause problems later on, as we are seeing.
Hello @GemmaTuron and @pauline-banye, I have successfully fetched the model by granting user privileges. Thanks.
That's fantastic @ZakiaYahya 👍
Alright, I will try running the model using Colab! I probably messed up granting the privileges; I will try to work my way around it.
Thank you
Was finally able to fetch the model.
Fetching eos81ew done in time: 0:03:38.524597s
18:56:29 | INFO | Fetching eos81ew done successfully: 0:03:38.524597
I tried running it on a sample SMILES string:
ersilia api run -i "CCCC"
{
"input": {
"key": "IJDNQMDRQITEOD-UHFFFAOYSA-N",
"input": "CCCC",
"text": "CCCC"
},
"output": {
"outcome": [
1.970733478628972e-07
]
}
}
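The JSON printed by `ersilia api run` is easy to post-process. This sketch pastes the output above verbatim and applies the 0.5 cutoff that is later used in this thread to compare eos81ew with the original model (values of 0.5 or more read as low permeability, class 1). `extract_outcome` is a hypothetical helper.

```python
import json

# Output printed by `ersilia api run -i "CCCC"` in the run above.
raw_output = """
{
  "input": {"key": "IJDNQMDRQITEOD-UHFFFAOYSA-N", "input": "CCCC", "text": "CCCC"},
  "output": {"outcome": [1.970733478628972e-07]}
}
"""


def extract_outcome(text: str) -> float:
    """Return the single probability stored under output -> outcome."""
    record = json.loads(text)
    outcome = record["output"]["outcome"]
    if len(outcome) != 1:
        raise ValueError(f"expected one value, got {outcome!r}")
    return float(outcome[0])


probability = extract_outcome(raw_output)
# Assumed cutoff: >= 0.5 means class 1 (low permeability).
label = ("low permeability (class 1)" if probability >= 0.5
         else "moderate to high permeability (class 0)")
print(f"CCCC: {probability:.3g} -> {label}")
```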
Hello @masroor07! Great explanations, thanks for your dedication. As an extra task for this week, might I ask you to try to run the ADME-NCATS models? @pauline-banye and I have had some issues with some of them, and I'd like to know whether these persist across users. Could you:
- Follow the installation detailed in the adme-ncats repo (please, use the development branch, not main)
- Download the pretrained model PAMPA5
- Test 5 molecules on the ADME-NCATS model as well as on the model implemented in Ersilia (eos81ew) and compare the results
Comparison: ADME-NCATS (PAMPA5.0) and NCATS-PAMPA5 (eos81ew)

Summary of the overall process.

adme-ncats (PAMPA5.0): The model provides a predicted class (1 or 0) for a given compound. If the predicted class is '1', the compound is predicted to have 'low permeability' (i.e., log Peff < 1.0); if the predicted class is '0', the compound is predicted to have 'moderate to high permeability' (i.e., log Peff > 1.0). The models also provide a probability score (between 0 and 1), shown in parentheses next to the predicted class.

Input to the model: 10_SMILES.csv
Output: ADME_Predictions_2023-03-15-070819.csv

ncats-pampa5 (eos81ew): PAMPA is an in vitro surrogate used to determine the permeability of drugs across cellular membranes. Peff values were log-transformed; compounds with a log Peff lower than 2.0 were considered to have low to moderate permeability, and those with a value higher than 2.5 were considered high-permeability compounds. Compounds with a value between 2.0 and 2.5 were omitted from the dataset.

Challenges during installation: The model requires the user to have elevated privileges, which should not be the case. I solved the problem by elevating the user's privileges, but in the process I ended up corrupting a couple of my system files. I was able to resolve the issue by going through a discussion on Stack Overflow. The error that was thrown: mkdir(name, mode) PermissionError: [Errno 13] Permission denied: '../../checkpoints'

Input to the model: 10_SMILES.csv
Output: processed.csv
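The dataset labeling rule described for eos81ew (two thresholds with an excluded band in between) can be written as a small function. This is a sketch; treating exactly 2.0 and 2.5 as part of the omitted band is an assumption, since the description only says "between 2.0 and 2.5", and the measurements below are hypothetical values for illustration.

```python
from typing import Dict, List, Optional


def permeability_label(log_peff: float) -> Optional[str]:
    """Label a compound with the thresholds described for the eos81ew dataset.

    Returns None for the 2.0-2.5 band, which was omitted from the dataset
    (boundaries assumed to be part of the omitted band).
    """
    if log_peff < 2.0:
        return "low to moderate"
    if log_peff > 2.5:
        return "high"
    return None


# Hypothetical log Peff measurements, for illustration only.
measurements: List[float] = [1.2, 2.2, 3.1]
labelled: Dict[float, Optional[str]] = {x: permeability_label(x) for x in measurements}
# Drop compounds in the ambiguous band, as the dataset description says.
training_set = {x: lab for x, lab in labelled.items() if lab is not None}
print(training_set)
```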
Hi @masroor07
Great that you were able to run both models! The ADME NCATS model and the implementation at Ersilia are the same model, so for eos81ew you are simply describing the dataset used to train the model, not the actual model output? Can you now compare the predictions you got for the same molecules on both models and see if they make sense? Since they are the same model, they should coincide.
Hi @GemmaTuron I can make the edits to the code right now. I'm creating a temporary fork and making a PR. I'm doing this for all the NCATS models.
Alright, I will run predictions for the same molecules on both models.
Hi @GemmaTuron,
- I ran predictions for various molecules on both models, and yes, the outputs coincide. In eos81ew, a probability below 0.5 is considered highly permeable, whereas a probability of 0.5 or greater is considered low permeability. For example, for the SMILES input Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1, the output is 0.903985857963562, which indicates that the molecule is not highly permeable.
- But for the SMILES input CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1, the output is 0.0090377377346158, which indicates that the molecule is moderately to highly permeable.
- When I run predictions for the same molecules on the ADME NCATS model, Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1 gives low permeability, with Predicted Class (Probability) equal to 1, and CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 gives high permeability, with Predicted Class (Probability) equal to 0.
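The comparison above can be expressed as a small consistency check. It hard-codes the two values reported in this thread and applies the 0.5 cutoff described for eos81ew; the helper names are hypothetical, and a larger comparison would read both output files instead.

```python
from typing import Dict, List

# Values reported above for the two test molecules.
ersilia_probabilities: Dict[str, float] = {
    "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1": 0.903985857963562,
    "CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1": 0.0090377377346158,
}
ncats_classes: Dict[str, int] = {
    "Nc1nc(NC2CC2)c2ncn([C@H]3C=C[C@@H](CO)C3)c2n1": 1,
    "CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1": 0,
}


def to_class(probability: float, threshold: float = 0.5) -> int:
    """Class 1 (low permeability) when P(class 1) >= threshold."""
    return int(probability >= threshold)


def disagreements(probs: Dict[str, float], classes: Dict[str, int]) -> List[str]:
    """SMILES whose thresholded Ersilia class differs from the NCATS class."""
    return [smi for smi, p in probs.items() if to_class(p) != classes[smi]]


mismatches = disagreements(ersilia_probabilities, ncats_classes)
print("all predictions agree" if not mismatches else f"mismatches: {mismatches}")
```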
This is a great explanation thanks @masroor07 !
Thank you for the review!
OptiMol: Optimization of binding affinities in chemical space for drug discovery
An optimization pipeline that leverages complementary structure-based and ligand-based methods. It introduces a new graph-to-SELFIES VAE. The model iteratively selects promising compounds in chemical space using a ligand-centered generative model and then performs molecular docking to guide compound optimization.
optimol
https://pubs.acs.org/doi/10.1021/acs.jcim.0c00833
https://github.com/jacquesboitreaud/OptiMol
The OptiMol model is implemented in PyTorch and DGL. The model was trained for 50 epochs using the Adam optimizer. It proposes an improvement over current state-of-the-art methods, which do not leverage the structure of a target. A new VAE is introduced that is more computationally efficient while retaining state-of-the-art results. Instead of performing docking on a fixed drug bank, promising compounds across the whole chemical space are selected using a ligand-centered generative model, and molecular docking is then used as an oracle to guide compound optimization, allowing iterative generation of leads that better fit the target structure. This oracle is costly, so Bayesian optimization with a recently published method, Conditioning by Adaptive Sampling (CbAS), was used to optimize the whole approach.
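The iterative loop described above (propose candidates, score them with a costly oracle, refocus the generator on the best ones) can be illustrated with a toy cross-entropy-method search. This is a deliberate simplification, not OptiMol's code: CbAS additionally reweights samples against a prior generative model, and here a one-dimensional Gaussian stands in for the VAE latent space while a hypothetical quadratic `toy_oracle` stands in for docking.

```python
import random
import statistics
from typing import Callable, Tuple


def toy_oracle(x: float) -> float:
    """Stand-in for a docking score: higher is better, best at x = 3."""
    return -(x - 3.0) ** 2


def cross_entropy_search(oracle: Callable[[float], float], mean: float = 0.0,
                         std: float = 5.0, iterations: int = 30,
                         population: int = 200, elite: int = 20,
                         seed: int = 0) -> Tuple[float, float]:
    """Iteratively refit a Gaussian search distribution to the best candidates."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # 1. Sample candidates from the current search distribution.
        candidates = [rng.gauss(mean, std) for _ in range(population)]
        # 2. Query the (expensive) oracle and keep the best-scoring candidates.
        best = sorted(candidates, key=oracle, reverse=True)[:elite]
        # 3. Refit the distribution to the elite set; floor the spread so it never hits 0.
        mean = statistics.fmean(best)
        std = max(statistics.pstdev(best), 1e-3)
    return mean, std


if __name__ == "__main__":
    print(cross_entropy_search(toy_oracle))
```

The appeal of this family of methods for docking-guided design is that the oracle is only ever queried on samples, so it can be any black-box scoring program.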
To generate compounds with high binding affinities, we could use one of the three binding affinity estimates:
None
Hi @masroor07 !
Thanks for the detailed explanation, this model looks very relevant to some projects we are working on at Ersilia! Let me give you a few extra pointers:
- It is best to always cite the peer-reviewed publication instead of the preprint, if available. In this case, it is published in the Journal of Chemical Information and Modeling
- The code link should point to the GitHub repository, but it still points to bioRxiv
- Some extra information from the code, for example which license it uses, would be good to have
Thank you for the positive review and the extra pointers.
Hi @masroor07 !
While you look for further models, can I ask you to include this model suggestion in our list?
Thanks!
MolGAN: An implicit generative model for small molecular graphs
An implicit, likelihood-free generative model that provides a way around expensive graph-matching procedures. It adapts generative adversarial networks (GANs), combined with reinforcement learning (RL), to generate molecules with desired chemical properties.
molgan
https://arxiv.org/abs/1805.11973
https://github.com/nicola-decao/MolGAN
MolGAN is an implicit generative model for small molecular graphs that can be trained jointly with a GAN and an RL objective to generate molecular graphs with higher validity and novelty. Compared with a SMILES-based sequential GAN model for molecular generation, it achieves better chemical property scores and removes additional overhead (SMILES strings are themselves generated from a graph-based representation of molecules). The model consists of three main components: a generator, a discriminator and a reward network.
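MolGAN works on graphs rather than SMILES: in the paper, a molecule with up to N atoms is represented by an adjacency tensor A of shape N×N×Y (a one-hot bond type for each atom pair) and an annotation matrix X of shape N×T (a one-hot atom type for each node). The sketch below builds both for ethanol's three heavy atoms in plain Python. The atom and bond vocabularies, the padding "none" entries and the helper names are assumptions for illustration; a real pipeline would parse SMILES with RDKit instead of hand-writing bonds. The paper's experiments use QM9, whose molecules have at most 9 heavy atoms, hence `max_atoms=9`.

```python
from typing import List, Tuple

ATOM_TYPES = ["none", "C", "N", "O", "F"]   # assumed vocabulary; "none" pads empty slots
BOND_TYPES = ["none", "single", "double", "triple", "aromatic"]


def one_hot(index: int, size: int) -> List[int]:
    vec = [0] * size
    vec[index] = 1
    return vec


def graph_tensors(atoms: List[str], bonds: List[Tuple[int, int, str]], max_atoms: int):
    """Build X (max_atoms x T) and A (max_atoms x max_atoms x Y) as nested lists."""
    if len(atoms) > max_atoms:
        raise ValueError("molecule has more atoms than max_atoms")
    padded = atoms + ["none"] * (max_atoms - len(atoms))
    x = [one_hot(ATOM_TYPES.index(a), len(ATOM_TYPES)) for a in padded]
    # Every pair starts as "no bond"; listed bonds overwrite both (i, j) and (j, i).
    a = [[one_hot(0, len(BOND_TYPES)) for _ in range(max_atoms)] for _ in range(max_atoms)]
    for i, j, bond in bonds:
        edge = one_hot(BOND_TYPES.index(bond), len(BOND_TYPES))
        a[i][j] = edge
        a[j][i] = list(edge)
    return x, a


# Ethanol heavy atoms (C-C-O), padded to 9 slots.
X, A = graph_tensors(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")], max_atoms=9)
```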
MIT
Hi @masroor07 !
Thanks, good model and very detailed description, I appreciate it! My only concern is that it is a bit old (it runs on Python 3.6); let's hope it can easily be bumped to Python 3.10! Can you add it to the model list while you look for a third model suggestion?
@GemmaTuron, thank you for the positive feedback! I did notice the Python version, but I guess there must be a way to make the required changes/updates to the code. Yes, sure! I will add it to the model list.
@GemmaTuron, to add the model to the list, I have to fill in the form, right?
yes please: https://airtable.com/shroQLlkcmDcC0xzm
I filled in the form! Thank you
EpitopeVec: Linear Epitope Prediction Using Deep Protein Sequence Embeddings
EpitopeVec is a model that predicts whether peptides are linear B-cell epitopes (BCEs). It uses a combination of residue properties, modified antigenicity scales, and protein language model-based representations (protein vectors) as features of peptides for linear BCE prediction.
epitopevec
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8652027/
https://github.com/hzi-bifo/epitope-prediction
The region of an antigen recognized by antibodies is known as an epitope; if it is a continuous stretch of amino acids, it is a linear epitope. The identification of B-cell epitopes (BCEs) is important in many applications. EpitopeVec predicts linear B-cell epitopes. It combines commonly used propensity scales, residue features and modified antigenicity scales for the vector representation of peptides. It is based on an SVM model trained on a large set of experimentally verified epitopes and makes use of different amino acid features.
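To make the feature-vector idea concrete, here is a minimal sketch that turns a peptide into a fixed-length vector: amino-acid composition plus the mean value of one propensity scale. The Kyte-Doolittle hydropathy scale is an assumed stand-in chosen for illustration; EpitopeVec's actual feature set (its modified antigenicity scales and protein vectors) is richer, and the resulting vectors would then be fed to an SVM classifier, which is not shown.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here only as an example propensity scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(peptide: str) -> List[float]:
    """Amino-acid composition (20 values) followed by mean hydropathy (1 value)."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(AMINO_ACIDS)
    if not peptide or unknown:
        raise ValueError(f"invalid peptide: {peptide!r}")
    composition = [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)
    return composition + [mean_hydropathy]
```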
Datasets: EpitopeVec is trained on various small and large datasets derived from Bcipep and the Immune Epitope Database (IEDB).
GPLv3
Hi @GemmaTuron, I was trying to reinstall NCATS-ADME on my system. I am facing a pickling issue, i.e. `_pickle.UnpicklingError: invalid load key, '<'.`
I was able to understand why a number of people face the pickling issue. The reason is simple: they had cloned the master branch of NCATS-ADME, which has outdated links and code that is not up to date. I cloned the "development" branch and that solved my issue. I just had to reinstall Flask to solve the jinja2 issue.
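The `invalid load key, '<'` message means the first byte handed to the unpickler was `<`, which is what happens when an outdated download link returns an HTML page instead of the model file. A minimal sketch for catching this before unpickling; `looks_like_html` and `load_model` are hypothetical helpers, not NCATS functions.

```python
import pickle
from pathlib import Path


def looks_like_html(path: Path, probe: int = 512) -> bool:
    """True when the file starts (after whitespace) with '<', as HTML pages do."""
    with path.open("rb") as handle:
        head = handle.read(probe).lstrip()
    return head.startswith(b"<")


def load_model(path: Path):
    """Unpickle a model file, failing early with a clear message on HTML downloads.

    Only unpickle files from sources you trust: pickle can execute code.
    """
    if looks_like_html(path):
        raise RuntimeError(f"{path} is an HTML page, not a pickled model; "
                           "re-download it from an up-to-date link")
    with path.open("rb") as handle:
        return pickle.load(handle)
```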
I tried running the NCATS-HLM model. The model doesn't accept the input: I tried passing it a CSV file and also a text file, but it processes neither input format. The error message: There was an error processing your file. Please make sure you have selected a file that contains SMILES, indicate if the file contains a header and the column number containing the SMILES. Input CSV file: 10_SMILES.csv. I tried passing the same input file to PAMPA5 and PAMPA7.4 as well, and both were able to process it.
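The error message asks for three things: a file containing SMILES, whether it has a header, and the number of the column holding the SMILES. This sketch pre-checks a CSV locally in the same terms before uploading. Counting columns from 1 is an assumption about what the form expects, `read_smiles` is a hypothetical helper, and it does not validate the chemistry itself (a toolkit such as RDKit would be needed for that).

```python
import csv
import io
from typing import List


def read_smiles(text: str, has_header: bool, column: int) -> List[str]:
    """Return the non-empty values of a 1-based column from CSV text.

    Raises ValueError when a data row is too short for the requested column.
    """
    if column < 1:
        raise ValueError("column numbers start at 1")
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        rows = rows[1:]
    smiles = []
    for line_no, row in enumerate(rows, start=2 if has_header else 1):
        if not row:
            continue  # skip blank lines
        if len(row) < column:
            raise ValueError(f"line {line_no} has no column {column}")
        value = row[column - 1].strip()
        if value:
            smiles.append(value)
    return smiles


sample = "name,smiles\nbutane,CCCC\nethanol,CCO\n"
print(read_smiles(sample, has_header=True, column=2))
```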
Update:
I tried running the deployed HLM model, which does process the input file. Here is the output file of the predictions run using it: ADME_Predictions_2023-03-26-183023.csv
UPDATE:
The results column does not show PAMPA50 as the model; rather, it shows PAMPA. We can also observe a more precise probability score than PAMPA5 in the later version. I tried running the model both on my local system and using the deployed version here. Note: the authors should change the model output to PAMPA 7.4; that would make it clearer which model we are using (right?).
I ran tests for the first 10 SMILES from the EML. INPUT: 10_SMILES.csv
OUTPUT:
Local test: ADME_Predictions_2023-03-26-212850.csv
Deployed model: ADME_Predictions_2023-03-26-211505 (WEB).csv
Hi @masroor07 !
That's great, so if I understand it correctly:
- HLM model works
- PAMPA5 works, with the results column stating PAMPA (only)
- PAMPA7.4 works, with the results column stating? (sorry missed that one!)
If you can confirm that, I think we could reopen this issue and unarchive the model repo so that we can push the HLM model to the Hub as well. What do you think?
The HLM model that has been deployed to opendata works, yes.
I've reopened the issues to tackle them:
Submitted my final application. Hello @GemmaTuron, thank you for all the help during the contribution period! I got to learn a ton of things. Thanks to the other contributors as well for all their help during the contribution phase. I look forward to continuing to contribute to Ersilia!
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application