Closed Isaakkamau closed 1 year ago
Issue: Install the Ersilia Model Hub and test the simplest model
OSError: symbolic link privilege not held
Solution:
I solved the error by running conda as administrator.
Issue 2: ersilia fetch retrosynthetic-accessibility
(ersilia) C:\Windows\System32\ersilia>ersilia fetch retrosynthetic-accessibility
⬇️ Fetching model eos2r5a: retrosynthetic-accessibility
Checking setup: 3.731s
12%|████████████████████▏ | 1/8 [00:03<00:26, 3.73s/it]
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨
Error message:
Ersilia exception class:
ModelDeleteError
Detailed error:
Error occured while deleting model eos2r5a
Hints:
Check that the model is actually installed in your local device:
$ ersilia serve eos2r5a
If this error message is not helpful, open an issue at:
- https://github.com/ersilia-os/ersilia
Or feel free to reach out to us at:
- hello[at]ersilia.io
If you haven't, try to run your command in verbose mode (-v in the CLI)
- You will find the console log file in: C:\Users\Isaac\eos/current.log
12%|████████████████████▏ | 1/8 [00:18<02:06, 18.06s/it]
Solution: I have not solved this issue yet. Any suggestions are welcome!
I am using Windows 10 and installing with conda (run as administrator).
Hello, everything works now that I have changed my OS from Windows to Ubuntu, but I would still love to know how to solve the above error.
I have successfully done the prediction using the command line!
Now I would like to run my prediction in my web browser using the Predict API. @GemmaTuron, how can I convert my input from .csv to a .json format that model eos2r5a can understand?
Hi @Isaakkamau,
Welcome to Ersilia :)
We do not support development on Windows, only Linux and macOS, which is why you were unable to run it. If you have a Windows machine, please use the Windows Subsystem for Linux.
As for the online API, we seldom use it, but good that you want to test it. It should probably be something like:
{"smiles": "CCCNOCCC"}
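For anyone converting a .csv input to this shape, here is a minimal Python sketch. It assumes the CSV has a header row with a column named `smiles` (a hypothetical column name) and that a list of objects shaped like the example above is acceptable; neither assumption is confirmed in this thread, so check the API's own input example first.

```python
import csv
import io
import json


def csv_to_smiles_json(csv_text, column="smiles"):
    """Turn CSV text into a JSON list of {"smiles": ...} records.

    The column name and the list-of-objects shape are assumptions based on
    the single-molecule example in this thread, not a documented schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # skip blank rows
            records.append({"smiles": value})
    return json.dumps(records, indent=2)


example_csv = "smiles\nCCCNOCCC\nCCCOCCC\n"
payload = csv_to_smiles_json(example_csv)
print(payload)
```

To use it on a real file, read the file's text and pass it in, adjusting `column` to match your header.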
Hello @GemmaTuron, thanks! The online Predict API has also worked for me. If anybody else wants to try it, you can refer them to me!
Does Ersilia also support Docker Model deployment?
Hi @Isaakkamau
We are actually setting up the infrastructure to move all models to Docker containers. It is still a work in progress; see issue #546, which Miquel and I are working on.
Noted! @GemmaTuron, I am quite familiar with model deployment using Docker and FastAPI. If any help is needed, please let me know.
Hello Ersilia,
My name is Isaak Kamau, from Nairobi, Kenya. I graduated from the University of Nairobi with a degree in Mathematics (Statistics), and I work with tech stacks such as TensorFlow, PyTorch, Docker, and FastAPI. As someone who comes from an underrepresented background in the tech industry, I am impressed by Ersilia's efforts to create a diverse and inclusive workplace. I am excited by the prospect of working in an environment where my unique perspective and experiences will be valued and leveraged to drive innovation and progress.
Moreover, as an Outreachy participant, I chose to join Ersilia because I believe I have the skills required to contribute to its projects. Ersilia's mission of bridging the gap between developing and developed countries in medical research is a noble one, and I would love to participate and help create an even more diverse and inclusive tech industry.
Thank you Ersilia, hoping to give my best.
Hello @GemmaTuron, I now want to move on to the week 2 contributions. Should I assign myself any model from the Ersilia Model Hub, or do you have a specific one that I should try?
Hello @GemmaTuron, I have started my week 2 contribution. I have decided to start with maip-malaria-surrogate, since malaria is still one of the diseases that really affects us here in Africa.
Here is the output I am getting:
(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~$ cd ersilia
(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~/ersilia$ ersilia fetch maip-malaria-surrogate
⬇️ Fetching model eos2gth: maip-malaria-surrogate
Checking setup: 1.010s
Preparing model: 6.107689142227173s
Getting model: 14.707326173782349s
Packing model: 323.38919615745544s
Checking if model needs to be integrated to a tool: 0.0036895275115966797s
Getting model card: 1.2480900287628174s
Checking that autoservice works: 8.294064044952393s
Sniffing model: 31.705535411834717s
100%|█████████████████████████████████████████████████████████████████████████████████████| 8/8 [06:30<00:00, 48.86s/it]
Fetching eos2gth done in time: 0:06:30.863719s
👍 Model eos2gth fetched successfully!
(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~/ersilia$ ersilia serve maip-malaria-surrogate
🚀 Serving model eos2gth: maip-malaria-surrogate
URL: http://127.0.0.1:44521
PID: 609
SRV: conda
👉 Available APIs:
- predict
💁 Information:
- info
(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~/ersilia$ ersilia api -i 'CCCOCCC'
{
"input": {
"key": "POLCUAVZOMRGSN-UHFFFAOYSA-N",
"input": "CCCOCCC",
"text": "CCCOCCC"
},
"output": {
"score": 4.5375906733581415
}
}
(ersilia) isaakmwangi@DESKTOP-O9Q8PKD:~/ersilia$
Now I am testing Ersilia's maip-malaria-surrogate with a CSV file that has two columns with headers. You can get the dataset here:
https://chembl.gitbook.io/malaria-project/input-data-file
Now asking MAIP to write its output as a .csv file:
ersilia api predict -i MAIP_example.csv -o Ersilia_MAIP_Prediction.csv
Here is the .csv output file
Now I want to repeat the prediction using the online API offered here: https://www.ebi.ac.uk/chembl/maip/
@GemmaTuron Thanks for the clarity, Let me select another Model from the proposed model list
For future reference, MAIP-Malaria-Surrogate is not part of the contribution period. Thanks for the update @Isaakkamau
Could you please update the issue with the model you have picked up now and the issues you are facing with it? Remember to post the complete stack trace, as well as your understanding of it! Thanks.
Hello @DhanshreeA Noted Thanks!
Hello @GemmaTuron and @DhanshreeA, after carefully going through the four proposed Ersilia models for the week 2 contributions, I am most interested in the STOUT and SARS-CoV2 activity models.
I would love to do both of them, starting with STOUT.
The reason for choosing STOUT is that, as an aspiring Machine Learning Engineer, I have mostly dealt with computer vision models such as CNNs in my projects. This project gives me a chance to explore other approaches, such as language translation and language understanding using Neural Machine Translation (NMT).
Moreover, I would also love to work on the SARS-CoV2 activity model because I recently did a similar project at Udacity, where I built a machine learning application that lets users train their own models, set their preferred hyperparameters, and make predictions using command-line arguments. I have seen that the SARS-CoV2 activity model also requires some knowledge of command-line applications. Here is the project I did: https://github.com/Isaakkamau/Udacity-Create-Your-Own-Image-Classifier
When installing STOUT, I got this error:
FileExistsError: [Errno 17] JVM DLL not found: Define/path/or/set/JAVA_HOME/variable/properly
The error occurs because the Java Virtual Machine (JVM) dynamic-link library (DLL) cannot be found. This usually happens when the system is unable to locate the JVM, which is required for running Java programs.
To solve the error, first make sure that Java is installed on your system. You can check this by running the java -version command in your terminal or command prompt.
If it is not installed, as in my case, the following commands install it:
sudo apt update
sudo apt install default-jdk
After the installation is complete, you can verify that Java is installed correctly by running the following command:
java -version
The steps above solved my error.
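The same check can be done from Python before importing STOUT. This is a minimal sketch using only the standard library; it only reports whether a `java` executable is on the PATH and whether `JAVA_HOME` is set, and it does not guarantee that the JVM library itself will be found.

```python
import os
import shutil


def find_java():
    """Return (java_path, java_home); either may be None if not set up."""
    java_path = shutil.which("java")         # looks for `java` on PATH
    java_home = os.environ.get("JAVA_HOME")  # may be unset even if java exists
    return java_path, java_home


java_path, java_home = find_java()
if java_path is None:
    print("java not found on PATH; install a JDK (e.g. sudo apt install default-jdk)")
else:
    print("java found at:", java_path)
print("JAVA_HOME =", java_home)
```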
Hi @Isaakkamau
Thanks for sharing your previous work! If the STOUT model is now installed and working in your system, please complete week 2 tasks and move onto week 3! Let's make sure these are tackled before looking into the SARS-CoV models. You can also check adedeji's Git Issue for more info about the Stout model testing!
Hello, @GemmaTuron
I have successfully installed the STOUT model. For the "Run predictions for the EML" task, rather than predicting the entire EML dataset, I have written a simple Python script that lets users predict only the SMILES they are interested in.
Here is my script for making predictions:
from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation
SMILES = ["Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1", "C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5", "CC(=O)Nc1sc(nn1)[S](N)(=O)=O"]

# Translate each SMILES string and print the predicted IUPAC name
for SMILE in SMILES:
    IUPAC_name = translate_forward(SMILE)
    print("IUPAC name of " + SMILE + " is: " + IUPAC_name)
The user can pass any number of SMILES to the SMILES list above for SMILES-to-IUPAC name translation.
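If the list grows, one invalid SMILES could stop the whole run. A possible variation, sketched below, wraps the loop so that failures are recorded and the remaining molecules are still processed. The `translate_all` helper and the `fake_translate` stand-in are hypothetical names introduced here so the sketch runs without STOUT installed; with STOUT you would pass `translate_forward` instead.

```python
def translate_all(smiles_list, translate):
    """Translate each SMILES, recording None for any that fail."""
    results = {}
    for smiles in smiles_list:
        try:
            results[smiles] = translate(smiles)
        except Exception as exc:  # keep going if one molecule fails
            print(f"Could not translate {smiles!r}: {exc}")
            results[smiles] = None
    return results


# Stand-in translator (hypothetical) so the sketch runs without STOUT.
def fake_translate(smiles):
    if not smiles:
        raise ValueError("empty SMILES")
    return f"<name for {smiles}>"


names = translate_all(["CCCOCCC", ""], fake_translate)
for smiles, name in names.items():
    print(repr(smiles), "->", name)
```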
Here is my prediction for the 3 SMILES that I have obtained from the EML:
IUPAC name of Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1 is: [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol
IUPAC name of C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5 is: (3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol
IUPAC name of CC(=O)Nc1sc(nn1)[S](N)(=O)=O is: N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide
I now want to move on to the last part of week 2: "Compare results with the Ersilia Model Hub implementation".
Please let me know if the results are satisfactory.
Hello @GemmaTuron
Here are my results for the week 2 task "Compare results with the Ersilia Model Hub implementation":
I have run the STOUT model from the Ersilia Model Hub on the same three examples I used with the original code.
Here are the predictions for:
1. Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1
(ersilia) hl@hl-laptop:~/ersilia$ ersilia api -i 'Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1'
{
"input": {
"key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
"input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
"text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
},
"output": {
"outcome": [
"[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
]
}
}
(ersilia) hl@hl-laptop:~/ersilia$
2. C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5
(ersilia) hl@hl-laptop:~/ersilia$ ersilia api -i 'C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5'
{
"input": {
"key": "GZOSMCIZMLWJML-VJLLXTKPSA-N",
"input": "C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5",
"text": "C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5"
},
"output": {
"outcome": [
"(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol"
]
}
}
(ersilia) hl@hl-laptop:~/ersilia$
3. CC(=O)Nc1sc(nn1)[S](N)(=O)=O
(ersilia) hl@hl-laptop:~/ersilia$ ersilia api -i 'CC(=O)Nc1sc(nn1)[S](N)(=O)=O'
{
"input": {
"key": "BZKPWHYZMXOIDC-UHFFFAOYSA-N",
"input": "CC(=O)Nc1sc(nn1)[S](N)(=O)=O",
"text": "CC(=O)Nc1sc(nn1)[S](N)(=O)=O"
},
"output": {
"outcome": [
"N-[5-[amino(dioxo)-\u03bb6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide"
]
}
}
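The `\u03bb` in the output above is a JSON escape for the Greek letter λ, not part of the predicted name itself. A minimal sketch of reading the result with Python's standard `json` module, assuming the printed output has been saved as text exactly as shown:

```python
import json

# Raw string so the \u03bb escape reaches the JSON parser unchanged.
raw_output = r'''
{
    "input": {
        "key": "BZKPWHYZMXOIDC-UHFFFAOYSA-N",
        "input": "CC(=O)Nc1sc(nn1)[S](N)(=O)=O",
        "text": "CC(=O)Nc1sc(nn1)[S](N)(=O)=O"
    },
    "output": {
        "outcome": [
            "N-[5-[amino(dioxo)-\u03bb6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide"
        ]
    }
}
'''

result = json.loads(raw_output)
name = result["output"]["outcome"][0]  # first (and only) predicted name
print(name)
```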
SMILES | Original Code Prediction | Ersilia Predictions |
---|---|---|
SMILE 1 | [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol | [(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol |
SMILE 2 | (3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol | (1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol |
SMILE 3 | N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide | N-[5-[amino(dioxo)-λ6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide |
There are some discrepancies between the two SMILES-to-IUPAC predictions for SMILE 1. From my research, the two IUPAC names appear to describe the same compound but differ in the stereochemistry of the cyclopentene ring ((1S,4R) in the original code versus (1R,4R) in Ersilia).
I have also run more predictions on these SMILES using other sources, such as https://app.syntelly.com/smiles2iupac. Across all the predictions I have run, the original code prediction appears to give the most probable answer.
We cannot draw a final conclusion about the models' accuracy from only the 3 examples I have used. The above is just a demonstration of some of the processes we can use.
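One way to scale this comparison beyond three examples is to score each pair of names automatically. The sketch below uses `difflib.SequenceMatcher` from the Python standard library on the three pairs reported above. Note that this is plain string similarity, which I am using as a rough proxy only; it says nothing about whether two names are chemically equivalent.

```python
from difflib import SequenceMatcher

# (original code prediction, Ersilia prediction) for each SMILES in this thread
predictions = {
    "SMILE 1": (
        "[(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol",
        "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol",
    ),
    "SMILE 2": (
        "(3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol",
        "(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol",
    ),
    "SMILE 3": (
        "N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide",
        "N-[5-[amino(dioxo)-\u03bb6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide",
    ),
}


def similarity(a, b):
    """Return a 0..1 string-similarity ratio (not a chemical measure)."""
    return SequenceMatcher(None, a, b).ratio()


scores = {}
for label, (original, ersilia) in predictions.items():
    scores[label] = similarity(original, ersilia)
    print(f"{label}: exact match = {original == ersilia}, similarity = {scores[label]:.2f}")
```

A more meaningful check would convert each name back to a structure and compare structures, but that needs a name-to-structure tool and is outside this sketch.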
@GemmaTuron That's it for week 2, Feel free to let me know if you have any questions or comments about my week 2 contribution.
Best regards Isaak
Hi @Isaakkamau !
Thanks for the work, very well documented! Let's tackle week 3 tasks then!
@GemmaTuron On it! and thank you for the kind comment
MoLFormer
Large-Scale Chemical Language Representations Capture Molecular Structure and Properties
https://github.com/IBM/molformer
The paper above discusses the use of machine learning models to predict molecular properties accurately and quickly in drug discovery and material design. However, the vast chemical space and the limited availability of property labels make supervised learning challenging. To address this, the authors present MoLFormer.
MoLFormer is trained on small molecules represented as SMILES strings. The model architecture uses an efficient linear attention mechanism and relative positional embeddings, with the goal of learning a meaningful and compressed representation of chemical molecules.
Apache
Good suggestion @Isaakkamau !
Can you add it to our model suggestion list? thanks!
Welcome @GemmaTuron. Should I add it to Ersilia's suggestion list using this form, or should I open a new model request issue?
Hello @GemmaTuron I have added it here: https://github.com/ersilia-os/ersilia/issues/658, Please have a look if it's okay that way
Welcome @GemmaTuron. Should I add it to Ersilia's suggestion list using this form, or should I open a new model request issue?
Hi, I think she meant the form, not opening a model request issue.
controlled-peptide-generation
Peptide autoencoder
Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics
https://www.nature.com/articles/s41551-021-00689-x
https://github.com/IBM/controlled-peptide-generation
The model uses deep learning classifiers trained on an informative latent space of molecules, modeled using deep generative autoencoders, to provide an efficient computational method for generating antimicrobials with desired attributes.
The authors of the "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics" paper used deep generative autoencoder models together with deep learning classifiers to create a computational method called Controlled Latent attribute Space Sampling (CLaSS). The CLaSS method was then used to design non-toxic antimicrobial peptides (AMPs), and it successfully generated 20 AMPs. The paper concludes by suggesting that the method can be used to accelerate the discovery of potent and selective broad-spectrum antimicrobials.
The repo above uses shortened versions of the data files required by the data curation code, located in the data_processing/data directory.
Apache
Thank you @Zainab-ik I have submitted it
Hi @Isaakkamau !
Indeed, I meant the list; I've closed the model request issue. Could you please provide a bit more information on model 2?
Thanks!
Hello @GemmaTuron
Sure, I have added a summary of Model 2. Please check it out.
Thanks
Graphormer
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets
https://graphormer.readthedocs.io/en/latest/
https://www.microsoft.com/en-us/research/project/graphormer/
Graphormer is a deep-learning Python package for training custom models for molecule modeling tasks. The Graphormer model architecture has been adapted to 3D molecular dynamics simulation, which allows the model to perform well on both 2D and 3D molecular graph modeling datasets. Researchers and developers can use it to accelerate research on, and applications of, AI for molecular science, such as drug discovery. Graphormer provides example scripts for training your own model on several datasets through a command-line interface. It also provides pre-trained models that researchers can easily evaluate and fine-tune.
https://github.com/microsoft/Graphormer
MIT License
Hello @GemmaTuron I also found these other models:
LiGAN: deep generative models for structure-based drug discovery (a Python package, but it also depends on C++/CUDA).
MegaMolBART: a deep learning model for small molecule drug discovery and cheminformatics based on SMILES.
LIMO: Latent Inceptionism for Targeted Molecule Generation (with desired properties).
If they might be interesting to Ersilia, I can also add them to the suggestion list!
Thanks
Hi @Isaakkamau !
Thanks for these suggestions! Can you please add Graphormer to the Ersilia model suggestion list? I really like LiGAN and LIMO; we cannot add them natively to the Hub because they deal with protein structures, but we will keep them in mind. As for MegaMolBART, we already have other pretrained language models for use in model training; we actually rely on MolBART, which is the basis of MegaMolBART.
Once you have added the models to the list, please focus on writing up your final application.
Hi, @GemmaTuron
Noted!
I have added Graphormer to the suggestion list.
Now starting the final application.
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application