I've set up Ersilia and confirmed it is recognized in the CLI by running `ersilia --help`, which displayed the available command options. I've also successfully fetched and served the `eos3b5e` model. However, I'm facing an error when trying to calculate the molecular weight using the `ersilia -v api calculate -i "CCCC"` command.

The error message I received is:

```
KeyError: 'calculate'
```

I suspect the problem is that the command does not recognize "calculate" as a valid API name in the schema. I've made sure that all the necessary packages are installed, especially Git LFS, since forgetting it is a common mistake.

I would appreciate any help or guidance on resolving this issue.
I expect that running the `ersilia -v api calculate -i "CCCC"` command should calculate the molecular weight of the input molecule ("CCCC") and display the result in the CLI as follows:

```json
{
    "input": {
        "key": "IJDNQMDRQITEOD-UHFFFAOYSA-N",
        "input": "CCCC",
        "text": "CCCC"
    },
    "output": {
        "mw": 58.123999999999995
    }
}
```
Instead, when I run the `ersilia -v api calculate -i "CCCC"` command, it results in `KeyError: 'calculate'`.

I verified the installation with `ersilia --help`, then fetched, served, and called the `eos3b5e` model with the following commands:

```
ersilia -v fetch eos3b5e
ersilia serve eos3b5e
ersilia -v api calculate -i "CCCC"
```

Please let me know if you need any additional information to help resolve this issue.
Hi, check this out, I made a bit of progress here.

To make predictions, Ersilia uses a standard API, `run`. So, instead of `ersilia -v api calculate -i "CCCC"`, use `ersilia -v api run -i "CCCC"`.
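One way to confirm which APIs a served model actually exposes is to inspect its `api_schema.json`, whose path shows up in the verbose logs. A minimal sketch (the path below is illustrative and varies per install; I'm assuming the file's top-level keys are the API names, which is what the `KeyError` suggests):

```python
import json

# Path printed in the verbose ersilia logs; adjust for your machine/model.
schema_path = "/home/eos/dest/eos3b5e/api_schema.json"

with open(schema_path) as f:
    schema = json.load(f)

print(list(schema.keys()))  # e.g. ['run'] -- no 'calculate', hence the KeyError
```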
Hello @HellenNamulinda,

Thank you for your reply. I ran the command `ersilia -v api run -i "CCCC"` and received a new error message:

```
TypeError: object of type 'NoneType' has no len()
```

I've attached the log file for your review: log_output.txt
Hi, I fixed that by editing the code at /home/leila/ersilia/ersilia/io/readers/file.py, in the `read_input_columns` function at line 321. I changed

```python
if len(h) == 1:
```

to

```python
if h is not None and len(h) == 1:
```
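For context, a minimal sketch of why this patch works: calling `len()` on `None` raises exactly the `TypeError` reported above, and the added `is not None` check short-circuits before `len()` is ever called.

```python
h = None  # h can end up as None here, as the traceback shows

try:
    if len(h) == 1:
        pass
except TypeError as error:
    print(error)  # object of type 'NoneType' has no len()

# Guarded version from the fix: the right-hand side is never evaluated
# when h is None, so no exception is raised.
if h is not None and len(h) == 1:
    print("single header column")
```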
Thank you @leilayesufu. It works now 🎉
Hello @samuelmaina @leilayesufu , I went ahead and opened a pull request (PR) to fix the issue mentioned above. This will make it easier for other contributors as they won't need to manually edit the files.
You can check out the PR here: https://github.com/ersilia-os/ersilia/pull/827
@PromiseFru thanks for your efforts; you generally don't need to modify the Ersilia code at this point, and therefore I have closed #827.
I have tried running the molecular weight model in an Ersilia environment with Python 3.7 and Conda version 23.5.2, and I do not run into this issue. Could you try reinstalling Ersilia and running this again?
I joined Outreachy because I heard it is an organization that provides opportunities for underrepresented individuals in the technical industry. Being a part of this group in my society, I found it comforting to discover a community that could provide assistance. My decision to join Ersilia was influenced by the fact that the required skill sets for the project align with my current competencies, such as Python, Conda, and Docker. While I haven't had extensive experience with Conda, I have worked on projects that primarily utilized pip. I saw this as a valuable opportunity to learn Conda and apply this knowledge in practical situations.
Additionally, I have a strong interest in the field of AI/ML and have made several attempts to teach myself the basics. However, I have found it challenging to gain a solid understanding of its practical applications. I believe that collaborating with the Ersilia community will provide me with an excellent opportunity to learn and grow in this subject. I am particularly drawn to Ersilia's mission and goals, as it focuses on providing biomedical AI/ML solutions to scientists worldwide. This resonates with me because my society does not prioritize medical technology solutions. If given the chance to learn and advance in this field, I hope to join the Ersilia community and contribute to medical solutions that can positively impact my society.
During the internship, my intention is to learn, contribute, and collaborate as much as possible within the Ersilia community. I will use this opportunity to learn from the experts in the Ersilia community to build a strong foundation in the AI/ML field so that I can give back to the community effectively. After the internship, I plan to leverage the knowledge, experience, and skills I have gained to create potential solutions for my society, all while continuing to contribute to the Ersilia community. I aspire to build a career in AI/ML, and I firmly believe that growth can only be achieved with the guidance of experts and the support of a community dedicated to personal and collective development. I see the Ersilia community as the ideal place for this growth to occur.
The first week of my contribution period at Ersilia Model Hub was fantastic. It mainly involved getting acquainted with the community, meeting my fellow contributors, understanding Ersilia's mission, learning about community collaboration, and setting up the necessary tools and environment to run Ersilia's codebase.
I joined the ersilia-outreachy-w23 Slack channel using the provided invitation link. In the #general channel, I introduced myself and greeted the wonderful community.

I'm excited about the Community call scheduled for Friday, October 6th, at 5:00 pm CET. I'm ready to assist any contributor who needs help with their Week 1 tasks. I'm also looking forward to Week 2 tasks. Should I start Week 2 tasks now, or should I wait until Week 2 officially begins, @DhanshreeA?
Hi @PromiseFru thanks for the detailed update. I see that you have completed all the tasks for Week 1; please go ahead and get started with Week 2 tasks, you do not need to wait. :)
@DhanshreeA has mentioned on the Slack channel that we can go ahead with Week 2. Good luck with that!
Why the ImageMol model? 🤔

I chose this model because I'm deeply intrigued by its potential to address a critical issue in my community. Drug abuse is unfortunately prevalent, and it's alarming to witness even pharmacies contributing to this problem by providing incorrect dosages or dispensing medications without prescriptions solely for profit. The model's ability to predict molecular targets and evaluate drug properties could play a significant role in ensuring the safe and effective use of medicines. I'm eager to contribute to its implementation and see how it can make a positive impact on the healthcare landscape in my society.
Visit the model's GitHub repository here
🛠️ Install CUDA 10.1

```
sudo apt install nvidia-cuda-toolkit
```

🧪 Test the installation

```
nvcc --version
----
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
```

✅ Success

🛠️ Create the imagemol conda environment

```
conda create -n imagemol python=3.7.3
conda activate imagemol
```

✅ Success

🛠️ Install rdkit

```
conda install -c rdkit rdkit
```

✅ Success
🛠️ Install torch

```
pip install https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl
```

❌ Failed to install torch

📄 Error logs

```
Collecting torch==1.4.0
ERROR: HTTP error 403 while getting https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install requirement torch==1.4.0 from https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl because of HTTP error 403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl for URL https://download.pytorch.org/whl/cu101/torch-1.4.0-cp37-cp37m-linux_x86_64.whl
```

🚨 Possible workaround

```
pip install torch==1.4.0
----
Successfully installed torch-1.4.0
```

✅ Success
🛠️ Install torchvision

```
pip install https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-linux_x86_64.whl
```

❌ Failed to install torchvision

📄 Error logs

```
Collecting torchvision==0.5.0
ERROR: HTTP error 403 while getting https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-linux_x86_64.whl
ERROR: Could not install requirement torchvision==0.5.0 from https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-linux_x86_64.whl because of HTTP error 403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-linux_x86_64.whl for URL https://download.pytorch.org/whl/cu101/torchvision-0.5.0-cp37-cp37m-linux_x86_64.whl
```

🚨 Possible workaround

```
pip install torchvision==0.5.0
----
Successfully installed torchvision-0.5.0
```

✅ Success
🛠️ Install torch-cluster, torch-scatter, torch-sparse, and torch-spline-conv

```
pip install torch-cluster torch-scatter torch-sparse torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.4.0%2Bcu101.html
```

❌ Failed to install torch-cluster, torch-scatter, torch-sparse, and torch-spline-conv

📄 Error logs: output_log.txt

🚨 Possible workaround
🛠️ Clone and navigate to the repository

```
git clone git@github.com:HongxinXiang/ImageMol.git
cd ImageMol
```

You can find the toy dataset in `./datasets/toy/pretraining/`.
🛠️ Pre-train ImageMol using a single GPU on the toy dataset

```
python pretrain.py --ckpt_dir ./ckpts/pretraining-toy/ \
    --checkpoints 1 \
    --Jigsaw_lambda 1 \
    --cluster_lambda 1 \
    --constractive_lambda 1 \
    --matcher_lambda 1 \
    --is_recover_training 1 \
    --batch 16 \
    --dataroot ./datasets/toy/pretraining/ \
    --dataset data \
    --gpu 0 \
    --ngpu 1
```

❌ Failed to pre-train ImageMol

📄 Error logs: output_log2.txt

🚨 Possible workaround
For torch-cluster and the rest of them, try installing them one by one, as in `pip install torch-cluster`, `pip install torch-sparse`, and so on. This link should help with adding CUDA to your PATH and related setup: https://pytorch-geometric.readthedocs.io/en/1.3.2/notes/installation.html
Hello @Richiio

Thank you so much for the assistance. I've tried installing the packages as you suggested, and it turns out torch-spline-conv installed with no issues, but torch-cluster, torch-scatter, and torch-sparse failed with errors similar to those in the log file above. I can provide their individual log files if needed. However, all of the errors repeated the following pattern:

```
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [161 lines of output]
    No CUDA runtime is found, using CUDA_HOME='/usr'
    running bdist_wheel
    running build
    running build_py
    creating build
    ...
```
This issue appears to be a hardware limitation of my computer. I confirmed this by visiting the resource you shared and following its guide to check whether PyTorch is installed with CUDA support. The command returned `False` on my computer:

```
python -c "import torch; print(torch.cuda.is_available())"
>>> False
```

I'm not sure there is a workaround for this one, short of replacing my computer with a CUDA-compatible one. Since I can't do that right now, I'll have to explore other models whose environments can run comfortably on my computer.
Hi, try using conda to install the packages. Try this command:

```
conda install pyg -c pyg
```
You could remove the PyTorch you installed and install the CPU version; just visit the PyTorch official page and select the CPU version. Your current GPU can't run the CUDA version, which is why you are getting an error with PyTorch and the other packages.
Hello @leilayesufu

Thank you for this solution. It helped me make significant progress with my setup. I successfully installed torch-cluster, torch-scatter, and torch-sparse, and they built without any issues. Now, I'm excited to move on to pretraining the model! 🎉
Hello @maureen-mugo
Thank you very much for your help. Running the CPU version of PyTorch didn't work because the ImageMol model appears to require a CUDA-enabled PyTorch. I will soon attach the logs to provide more details.
Hi @PromiseFru Many thanks for the detailed updates! I see that you have resolved these issues and are moving on to pre-training; however, I am leaving some remarks here for other contributors, should they need them.

Installing CUDA but no GPUs detected: Yes, this is very much possible. CUDA is a proprietary API for working with NVIDIA graphics cards. While you can install it in principle (just like any other software) on your system, it will not find the required hardware to work with. However, good news :rocket: Ersilia models are intended to be used with CPUs, with the primary aim of making them usable in low-resource settings. Therefore, it is in fact recommended that you work with this model (or any other model within Ersilia in the future) using CPU-compatible code.

Issues with building torch-derived packages: As for the other torch dependencies, for some reason pip does not do a good job of figuring out which versions of torch-cluster, torch-spline-conv, torch-scatter, etc. are compatible with the installed torch version. As @Richiio rightly mentioned, try installing them one by one. I would also recommend looking through the release notes of these libraries and checking which version is compatible with your torch version. The release notes are always a great place for clearing up dependency conflicts; that helped me last time when I incorporated this model.
Good luck with experimenting with the model!
Why the STOUT model?

Initially, I wanted to work with the ImageMol model, but due to hardware limitations I was unable to set it up and run it. So I began searching for another model I'd like to work on, and the STOUT model stood out. As with my motivation for choosing ImageMol, reading up on STOUT convinced me that it could be applied to creating medical tech solutions for my society and others: it uses a deep-learning neural machine translation approach to generate the IUPAC name for a given molecule from its SMILES string, and vice versa, saving scientists and researchers significant time and effort when working with chemical compounds and reducing the risk of errors in scientific research and medical applications. Additionally, this model is compatible with my computer's specifications. I'm genuinely curious to learn how this model is applied in practice.
Visit the model's GitHub repository here
🛠️ Create the STOUT conda environment

```
conda create --name STOUT python=3.8
conda activate STOUT
```

✅ Success

🛠️ Install STOUT using PyPI

```
pip install STOUT-pypi
```

✅ Success

🧪 Test the model locally using the predictor_demo.py script in the STOUT repository

```
python predictor_demo.py
```

Output files: IUPAC_predictions.txt, SMILES_predictions.txt

✅ Success
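Before running a full dataset, the translator can also be called directly in Python for a quick sanity check (a minimal sketch, assuming the STOUT-pypi install above):

```python
from STOUT import translate_forward

# Forward translation: SMILES -> IUPAC name.
print(translate_forward("CCCC"))  # expected output: "butane"
```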
The STOUT model, a deep-learning neural machine translation model for chemical compounds, can predict the IUPAC (International Union of Pure and Applied Chemistry) name for a given SMILES (Simplified Molecular Input Line Entry System) notation of a chemical compound. The model works in three steps: the input SMILES string is converted to a SELFIES representation, the SELFIES tokens are fed to the trained translation model, and the model decodes the corresponding IUPAC name. I used the following script to run the model over the EML dataset (eml_canonical.csv):
```python
import time
import csv

from STOUT import translate_forward

input_file_name = "eml_canonical.csv"
output_file_name = "eml_canonical_IUPAC_predictions.csv"

start = time.time()


def translate(ln_num: int, total_ln: int, smiles: str, field: str):
    try:
        prediction_start_time = time.time()
        print("===================================================")
        print(f"Line {ln_num}/{total_ln} Processing ({field}) ...")
        IUPAC_name = translate_forward(smiles)
        prediction_time = time.time() - prediction_start_time
        print(f"SMILES name: {smiles}")
        print(f"IUPAC name: {IUPAC_name}")
        print(f"Time: {prediction_time:.4f} sec")
        return IUPAC_name
    except Exception as error:
        print(f"Line {ln_num}/{total_ln} - Error: {str(error)}")
        return f"Error: {str(error)}"


with open(input_file_name, "r", encoding="utf-8") as input_file, open(
    output_file_name, "w", encoding="utf-8", newline=""
) as output_file:
    total_lines = sum(1 for _ in input_file) - 1
    input_file.seek(0)

    csv_reader = csv.reader(input_file)
    csv_writer = csv.writer(output_file)

    next(csv_reader)
    header = ["drugs", "iupac", "can_iupac"]
    csv_writer.writerow(header)

    for line_number, columns in enumerate(csv_reader, start=1):
        columns[1] = translate(line_number, total_lines, columns[1], "smiles")
        columns[2] = translate(line_number, total_lines, columns[2], "can_smiles")
        csv_writer.writerow(columns)

elapsed_time = time.time() - start
print(f"\nTotal time taken for all predictions: {elapsed_time:.4f} seconds")
```
The script reads the `smiles` (SMILES notation) and `can_smiles` (canonical SMILES) columns from the dataset for each drug and passes them as input to the STOUT model. It uses the `translate_forward` method to predict IUPAC names, which are written to the `iupac` and `can_iupac` columns alongside their respective drug values in the output file. The prediction process took a total of 68,387.5641 seconds.
I accessed the STOUT model on GitHub, which I found on the Ersilia Model Hub.
EOS model ID: eos4se9
Slug: smiles2iupac
Fetch the model from the remote repository using the Ersilia identifier eos4se9:

```
ersilia fetch eos4se9
```

✅ Success

Serve the model:

```
ersilia serve eos4se9
```

✅ Success

Check how the smiles2iupac model is running:

```
docker ps --format '{{.ID}}\t{{.Image}}' | grep 'eos4se9'
-----
0c92ffcad288 ersiliaos/eos4se9:latest
```

The smiles2iupac model is running from a Docker container.

✅ Success
Get information about the model:

```
ersilia info
-----
STOUT: SMILES to IUPAC name translator
Small molecules are represented by a variety of machine-readable strings (SMILES, InChi, SMARTS, among others). On the contrary, IUPAC (International Union of Pure and Applied Chemistry) names are devised for human readers. The authors trained a language translator model treating the SMILES and IUPAC as two different languages. 81 million SMILES were downloaded from PubChem and converted to SELFIES for model training. The corresponding IUPAC names for the 81 million SMILES were obtained with ChemAxon molconvert software.

Identifiers
Model identifiers: eos4se9
Slug: smiles2iupac

Code and parameters
GitHub: https://github.com/ersilia-os/eos4se9
AWS S3: https://ersilia-models-zipped.s3.eu-central-1.amazonaws.com/eos4se9.zip

Docker
Docker Hub: https://hub.docker.com/r/ersiliaos/eos4se9
Architectures: AMD64

For more information, please visit https://ersilia.io/model-hub
```

✅ Success
Run the prediction:

```
ersilia -v api run -i eml_canonical.csv -o ersilia_eml_canonical_IUPAC_predictions.csv
```

After the prediction command had been running for over 6 hours, I finally received an exit return on the terminal:

```
| DEBUG | Status code: 504
| ERROR | Status Code: 504
| DEBUG | Status code: 504
| ERROR | Status Code: 504
| DEBUG | Status code: 504
| ERROR | Status Code: 504
| DEBUG | Status code: 504
| ERROR | Status Code: 504
| DEBUG | Schema available in /home/eos/dest/eos4se9/api_schema.json
| DEBUG | Done with unique posting
| DEBUG | Data: outcome
| DEBUG | Values: [None]
| DEBUG | Datatype: string_array
```
When I examined the output file, ersilia_eml_canonical_IUPAC_predictions.csv, it appeared as follows:

```
key,input,iupacs_names
MCGSCOLBFJQGHM-SCZZXKLOSA-N,Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1,
GZOSMCIZMLWJML-VJLLXTKPSA-N,C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5,
BZKPWHYZMXOIDC-UHFFFAOYSA-N,CC(=O)Nc1sc(nn1)[S](N)(=O)=O,
QTBSBXVTEAMEQO-UHFFFAOYSA-N,CC(O)=O,
PWKSKIMOESPYIA-BYPYZUCNSA-N,CC(=O)N[C@@H](CS)C(O)=O,
BSYNRYMUTXBXSQ-UHFFFAOYSA-N,CC(=O)Oc1ccccc1C(O)=O,
(more data ...)
```
The `iupacs_names` column was empty, but my CPUs were still consistently at 90%+ usage, so I assumed the model was still running and would update the file with the IUPAC names.
I attempted to locate any logs from the Docker container using the following command:

```
docker logs --follow eos4se9_d537
-----
+ [ -z eos4se9 ]
+ ersilia serve -p 3000 eos4se9
Serving model eos4se9: smiles2iupac
URL: http://127.0.0.1:3000
PID: 37
SRV: conda
To run model:
- run
Information:
- info
Serving model eos4se9...
+ echo Serving model eos4se9...
+ nginx -g daemon off;
```

However, all I could see was the nginx start command. I will need to wait a bit longer for my CPUs to return to normal before making any conclusions.
---- A few hours later ----
My CPUs returned to normal usage, but the output file hadn't been updated with the iupacs_names. I decided to investigate further by running the command on a modified dataset that contained only two rows of the original EML data. I didn't stream the output to a file.

```
ersilia api run -i eml_canonical_copy.csv
-----
{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
{
    "input": {
        "key": "GZOSMCIZMLWJML-VJLLXTKPSA-N",
        "input": "C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5",
        "text": "C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
```
I also ran a single input:

```
ersilia api run -i "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
-----
{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
```
I tried using the butane SMILES notation, `CCCC`:

```
ersilia api run -i "CCCC"
-----
{
    "input": {
        "key": "IJDNQMDRQITEOD-UHFFFAOYSA-N",
        "input": "CCCC",
        "text": "CCCC"
    },
    "output": {
        "outcome": [
            "butane"
        ]
    }
}
```
The model correctly predicted the IUPAC name. At that moment, I wasn't sure if I was making a mistake, and I would appreciate some help 🙏🏾.
Try serving the model again, and try with other, different inputs.
Hello @leilayesufu,

I appreciate your assistance. I already recreated my virtual environment, reinstalled Ersilia, re-fetched the model, and ran tests with methane, butan-1-ol, and butane, using the SMILES `C`, `OCCCC`, and `CCCC` respectively. The model correctly predicted their IUPAC names. However, when I attempted to run any SMILES from the EML dataset, the model returned null.
Hi @PromiseFru, were you able to run eos4se9 with the EML dataset?
Hello @joiboi08
I've successfully made predictions directly from the Docker container. I tested it with a few individual SMILES, and now I'm running the EML dataset directly from the Docker container. I'll keep you updated on whether it makes predictions this time or not.
Hi @PromiseFru I seem to be facing some issues with running ersilia locally and I am unable to reproduce your issue right now but I will look into it for sure. Meanwhile could you tell me what you mean by the following:
I've successfully made predictions directly from the Docker container. I tested it with a few individual SMILES, and now I'm running the EML dataset directly from the Docker container. I'll keep you updated on whether it makes predictions this time or not.
Thank you, @DhanshreeA, for your response.

Over the past few days, I've faced challenges making predictions for the SMILES in the EML dataset from my locally installed version of Ersilia, as mentioned here. Today, I tried a different approach: since the logs showed a Status code 504 error, indicating that the Ersilia API might not be receiving a response from the Docker container in a timely manner, I decided to run Ersilia directly from the Docker container.

```
docker exec -it eos4se9_c020 bash
```

I then made a prediction using a single SMILES from the EML dataset:

```
ersilia -v api run -i "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
-----
{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
    },
    "output": {
        "outcome": [
            "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
        ]
    }
}
```
The prediction was successful, unlike my previous attempts to run predictions using my locally installed version of Ersilia as described here. As a result, I decided to proceed with the prediction process as outlined here, but this time within the eos4se9_c020 Docker container.

I was able to run predictions directly within the eos4se9_c020 Docker container. It turned out successful, but unfortunately, after more than 5 hours of prediction, it ended with a connection error. I forgot to capture the connection error traceback. Here are the steps I followed:

```
docker exec -it eos4se9_c020 bash
wget https://raw.githubusercontent.com/ersilia-os/ersilia/master/notebooks/eml_canonical.csv
ersilia -v api run -i eml_canonical.csv -o ersilia_eml_canonical_IUPAC_predictions.csv
-----
#Output:
10:02:53 | DEBUG | Reading standard file from /tmp/ersilia-h_zmdo4x/standard_input_file.csv
10:02:53 | DEBUG | File has 443 lines
10:02:53 | DEBUG | No file splitting necessary!
10:02:54 | DEBUG | Reading card from eos4se9
10:02:54 | DEBUG | Reading shape from eos4se9
10:02:54 | DEBUG | Input Shape: Single
10:02:54 | DEBUG | Input type is: compound
10:02:54 | DEBUG | Input shape is: Single
10:02:54 | DEBUG | Importing module: .types.compound
10:02:54 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
10:02:54 | DEBUG | InputShapeSingle shape: Single
10:02:54 | DEBUG | API eos4se9:run initialized at URL http://127.0.0.1:3000
10:02:54 | DEBUG | Schema available in /root/eos/dest/eos4se9/api_schema.json
10:02:54 | DEBUG | Posting to run
10:02:54 | DEBUG | Batch size 100
10:02:54 | DEBUG | Stopping sniffer for finding delimiter
10:02:54 | DEBUG | Expected number: 1
10:02:54 | DEBUG | Entity is list: False
10:02:54 | DEBUG | Stopping sniffer for resolving column types
10:02:54 | DEBUG | Has header True
10:02:54 | DEBUG | Schema {'input': [1], 'key': None}
10:02:54 | DEBUG | Standardizing input single
10:02:54 | DEBUG | Reading standard file from /tmp/ersilia-rmalz1lu/standard_input_file.csv
10:02:54 | DEBUG | Schema available in /root/eos/dest/eos4se9/api_schema.json
11:21:48 | DEBUG | Status code: 200
11:21:48 | DEBUG | Schema available in /root/eos/dest/eos4se9/api_schema.json
12:38:09 | DEBUG | Status code: 200
13:51:55 | DEBUG | Status code: 200
15:04:01 | DEBUG | Status code: 200
15:31:14 | DEBUG | Status code: 200
15:31:14 | DEBUG | Done with unique posting
```
Hello @PromiseFru, thank you for your efforts in running the model.

For models fetched from Docker (the default for the Ersilia CLI), one thing to note is that some models are computationally intensive, requiring up to 16GB of RAM to work effectively. A Status code: 504 error suggests that the Docker container may not be responding to requests in a timely manner. Since the model is pulled from Docker, consider increasing the RAM available to Docker Desktop to 16GB; by default, Docker Desktop uses up to 2 GB of your host's memory. To increase the RAM, go to Settings > Resources > Advanced.

Another option is to fetch the model from GitHub by adding the --from_github flag to the command (`ersilia -v fetch eos4se9 --from_github > eos4se9_fetch_github.log 2>&1`).

Also, to avoid running predictions for long hours, I suggest you take the first 10 (or 50) molecules in the EML dataset and use those for comparison.

Please share the specifications of your machine for further help.
Hello @HellenNamulinda,

Thank you for your assistance 🤗. I'm using the Docker CLI and don't have Docker Desktop installed. On Linux, the Docker installation automatically grants Docker containers full access to the host resources. To further clarify this, I checked how many resources were allocated to the eos4se9_c020 Docker container:

```
docker container stats eos4se9_c020
-----
CONTAINER ID   NAME           CPU %   MEM USAGE / LIMIT     MEM %   NET I/O         BLOCK I/O       PIDS
667d7bf3f1db   eos4se9_c020   0.01%   144.7MiB / 15.59GiB   0.91%   39kB / 10.3kB   168MB / 201kB   10
```

The result indicates that the container has a memory limit of 15.59GiB, which matches my system's RAM capacity, suggesting that it has full access to my system's RAM.

Fetching from GitHub is a valid option, as you suggested. I'll definitely try that when I recharge my internet data bundle later in the day.

I wasn't aware that I could reduce the EML dataset to speed up predictions, as I was trying not to modify the file. However, thank you for the information. Reducing the dataset will indeed decrease prediction time, helping me obtain results faster.

Machine specifications: Memory: 16.0GB; Processor: Intel® Core™ i5-2500S CPU @ 2.70GHz × 4; Graphics: Mesa Intel® HD Graphics 2000 (SNB GT1) / AMD® Turks
After several days of attempting to run predictions with the Ersilia model locally, I finally succeeded in making the predictions using the method I described here. Following a suggestion from @HellenNamulinda, I reduced the EML dataset to 50 entries to decrease prediction time (a small truncation sketch follows below).
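For reference, this is roughly how the dataset can be trimmed without editing it by hand (a sketch; the output file name eml_canonical_50.csv is hypothetical):

```python
import csv

# Copy the header plus the first 50 data rows of the EML file.
with open("eml_canonical.csv", newline="") as src, open(
    "eml_canonical_50.csv", "w", newline=""
) as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))  # header row
    for i, row in enumerate(reader):
        if i >= 50:
            break
        writer.writerow(row)
```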
I created a Python script to compare the outputs of the two models: the Ersilia STOUT model (eos4se9) and the original STOUT model.
```python
import csv


def read_csv_to_dict(file_path, compare_column):
    data = {}
    with open(file_path, newline="", encoding="utf-8") as csvfile:
        reader = csv.DictReader(csvfile)
        for index, row in enumerate(reader, 1):
            data[index] = row[compare_column]
    return data


def compare_csv_files(file1, file2, compare_column1, compare_column2):
    data1 = read_csv_to_dict(file1, compare_column1)
    data2 = read_csv_to_dict(file2, compare_column2)
    drug_data2 = read_csv_to_dict(file2, "drugs")
    col_differences = {}
    for index, value1 in data1.items():
        value2 = data2.get(index)
        drug_name = drug_data2.get(index)
        if value1 != value2:
            col_differences[index] = (value1, value2, drug_name)
    return col_differences


def write_md_diff_file(output_file, differences, compare_column1, compare_column2):
    with open(output_file, "w", encoding="utf-8") as md_file:
        md_file.write(
            "| Index | Ersilia_STOUT_Prediction | STOUT Prediction | Drug Name |\n"
        )
        md_file.write("|-------|------------|------------|-----------|\n")
        for index, (value1, value2, drug_name) in differences.items():
            md_file.write(f"| {index} | {value1} | {value2} | {drug_name} |\n")


if __name__ == "__main__":
    file1 = "ersilia_eml_canonical_IUPAC_predictions.csv"
    file2 = "eml_canonical_IUPAC_predictions.csv"
    output_file = "differences.md"
    compare_column1 = "iupacs_names"
    compare_column2 = "iupac"
    differences = compare_csv_files(file1, file2, compare_column1, compare_column2)
    write_md_diff_file(output_file, differences, compare_column1, compare_column2)
    print("Differences have been written to 'differences.md'")
```
Here are the results:

| Index | Ersilia_STOUT_Prediction | STOUT Prediction | Drug Name |
|-------|--------------------------|------------------|-----------|
1 | [(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol | [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol | abacavir |
2 | (1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol | (3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol | abiraterone |
3 | N-[5-[amino(dioxo)-ฮป6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide | N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide | acetazolamide |
8 | 2-[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]oxy-1,1-dithiophen-2-ylethanol | [(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate | aclidinium |
9 | (E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide | (E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide | afatinib |
12 | 5-acetamido-2,4,6-triiodo-3-(1-oxoethylamino)cyclohexa-4,6-diene-1-carboxylicacid | 3,5-diacetamido-2,4,6-triiodobenzoicacid | amidotrizoate |
13 | (2S)-N-[(1R,2R,3R,5S,6R)-5-amino-2-[(2R,3R,4R,5R,6R)-3-amino-4,5,6-trihydroxyoxan-2-yl]oxy-3-[(2R,3S,4R,5R)-5-amino-1,3,4,6-tetrahydroxyhexan-2-yl]oxy-1-hydroxyoxetan-6-yl]-2-hydroxy-4-(methylamino)butanamide | (2S)-4-amino-N-[(1R,2S,3S,4R,5S)-5-amino-2-[(2S,3R,4S,5S,6R)-4-amino-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-[(2R,3R,4S,5S,6R)-6-(aminomethyl)-3,4,5-trihydroxyoxan-2-yl]oxy-3-hydroxycyclohexyl]-2-hydroxybutanamide | amikacin |
14 | 3,5-diamino-2-chloro-N-(diaminomethylidene)-2H-pyrazine-6-carboxamide | 3,5-diamino-6-chloro-N-(diaminomethylidene)pyrazine-2-carboxamide | amiloride |
15 | 2-butyl-3-[4-[2-(diethylamino)ethoxy]-3,5-diiodocyclohexa-1,4-dien-1-yl]chromen-4-one | (2-butyl-1-benzofuran-3-yl)-[4-[2-(diethylamino)ethoxy]-3,5-diiodophenyl]methanone | amiodarone |
17 | ethyl2-(2-aminoethoxymethyl)-4-[[3-(2-chlorophenyl)-4-methoxy-4-oxobut-2-en-2-yl]amino]cyclopenta-1,3-diene-1-carboxylate | 3-O-ethyl5-O-methyl2-(2-aminoethoxymethyl)-4-(2-chlorophenyl)-6-methyl-1,4-dihydropyridine-3,5-dicarboxylate | amlodipine |
18 | 12-chloro-7-(diethylaminomethyl)-2,9-diazatricyclo[8.4.0.03,8]tetradeca-1(14),4,6,9,10,13-hexaen-6-ol | 4-[(7-chloroquinolin-4-yl)amino]-2-(diethylaminomethyl)phenol | amodiaquine |
19 | (2S,5R,6R)-5-[[(2R)-2-amino-2-(4-hydroxycyclohexa-1,3,5-trien-1-yl)acetyl]amino]-3,3-dimethyl-8-oxo-4-thia-1,7-diazabicyclo[4.3.0]nonane-2-carboxylicacid;tetrahydrate | (2S,5R,6R)-6-[[(2R)-2-amino-2-(4-hydroxyphenyl)acetyl]amino]-3,3-dimethyl-7-oxo-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylicacid;trihydrate | amoxicillin |
20 | (1S,3S,5S,7S,9R,10R,13R,18S,19R,20R,21S,22Z,24Z,26Z,28Z,30Z,32Z,34Z,36Z,38Z,40S,41R)-1-[(2S,3S,4R,5S,6R)-4-amino-3,5-dihydroxy-6-[(2R,3S,4R,5S,6R)-5-amino-3,4-dihydroxyoxan-2-yl]oxan-2-yl]oxy-3,5,7,9,10,13,18,41-octahydroxy-19,20,21-trimethyl-15-oxo-4,16,42-trioxatricyclo[37.2.1.03,5]dotetraconta-22,24,26,28,30,32,34,36,38-nonaene-40-carboxylicacid | (1R,3S,5R,6R,9R,11R,15S,16R,17R,18S,19Z,21Z,23Z,25Z,27Z,29Z,31Z,33R,35S,36R,37S)-33-[(2R,3S,4S,5S,6R)-4-amino-3,5-dihydroxy-6-methyloxan-2-yl]oxy-1,3,5,6,9,11,17,37-octahydroxy-15,16,18-trimethyl-13-oxo-14,39-dioxabicyclo[33.3.1]nonatriaconta-19,21,23,25,27,29,31-heptaene-36-carboxylicacid | amphotericin B |
21 | (2S,5R,6R)-7-[[(2R)-2-amino-2-phenylacetyl]amino]-3,3-dimethyl-8-oxo-4-thia-1,7-diazabicyclo[3.3.0]octane-2-carboxylicacid | (2S,5R,6R)-6-[[(2R)-2-amino-2-phenylacetyl]amino]-3,3-dimethyl-7-oxo-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylicacid | ampicillin |
22 | 5-[3-(2-cyanopropan-2-yl)-6-(1,2,4-triazol-1-ylmethyl)cyclohexa-2,4-dien-1-yl]-2,2-dimethylbutanenitrile | 2-[3-(2-cyanopropan-2-yl)-5-(1,2,4-triazol-1-ylmethyl)phenyl]-2-methylpropanenitrile | anastrozole |
23 | (4S,6R,7S,10S,11S,14S,15S,16S,20S,23R,26S)-16,17,23,26-tetrahydroxy-7-(4-hydroxycyclohexa-1,3,5-trien-1-yl)-11-(4-pentoxycyclohexa-2,4,6-trien-1-ylidene)-2-[[(2S,3S,4S)-3,4-dihydroxy-4-(4-hydroxycyclohexa-1,3,5-trien-1-yl)-2-[[(3S,4S,6R)-4-hydroxy-1-[(2S,3S)-3-hydroxybutan-2-yl]-2,6-dioxopiperazine-3-carbonyl]amino]butanoyl]amino]-14-methyl-2,5,12,17,24-hexazapentacyclo[24.2.2.218,21.04,10.06,14]dotriaconta-1(29),18(30),19,21(31),27,32-hexaene-3,11,13-trione | N-[(3S,6S,9S,11R,15S,18S,20R,21R,24S,25S,26S)-6-[(1S,2S)-1,2-dihydroxy-2-(4-hydroxyphenyl)ethyl]-11,20,21,25-tetrahydroxy-3,15-bis[(1S)-1-hydroxyethyl]-26-methyl-2,5,8,14,17,23-hexaoxo-1,4,7,13,16,22-hexazatricyclo[22.3.0.09,13]heptacosan-18-yl]-4-[4-(4-pentoxyphenyl)phenyl]benzamide | anidulafungin |
24 | 14-[amino(oxo)methyl]-12-(4-methoxycyclohepta-2,4,6-trien-1-ylidene)-5-(2-oxopiperidin-1-yl)-4,11,12-triazatricyclo[7.3.2.14,8]pentadeca-1(13),6,8(15),10-tetraen-15-one | 1-(4-methoxyphenyl)-7-oxo-6-[4-(2-oxopiperidin-1-yl)phenyl]-4,5-dihydropyrazolo[3,4-c]pyridine-3-carboxamide | apixaban |
25 | (5R,6S)-5-(4-fluorocyclohepta-1,3,6-trien-1-yl)-6-[(1R)-1-[5,5,5-trifluoro-4-(trifluoromethyl)penta-1,3-dienyl]ethoxy]-1,2,5,6-tetrahydro-1,4,7-oxadiazocin-3-one | 5-[[(2S,3R)-2-[(1R)-1-[3,5-bis(trifluoromethyl)phenyl]ethoxy]-3-(4-fluorophenyl)morpholin-4-yl]methyl]-1,2-dihydro-1,2,4-triazol-3-one | aprepitant |
26 | arsorosooxy(oxo)arsane | oxoarsanyloxyarsenic | arsenic trioxide |
27 | (1R,4S,5R,8S,9R,10S,12S,13S)-10-methoxy-5,9-dimethyl-11,14,15,16-tetraoxatetracyclo[10.3.1.04,13.08,13]hexadecane | (1R,4S,5R,8S,9R,10S,12R,13R)-10-methoxy-1,5,9-trimethyl-11,14,15,16-tetraoxatetracyclo[10.3.1.04,13.08,13]hexadecane | artemether |
28 | 4-oxo-4-[(1S,4R,5S,8S,9R,10S,15S)-4,9,12-trimethyl-11,16,17,18-tetraoxatetracyclo[10.3.2.05,15.08,15]heptadecan-10-yl]butanoicacid | 4-oxo-4-[[(4S,5R,8S,9R,10R,12R,13R)-1,5,9-trimethyl-11,14,15,16-tetraoxatetracyclo[10.3.1.04,13.08,13]hexadecan-10-yl]oxy]butanoicacid | artesunate |
29 | 5-(1,2-dihydroxyethyl)-4-methylidenefuran-2,3-diol | 2-(1,2-dihydroxyethyl)-4,5-dihydroxyfuran-3-one | ascorbic acid |
30 | methylN-[(2S)-1-[2-[(2S,3S)-2-hydroxy-3-[[(2S)-2-(methoxycarbonylamino)-3,3-dimethylbutanoyl]amino]-4-phenylbutyl]-2-[(4-pyridin-2-ylcyclohexa-2,5-dien-1-yl)methyl]hydrazinyl]-3,3-dimethyl-1-oxobutan-2-yl]carbamate | methylN-[(2S)-1-[2-[(2S,3S)-2-hydroxy-3-[[(2S)-2-(methoxycarbonylamino)-3,3-dimethylbutanoyl]amino]-4-phenylbutyl]-2-[(4-pyridin-2-ylphenyl)methyl]hydrazinyl]-3,3-dimethyl-1-oxobutan-2-yl]carbamate | atazanavir |
31 | (3R,5R)-7-[2-(4-fluorocyclohepta-2,4,6-trien-1-ylidene)-3-phenyl-4-(phenylcarbamoyl)-5-propan-2-yl-3H-pyrrol-1-yl]-3,5-dihydroxyheptanoicacid | (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-propan-2-ylpyrrol-1-yl]-3,5-dihydroxyheptanoicacid | atorvastatin |
32 | 5-[3-[1-[(4,5-dimethoxycyclohexa-1,3,5-trien-1-yl)methyl]-7,8-dimethoxy-2-methyl-1,3,4,6-tetrahydroisoquinolin-2-ium-2-yl]propanoyloxy]pentyl13-[4-[2-[4,5,6-trimethoxy-10-(4,5-dimethoxycyclohexa-2,4-dien-1-ylidene)cyclobut-2-en-1-yl]ethyl]-4-methyl-7-oxo-1-oxa-4-azoniacyclononan-1-yl]propanoate | 5-[3-[1-[(3,4-dimethoxyphenyl)methyl]-6,7-dimethoxy-2-methyl-3,4-dihydro-1H-isoquinolin-2-ium-2-yl]propanoyloxy]pentyl3-[1-[(3,4-dimethoxyphenyl)methyl]-6,7-dimethoxy-2-methyl-3,4-dihydro-1H-isoquinolin-2-ium-2-yl]propanoate | atracurium |
33 | (9-methyl-4-oxa-9-azabicyclo[4.2.1]nonan-5-yl)3-hydroxy-2-phenylpropanoate | (8-methyl-8-azabicyclo[3.2.1]octan-3-yl)3-hydroxy-2-phenylpropanoate | atropine |
34 | [(2S,5R)-2-(carbamoyl)-7-oxo-1,6-diazabicyclo[3.2.1]octan-6-yl]hydrogensulfate | [(2S,5R)-2-carbamoyl-7-oxo-1,6-diazabicyclo[3.2.1]octan-6-yl]hydrogensulfate | avibactam |
35 | 1-methyl-4-nitro-5-(7H-purin-6-ylsulfanyl)-4H-pyrimidine | 6-(3-methyl-5-nitroimidazol-4-yl)sulfanyl-7H-purine | azathioprine |
36 | (2R,3S,5R,6S,7R,9S)-7-[(2R,4R)-5-[[(2R,3R,4R,5R)-4,5-dihydroxy-3-methoxy-5-methyloxan-2-yl]-methylamino]-2-hydroxy-4-methylpentan-2-yl]-9-[(2R,4S,5S,6S)-4-(dimethylamino)-5-hydroxypentan-2-yl]oxy-3-ethyl-6-hydroxy-2,6-dimethyl-4-[(2R,4R,5S,6S)-5-hydroxy-4-methoxy-4-methyloxan-2-yl]oxyoxonan-1-one | (2R,3S,4R,5R,8R,10R,11R,13S,14R)-11-[(2S,3R,4S,6R)-4-(dimethylamino)-3-hydroxy-6-methyloxan-2-yl]oxy-2-ethyl-3,4,10-trihydroxy-13-[(2R,4R,5S,6S)-5-hydroxy-4-methoxy-4,6-dimethyloxan-2-yl]oxy-3,5,6,8,10,12,14-heptamethyl-1-oxa-6-azacyclopentadecan-15-one | azithromycin |
38 | (1S,10S,11S,13S,14S,15S,17S)-18-chloro-14,17-dihydroxy-14-(2-hydroxyacetyl)-13,15,18-trimethyltetracyclo[8.7.1.01,6.011,15]octadeca-2,5-dien-4-one | (8S,9R,10S,11S,13S,14S,16S,17R)-9-chloro-11,17-dihydroxy-17-(2-hydroxyacetyl)-10,13,16-trimethyl-6,7,8,11,12,14,15,16-octahydrocyclopenta[a]phenanthren-3-one | beclometasone |
40 | 11-[bis(2-chloroethyl)amino]-4-methyl-2,4-diazabicyclo[7.3.1]trideca-1(12),2,9-triene-3-carboxylicacid | 4-[5-[bis(2-chloroethyl)amino]-1-methylbenzimidazol-2-yl]butanoicacid | bendamustine |
41 | 2-amino-N'-[(4,6-dihydroxycyclohexa-1,3,5-trien-1-yl)methyl]-3-hydroxypropanehydrazide | 2-amino-3-hydroxy-N'-[(2,3,4-trihydroxyphenyl)methyl]propanehydrazide | benserazide |
42 | (3R,6R,8R)-2,2-dimethyl-5-oxo-6-(2-phenylacetyl)-4-thia-1,7-diazabicyclo[4.3.0]nonane-3-carboxylicacid;N-benzyl-N'-(cyclohexa-2,4,6-trien-1-ylmethyl)ethane-1,2-diamine;(2S,5R,6R)-3,3-dimethyl-7-oxo-2-(2-phenylacetyl)-4-thia-1,8-diazabicyclo[4.3.0]nonane-5-carboxylicacid | N,N'-dibenzylethane-1,2-diamine;(2S,5R,6R)-3,3-dimethyl-7-oxo-6-[(2-phenylacetyl)amino]-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylicacid | benzathine benzylpenicillin |
43 | N-benzyl-2-(2-nitroimidazol-1-yl)ethanamine | N-benzyl-2-(2-nitroimidazol-1-yl)acetamide | benznidazole |
44 | phenylmethylbenzenecarboperoxoate | benzoylbenzenecarboperoxoate | benzoyl peroxide |
45 | benzylbenzenecarboxylate | benzylbenzoate | benzyl benzoate |
47 | (1S,2S,4S,5S,6S,8S,9S,17S)-17-fluoro-5,6-dihydroxy-5-(2-hydroxyacetyl)-4,6,17-trimethyltetracyclo[7.7.1.01,12.02,8]heptadeca-11,14-dien-15-one | (8S,9R,10S,11S,13S,14S,16S,17R)-9-fluoro-11,17-dihydroxy-17-(2-hydroxyacetyl)-10,13,16-trimethyl-6,7,8,11,12,14,15,16-octahydrocyclopenta[a]phenanthren-3-one | betamethasone |
48 | N-(4-cyano-5,5,5-trifluoropenta-2,4-dien-1-yl)-3-(4-fluorocyclohepta-1,3,6-trien-1-yl)sulfonyl-2-hydroxy-2-methylpropanamide | N-[4-cyano-3-(trifluoromethyl)phenyl]-3-(4-fluorophenyl)sulfonyl-2-hydroxy-2-methylpropanamide | bicalutamide |
49 | 1-[3-(2-bicyclo[2.2.1]hept-5-enyl)-3-methoxy-3-phenylpropyl]piperidine | 1-(2-bicyclo[2.2.1]hept-5-enyl)-1-phenyl-3-piperidin-1-ylpropan-1-ol | biperiden |
50 | [4-[[5-(1-oxoethoxy)cyclohepta-1,3,6-trien-1-yl]-pyridin-2-ylmethyl]cyclohexa-1,5-dien-1-yl]acetate | [4-[(4-acetyloxyphenyl)-pyridin-2-ylmethyl]phenyl]acetate | bisacodyl |
Attached files: differences.md, eml_canonical_IUPAC_predictions.csv, ersilia_eml_canonical_IUPAC_predictions.csv
Publication: ACM Digital Library
Published: 2018
Authors: Shahar Harel, Kira Radinsky
Source Code: GitHub - PyTorch, GitHub - TensorFlow
Dataset: Zinc Dataset
This machine learning model helps scientists find new drugs quickly and at a lower cost. It does this by suggesting different molecules based on existing drugs, and it has already recovered some approved drugs, one of which is Isoniazid, which treats tuberculosis. Here's how the model works (see the toy sketch after this list):

1. It starts by turning the molecule (written in SMILES notation) into numbers using an encoder function. This helps the model understand the molecule mathematically.
2. The model then uses math to pick out important parts from these numbers, helping it understand the molecule's structure.
3. The model also adds extra compounds to the molecule while making sure it follows some rules. This is called a "diversity layer."
4. Finally, it uses a Recurrent Neural Network (RNN) to put the molecule together step by step, making sure it follows the rules of chemistry.
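As a toy illustration of the last step only (not the authors' code; the vocabulary, sizes, and greedy sampling are all made up), a character-level RNN decoder that emits a SMILES string one token at a time could look like this:

```python
import torch
import torch.nn as nn

# Tiny illustrative SMILES vocabulary; real models use a much larger token set.
vocab = list("CNO()=#123456789cno")
stoi = {ch: i for i, ch in enumerate(vocab)}

class SmilesDecoder(nn.Module):
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, h=None):
        x = self.embed(tokens)
        y, h = self.rnn(x, h)
        return self.out(y), h

decoder = SmilesDecoder(len(vocab))
tok = torch.tensor([[stoi["C"]]])  # start from a carbon atom
h = None
generated = ["C"]
for _ in range(10):
    logits, h = decoder(tok, h)
    tok = logits[:, -1:].argmax(-1)  # greedily pick the next character
    generated.append(vocab[tok.item()])
print("".join(generated))  # untrained, so the output is essentially random
```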
This model aligns with Ersilia's mission by making drug discovery faster and cheaper, which is especially useful for developing countries. It has also helped discover drugs for infectious diseases like tuberculosis, showing its potential to find drugs for other infectious and neglected diseases.
The TensorFlow implementation needs updates and isn't ready to use. It lacks documentation, the model checkpoints referenced in the evaluate.py script as model-45000 are missing, it has no license, and there are syntax errors in the scripts.

The PyTorch version seems closer to ready for use, but it lacks methods for running the model; it mainly offers ways to download SMILES data and train, as shown in the examples/example_zinc.py script. The documentation is incomplete and missing usage instructions, and its requirements might be outdated. It's licensed under the MIT license.
Publication: National Library of Medicine
Published: 2022
Authors: Mauro César Cafundó Morais, Diogo Silva, Matheus Marques Milagre, Maykon Tavares de Oliveira, Thaís Pereira, João Santana Silva, Luciano da F. Costa, Paola Minoprio, Roberto Marcondes Cesar Junior, Ricardo Gazzinelli, Marta de Lana, Helder I. Nakaya
Source Code: GitHub - Python
Dataset: Image Data Files
This model helps find and treat Chagas disease faster. It also lowers the cost of Chagas disease detection, which is usually done with a high-resolution camera on a microscope; this model detects the disease using images from a mobile phone. Here's how it works (a toy sketch of the last three stages follows the list):

1. Prepare blood smear samples.
2. Take pictures of the blood smear with a mobile phone camera attached to a microscope eyepiece.
3. The images are analyzed using a graph-based method, which helps separate the relevant parts of the image from the background. This step helps isolate the area that might contain the T. cruzi (Trypanosoma cruzi) parasite. This process is called image segmentation.
4. The model extracts various characteristics, or features, from the segmented image. These could include the shape of the parasite, its color, texture, and more. This is called feature extraction.
5. The model then selects the most relevant features from those extracted. This step improves the model's efficiency at detecting the T. cruzi parasite. This process is called feature selection.
6. Finally, the model uses these features to decide whether the T. cruzi parasite is present in the image, by comparing them to patterns it learned during training.
Chagas disease is a dangerous, infectious and neglected disease. This model makes it faster and cheaper to detect, which aligns with Ersilia's mission.
The code is well documented and uses the GNU General Public License v3.0. I was able to set it up and test it as follows.

Clone the repository:

```
git clone https://github.com/csbl-br/chagas_detection.git
```

Create and activate a conda environment:

```
conda create -n ChD python==3.9
conda activate ChD
```

Install the required packages:

```
conda install --file requirements.txt
```

Extract features from all images:

```
cd main
python process_all_images.py
```
This created an output directory and added CSV files containing the extracted features for the images in the ./images directory.

Train a model:

```
python feature_classification.py
```

This produced the following output and generated a graph showing the model's performance after training:

```
Model: SVC
Sensitivity: 0.6745
Specificity: 0.8024
Precision: 0.7774
Accuracy: 0.7377
F1-score: 0.7223
AUC: 0.7967
[[865 213]
 [359 744]]
```
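As a sanity check, the reported metrics can be recomputed from the printed confusion matrix, assuming the usual `[[TN, FP], [FN, TP]]` layout:

```python
tn, fp = 865, 213
fn, tp = 359, 744

sensitivity = tp / (tp + fn)                # 0.6745
specificity = tn / (tn + fp)                # 0.8024
precision = tp / (tp + fp)                  # 0.7774
accuracy = (tp + tn) / (tn + fp + fn + tp)  # 0.7377
f1 = 2 * precision * sensitivity / (precision + sensitivity)  # 0.7223
print(sensitivity, specificity, precision, accuracy, f1)
```

All five values match the script's output, which confirms the layout assumption.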
Publication: Papers with Code
Published: 2020
Authors: Masaki Asada, Makoto Miwa, Yutaka Sasaki
Source Code: GitHub - Python
Dataset: Semantic Scholar, DDI
This model helps scientists find better and cheaper drug combinations faster, making treatments more effective, reducing the risk of drug resistance and side effects, and speeding up patients' recovery. Here's how it works:
The model takes biomedical text data as its input. This text contains information about drugs, their interactions, and related details. It also uses external drug database information from DrugBank, which contains structured information about various drugs, including their descriptions and molecular structures.
The information from the drugs mentioned in the input text (target drugs) and the data from the drug database are used to create an improved drug text input (enriched input). The model then obtains their descriptions using SciBERT, a BERT model trained on large-scale biomedical and computer science text. These descriptions contain useful information about how the drugs are described in other biomedical literature.
The model then obtains the molecular structure of the target drugs by using a molecular graph neural network (GNN) model. This representation captures the structural characteristics of the drugs.
The model combines the enriched input, drug descriptions, and molecular structures; this combination helps the model understand how the drugs interact with each other (a toy sketch of this fusion step follows the list).
The model then determines whether the drug interactions create weaker or stronger treatments for the patients.
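A toy sketch of the fusion step in PyTorch (the dimensions and the five-way DDI label set are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

# Stand-ins for the two modalities being combined.
text_emb = torch.randn(1, 768)  # e.g. a SciBERT sentence embedding
mol_emb = torch.randn(1, 50)    # e.g. a molecular GNN embedding

# Concatenate both views and classify the interaction type.
classifier = nn.Sequential(
    nn.Linear(768 + 50, 128),
    nn.GELU(),
    nn.Linear(128, 5),  # hypothetical DDI classes, e.g. none/advise/effect/int/mechanism
)
logits = classifier(torch.cat([text_emb, mol_emb], dim=-1))
print(logits.shape)  # torch.Size([1, 5])
```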
This model's fast drug combination discovery and cost reduction can make affordable and effective treatments available in underdeveloped countries, in line with Ersilia's mission. It can also help reduce drug resistance and minimize side effects, leading to faster patient recovery.
The code uses the MIT license. The requirements were not completely listed, and the source code may need updating. I was able to set it up and test it as follows.

Clone the repository:

```
git clone https://github.com/tticoin/DESC_MOL-DDIE.git
```

Create and activate a conda environment:

```
conda create -n DESC_MOL-DDIE python=3.7
conda activate DESC_MOL-DDIE
```

Install the required packages:

```
pip install rdkit-pypi
pip install torch
pip install tensorboard
pip install six
pip install tqdm
pip install transformers
```

Preprocess the sample dataset:

```
python fingerprint/preprocessor.py sample/tsv none 1 sample/radius1
```

This will create a radius1 directory in the sample directory and add three files: config.json, corpus_dev.npy, and corpus_train.npy.
Perform DDI extraction:

```
cd main
python run_ddie.py \
    --task_name MRPC \
    --model_type bert \
    --data_dir ../sample/tsv \
    --model_name_or_path SCIBERT_MODEL \
    --per_gpu_train_batch_size 32 \
    --num_train_epochs 3. \
    --dropout_prob .1 \
    --weight_decay .01 \
    --fp16 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --max_seq_length 128 \
    --use_cnn \
    --conv_window_size 5 \
    --pos_emb_dim 10 \
    --activation gelu \
    --desc_conv_window_size 3 \
    --desc_conv_output_size 20 \
    --molecular_vector_size 50 \
    --gnn_layer_hidden 5 \
    --gnn_layer_output 1 \
    --gnn_mode sum \
    --gnn_activation gelu \
    --fingerprint_dir ../sample/radius1 \
    --output_dir output
```
However, there was an error:

```
Traceback (most recent call last):
  File "run_ddie.py", line 55, in <module>
    from transformers import AdamW, WarmupLinearSchedule
ImportError: cannot import name 'WarmupLinearSchedule' from 'transformers'
```

After researching the error, I found a GitHub thread indicating that `WarmupLinearSchedule` should be changed to `get_linear_schedule_with_warmup`.
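A minimal sketch of the replacement API, with a dummy model and optimizer (argument names are from the current transformers documentation):

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(4, 2)  # dummy model standing in for the BERT classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Replaces the removed WarmupLinearSchedule(optimizer, warmup_steps, t_total).
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=100
)
```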
Hello,

Thanks for your work during the Outreachy contribution period; we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application