Closed kellenkinya closed 8 months ago
Task 1: Joined the community and introduced myself to my peers and the mentors.
Task 2: Opened a GitHub issue.
Task 3: My operating system is Windows 10, so I first had to install WSL from PowerShell running as an administrator, then download and install Ubuntu on Windows. The next step was to install all the third-party prerequisites needed for the Ersilia model to run, found in this link.
Task 3: I was able to download and install the models locally through the fetch command. Once this step is done, the model is ready for use. Then I served the model and made both individual and multiple predictions.
What I learnt from the predictions: the higher the score, the more synthetically accessible the molecule is predicted to be. This article on PubMed Central gave me some more information on the SA scores.
Unable to make predictions today. I was running into the `object of type 'NoneType' has no len()` error, as discussed in the Slack channel.
Steps I took to solve it:
Uninstalled conda.
Switched from Python 3.7 to Python 3.10 to see if it would fix the issue.
Removed the initial ersilia directory I had cloned, since I had modified the code in the repo.
rm -rf ersilia
git clone https://github.com/ersilia-os/ersilia.git
Created a new conda environment and activated it.
conda create -n ersilia python=3.10
conda activate ersilia
Motivation to work in ersilia should have:
Why I want to work at Ersilia

After my initial Outreachy application was approved, the next step was to find a project or projects that I am interested in. I took some time to go through the list and read what each organization stands for and does. In the end, Ersilia was it for me: a non-profit organization with a mission to equip laboratories in Low and Middle Income Countries with state-of-the-art AI/ML tools for infectious and neglected disease research. Their goals, mission and vision resonate with me fully. Interning at Ersilia would give me an opportunity to do good for humanity and to know that what I do or assist in has an impact on the world and helps people, especially those from lower-income countries. I come from a lower-middle-income country, Kenya, and I hope that in the long run Ersilia grows and spreads to Kenya.
I graduated last year with a degree in Actuarial Science, then ventured into machine learning and data science using Python. Most of the ML projects I have worked on analyze sales, revenues or prices of items, but seeing a real application of ML in biomedical and experimental research transcends it all, and all I can think of is that I have finally found my purpose and my path. I want to apply my data science skills to medical research: to intern, volunteer or work at medical research institutes and disease and discovery centers, and to make open source contributions to disease and drug research. One day, when my time comes to leave this world, let those who remain and knew me say, "She gave her all in disease and drug research to help the people." That is how we make living more beautiful and help reduce people's pain.
That's why I would like to work with Ersilia. An opportunity for the internship will:
File "/home/kellen/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
failed
The error was caused by a slow internet connection.
I closed the terminal, gave it some time and retried the steps.
Hi @kellenkinya, welcome to Ersilia! Thank you very much for all the effort. Regarding the error when you try to get predictions: we are working on solving the problem. We have found that the isaura package is causing a conflict with ersilia, so please do not install it.
The adjustments worked and I was able to:
1. Check all the available models with `ersilia catalog`.
2. Fetch different models and install them locally using the **fetch command** with the **slug** or **Ersilia identifier**.
3. Serve the model.
4. Make several predictions by running either single or multiple predictions in batch mode.
One of the predictions :
Please, here you must be within the created 'ersilia' environment. In the shared image you are working in the base conda environment, not in the ersilia environment that you created in the previous steps. Follow the steps to run a basic Ersilia model. Do not attach images; please attach only the output logs when you run a model like this, all within the created ersilia environment:
ersilia -v fetch eos3b5e > my.log 2>&1
ersilia -v serve eos3b5e
ersilia -v run -i "CCC" > my.log 2>&1
Please make sure you execute the steps correctly before continuing with the following tasks. Thank you
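One way to guard against running a model from the wrong environment is to check the active conda environment first. The sketch below is only an illustration and is not part of Ersilia: it assumes conda has set the `CONDA_DEFAULT_ENV` variable on activation, and the helper name `in_conda_env` is hypothetical.

```python
import os


def in_conda_env(expected, environ=None):
    """Return True if the active conda environment is named `expected`.

    `in_conda_env` is a hypothetical helper. It relies on conda setting
    the CONDA_DEFAULT_ENV variable when an environment is activated.
    """
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    if in_conda_env("ersilia"):
        print("OK: running inside the 'ersilia' environment")
    else:
        print("Not in 'ersilia'; run `conda activate ersilia` first")
```

Running this before `ersilia -v fetch ...` gives a quick signal that the shell is still in the base environment.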
Thank you for the corrections.
I uninstalled isaura, then activated the ersilia conda environment and ran the basic Ersilia model. Attached is my log.
What I learnt from the predictions: the higher the score, the more synthetically accessible the molecule is predicted to be. This article on PubMed Central gave me some more information on the SA scores.
Hi @kellenkinya, normally it is good practice to mention the model you've used. I'm assuming this is the model eos2r5a (Retrosynthetic accessibility score).
Yes it was
Was running into a timeout error when trying to run the application for the ADME@NCATS model.
File "/home/kellen/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
failed
@kellenkinya Did restarting the terminal work, or is this still an issue? Quick piece of advice: it's often helpful to share the entire stack trace for better insight into the error.
Hi @kellenkinya just following up, any updates here?
Sorry for the late update.
git clone --recursive https://github.com/ncats/ncats-adme.git
conda env create --prefix ./env -f environment.yml
and then typed
pip install typed-argument-parser
python app.py
to install the models into the server. The third step is taking some time on my end. It ran for many hours yesterday, then I had some power issues, so it stopped and I have to run it again today. I hope to complete this step before the end of the day and make predictions.
(C:\Users\Admin\ncats-adme\server\env) C:\Users\Admin\ncats-adme\server>python app.py
Loading RLM graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading RLM model files
Loading PAMPA graph convolutional neural network model
Model File Exists Locally
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 7.4 models
Loading PAMPA graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 5.0 models
Loading PAMPA BBB graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA BBB models
Loading Solubility graph convolutional neural network model
Model File Exists Locally
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Solubility models
Loading human liver cytosol stability random forest models
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 75.50it/s]
Finished loading human liver cytosol stability models
Loading CYP450 random forest models
100%|██████████████████████████████████████████████████████████████████████████████████| 64/64 [00:02<00:00, 28.23it/s]
cyp2c9_subs-model_39: 100%|███████████████████████████████████████████████████████| 18.6M/18.6M [00:00<00:00, 2.16GB/s]
cyp2c9_subs-model_40: 100%|████████████████████████████████████████████████████████| 18.3M/18.3M [00:00<00:00, 641MB/s]
cyp2c9_subs-model_41: 100%|███████████████████████████████████████████████████████| 18.6M/18.6M [00:00<00:00, 2.44GB/s]
cyp2c9_subs-model_42: 100%|████████████████████████████████████████████████████████| 18.3M/18.3M [00:00<00:00, 675MB/s]
cyp2c9_subs-model_43: 100%|███████████████████████████████████████████████████████| 18.4M/18.4M [00:00<00:00, 2.41GB/s]
69%|████████████████████████████████████████████████████████▍ | 44/64 [08:42<03:57, 11.88s/it]
17%|█████████████▊ | 1/6 [08:44<43:44, 524.98s/it]
Traceback (most recent call last):
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 710, in _error_catcher
yield
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 835, in _raw_read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(9322106 bytes read, 10090047 more expected)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\models.py", line 816, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 936, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 907, in read
data = self._raw_read(amt)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 835, in _raw_read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "C:\Users\Admin\ncats-adme\server\env\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 727, in _error_catcher
raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(9322106 bytes read, 10090047 more expected)', IncompleteRead(9322106 bytes read, 10090047 more expected))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "app.py", line 27, in <module>
from predictors.cyp450.cyp450_predictor import CYP450Predictor
File "C:\Users\Admin\ncats-adme\server\predictors\cyp450\__init__.py", line 87, in <module>
cyp450_models_dict = load_models()
File "C:\Users\Admin\ncats-adme\server\predictors\cyp450\__init__.py", line 81, in load_models
cyp450_models_dict[model_name][f'model_{model_number}'] = download_file(base_url, model_name, model_number, cyp450_models_dict)
File "C:\Users\Admin\ncats-adme\server\predictors\cyp450\__init__.py", line 24, in download_file
cyp450_rf_pkl_file_request = requests.get(cyp450_rf_pkl_url)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\api.py", line 73, in get
return request("get", url, params=params, **kwargs)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\sessions.py", line 747, in send
r.content
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\models.py", line 899, in content
self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\models.py", line 818, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(9322106 bytes read, 10090047 more expected)', IncompleteRead(9322106 bytes read, 10090047 more expected))
I encountered the above error. After some research I found that the error might be due to:
- Network issue: the script is unable to complete the download. I have closed the terminal and checked my connection; let me now retry
python app.py
and will document how it goes. Any other help on how to solve this is highly appreciated.
After running python app.py overnight (it was running for more than 30 hours) and still not being able to install all the models into the server, I decided to comment out the CYP450 isozyme models (CYP2C9, CYP2D6, CYP3A4), which were the ones taking long.
Steps I took
notepad app.py
and commented out the import #from predictors.cyp450.cyp450_predictor import CYP450Predictor
and also the def predict_df for the model.
It worked
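Since the failure reported above looks like a dropped connection during a large download, one generic mitigation is to retry the failing step automatically with a growing pause between attempts. The sketch below is not taken from the NCATS code and is not the fix used in this thread; `retry` and `flaky_download` are hypothetical names, and the stand-in function only simulates a download.

```python
import time


def retry(action, attempts=3, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call `action()` until it succeeds or `attempts` calls have failed.

    Waits `delay` seconds after the first failure and doubles the wait
    after each further failure. The last error is re-raised if every
    attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_download():
        # Stand-in for a real download: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("connection broken")
        return b"model bytes"

    result = retry(flaky_download, attempts=5, delay=0.1, exceptions=(ConnectionError,))
    print(result, "after", calls["n"], "attempts")
```

Restricting `exceptions` to network-related errors keeps genuine bugs from being silently retried.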
git clone --recursive https://github.com/ncats/ncats-adme.git
cd ncats-adme
cd server
conda env create --prefix ./env -f environment.yml
pip install typed-argument-parser
python app.py
After all the models had been loaded into the NCATS server, I opened my Chrome browser to access them and typed http://127.0.0.1:5000/
I then navigated to Predict and chose the PAMPA pH 7.4 and PAMPA pH 5 models. I uploaded the Essential Medicine List CSV file, which I had previously downloaded from the Ersilia repo, and then processed the file.
The PAMPA pH 7.4 model is a classification model: a prediction of 1 means low permeability and a prediction of 0 means high or moderate permeability.
The PAMPA pH 5.0 model is a classification model: a prediction of 1 means low permeability and a prediction of 0 means high or moderate permeability.
Both stability models are classification models: a prediction of 0 means the molecule is stable and a prediction of 1 means the molecule is unstable.
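The class meanings described above can be turned into readable labels with a small lookup. This is a minimal sketch under the assumption that predictions arrive as 0/1 values; the names `PAMPA_LABELS`, `STABILITY_LABELS` and `describe` are hypothetical and not part of the NCATS server or Ersilia.

```python
# Label meanings as described in the text above.
PAMPA_LABELS = {1: "low permeability", 0: "high or moderate permeability"}
STABILITY_LABELS = {0: "stable", 1: "unstable"}


def describe(predictions, labels):
    """Map a sequence of 0/1 class predictions to human-readable labels."""
    return [labels[int(p)] for p in predictions]


if __name__ == "__main__":
    print(describe([1, 0, 0], PAMPA_LABELS))
    print(describe([0, 1], STABILITY_LABELS))
```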
Ersilia runs by
On https://ersilia.io/model-hub, I filtered by Microsomal stability and found Human Liver Microsomal Stability. Then I found its EOS model ID on GitHub.
ersilia -v fetch eos31ve
Serve the model
ersilia -v serve eos31ve
And the output was
🚀 Serving model eos31ve: ncats-hlm
URL: http://0.0.0.0:56169
PID: -1
SRV: pulled_docker
👉 To run model:
💁 Information:
4. Run predictions and store the output in a file
ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_1.csv
(ersilia) kellen@DESKTOP-EQ55Q8H:~/ersilia$ ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_1.csv
21:56:57 | DEBUG | Getting session from /home/kellen/eos/session.json
21:56:57 | DEBUG | Getting session from /home/kellen/eos/session.json
21:56:57 | WARNING | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
21:56:57 | ERROR | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
21:57:00 | DEBUG | Is fetched: True
21:57:00 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:00 | DEBUG | Setting AutoService for eos5505
21:57:00 | INFO | Service class provided
21:57:00 | DEBUG | Using port 41927
21:57:00 | DEBUG | Starting Docker Daemon service
21:57:00 | DEBUG | Creating temporary folder /tmp/ersilia-fy5bi3ju and mounting as volume in container
21:57:00 | DEBUG | Image ersiliaos/eos5505:latest is available locally
21:57:00 | DEBUG | Using port 55271
21:57:00 | DEBUG | Starting Docker Daemon service
21:57:00 | DEBUG | Creating temporary folder /tmp/ersilia-heai5g4j and mounting as volume in container
21:57:00 | DEBUG | Reading card from eos5505
21:57:00 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:02 | DEBUG | Reading shape from eos5505
21:57:02 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:03 | DEBUG | Input Shape: Single
21:57:03 | DEBUG | Input type is: compound
21:57:03 | DEBUG | Input shape is: Single
21:57:03 | DEBUG | Importing module: .types.compound
21:57:03 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
21:57:03 | DEBUG | InputShapeSingle shape: Single
21:57:03 | DEBUG | Stopping sniffer for finding delimiter
21:57:03 | DEBUG | Expected number: 1
21:57:03 | DEBUG | Entity is list: False
21:57:03 | DEBUG | Resolving columns
21:57:03 | DEBUG | Stopping sniffer for resolving column types
21:57:03 | DEBUG | Done with sniffing the file
21:57:03 | DEBUG | Input: {1: 100, 2: 100}
21:57:03 | DEBUG | Key: {}
21:57:03 | DEBUG | Input: [1]
21:57:03 | DEBUG | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:57:03 | DEBUG | Matching for input is [1]
21:57:03 | DEBUG | Has header True
21:57:03 | DEBUG | Schema {'input': [1], 'key': None}
21:57:03 | DEBUG | Standardizing input single
21:57:03 | DEBUG | Writing standardized input to /tmp/ersilia-fahfvvjs/standard_input_file.csv
21:57:03 | DEBUG | Reading standard file from /tmp/ersilia-fahfvvjs/standard_input_file.csv
21:57:03 | DEBUG | File has 443 lines
21:57:03 | DEBUG | No file splitting necessary!
21:57:03 | DEBUG | Reading card from eos5505
21:57:03 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:05 | DEBUG | Reading shape from eos5505
21:57:05 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:06 | DEBUG | Input Shape: Single
21:57:06 | DEBUG | Input type is: compound
21:57:06 | DEBUG | Input shape is: Single
21:57:06 | DEBUG | Importing module: .types.compound
21:57:06 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
21:57:06 | DEBUG | InputShapeSingle shape: Single
21:57:06 | DEBUG | API eos5505:predict initialized at URL http://0.0.0.0:41705
21:57:06 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:06 | DEBUG | Posting to predict
21:57:06 | DEBUG | Batch size 100
21:57:06 | DEBUG | Stopping sniffer for finding delimiter
21:57:06 | DEBUG | Expected number: 1
21:57:06 | DEBUG | Entity is list: False
21:57:06 | DEBUG | Resolving columns
21:57:06 | DEBUG | Stopping sniffer for resolving column types
21:57:06 | DEBUG | Done with sniffing the file
21:57:06 | DEBUG | Input: {1: 100, 2: 100}
21:57:06 | DEBUG | Key: {}
21:57:06 | DEBUG | Input: [1]
21:57:06 | DEBUG | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:57:06 | DEBUG | Matching for input is [1]
21:57:06 | DEBUG | Has header True
21:57:06 | DEBUG | Schema {'input': [1], 'key': None}
21:57:06 | DEBUG | Standardizing input single
21:57:06 | DEBUG | Writing standardized input to /tmp/ersilia-wq5abh_l/standard_input_file.csv
21:57:06 | DEBUG | Reading standard file from /tmp/ersilia-wq5abh_l/standard_input_file.csv
21:57:06 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:11 | DEBUG | Status code: 200
21:57:11 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:17 | DEBUG | Status code: 200
21:57:22 | DEBUG | Status code: 200
21:57:27 | DEBUG | Status code: 200
21:57:32 | DEBUG | Status code: 200
21:57:32 | DEBUG | Done with unique posting
21:57:34 | DEBUG | Data: outcome
21:57:34 | DEBUG | Values: [0.049]
21:57:34 | DEBUG | Datatype: numeric_array
output_1.csv
The predictions were output_1.csv
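The fetch, serve and predict steps above can also be scripted. The sketch below only assembles the same three commands shown in this thread and runs them with `subprocess`; the function names `ersilia_commands` and `run_pipeline` are hypothetical, and it assumes the `ersilia` CLI is on the PATH of the active environment.

```python
import shlex
import subprocess


def ersilia_commands(model_id, input_csv, output_csv):
    """Build the fetch, serve and predict commands used in this thread."""
    return [
        ["ersilia", "-v", "fetch", model_id],
        ["ersilia", "-v", "serve", model_id],
        ["ersilia", "-v", "api", "predict", "-i", input_csv, "-o", output_csv],
    ]


def run_pipeline(model_id, input_csv, output_csv):
    """Run the three commands in order, stopping at the first failure."""
    for cmd in ersilia_commands(model_id, input_csv, output_csv):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_pipeline(...) to execute them.
    for cmd in ersilia_commands("eos31ve", "eml_canonical.csv", "output_1.csv"):
        print(shlex.join(cmd))
```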
And for Rat Liver Microsomal Stability
ersilia -v fetch eos5505
Serve the model
ersilia -v serve eos5505
And the output was
Serving model eos5505: ncats-rlm
URL: http://0.0.0.0:41705
PID: -1
SRV: pulled_docker
👉 To run model:
run
These APIs are also valid:
💁 Information:
info
4. Run predictions and store the output in a file
ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_2.csv
(ersilia) kellen@DESKTOP-EQ55Q8H:~/ersilia$ ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_2.csv
21:44:11 | DEBUG | Getting session from /home/kellen/eos/session.json
21:44:11 | DEBUG | Getting session from /home/kellen/eos/session.json
21:44:11 | WARNING | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
21:44:11 | ERROR | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
21:44:14 | DEBUG | Is fetched: True
21:44:14 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:14 | DEBUG | Setting AutoService for eos5505
21:44:14 | INFO | Service class provided
21:44:14 | DEBUG | Using port 47395
21:44:14 | DEBUG | Starting Docker Daemon service
21:44:14 | DEBUG | Creating temporary folder /tmp/ersilia-fozp9pgv and mounting as volume in container
21:44:14 | DEBUG | Image ersiliaos/eos5505:latest is available locally
21:44:14 | DEBUG | Using port 50889
21:44:14 | DEBUG | Starting Docker Daemon service
21:44:14 | DEBUG | Creating temporary folder /tmp/ersilia-53hg2jig and mounting as volume in container
21:44:14 | DEBUG | Reading card from eos5505
21:44:14 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:16 | DEBUG | Reading shape from eos5505
21:44:16 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:17 | DEBUG | Input Shape: Single
21:44:17 | DEBUG | Input type is: compound
21:44:17 | DEBUG | Input shape is: Single
21:44:17 | DEBUG | Importing module: .types.compound
21:44:17 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
21:44:17 | DEBUG | InputShapeSingle shape: Single
21:44:17 | DEBUG | Stopping sniffer for finding delimiter
21:44:17 | DEBUG | Expected number: 1
21:44:17 | DEBUG | Entity is list: False
21:44:17 | DEBUG | Resolving columns
21:44:17 | DEBUG | Stopping sniffer for resolving column types
21:44:17 | DEBUG | Done with sniffing the file
21:44:17 | DEBUG | Input: {1: 100, 2: 100}
21:44:17 | DEBUG | Key: {}
21:44:17 | DEBUG | Input: [1]
21:44:17 | DEBUG | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:44:17 | DEBUG | Matching for input is [1]
21:44:17 | DEBUG | Has header True
21:44:17 | DEBUG | Schema {'input': [1], 'key': None}
21:44:17 | DEBUG | Standardizing input single
21:44:17 | DEBUG | Writing standardized input to /tmp/ersilia-7i7ahwyp/standard_input_file.csv
21:44:17 | DEBUG | Reading standard file from /tmp/ersilia-7i7ahwyp/standard_input_file.csv
21:44:17 | DEBUG | File has 443 lines
21:44:17 | DEBUG | No file splitting necessary!
21:44:17 | DEBUG | Reading card from eos5505
21:44:17 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:19 | DEBUG | Reading shape from eos5505
21:44:19 | DEBUG | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:20 | DEBUG | Input Shape: Single
21:44:20 | DEBUG | Input type is: compound
21:44:20 | DEBUG | Input shape is: Single
21:44:20 | DEBUG | Importing module: .types.compound
21:44:20 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
21:44:20 | DEBUG | InputShapeSingle shape: Single
21:44:20 | DEBUG | API eos5505:predict initialized at URL http://0.0.0.0:41705
21:44:20 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:20 | DEBUG | Posting to predict
21:44:20 | DEBUG | Batch size 100
21:44:20 | DEBUG | Stopping sniffer for finding delimiter
21:44:20 | DEBUG | Expected number: 1
21:44:20 | DEBUG | Entity is list: False
21:44:20 | DEBUG | Resolving columns
21:44:20 | DEBUG | Stopping sniffer for resolving column types
21:44:20 | DEBUG | Done with sniffing the file
21:44:20 | DEBUG | Input: {1: 100, 2: 100}
21:44:20 | DEBUG | Key: {}
21:44:20 | DEBUG | Input: [1]
21:44:20 | DEBUG | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:44:20 | DEBUG | Matching for input is [1]
21:44:20 | DEBUG | Has header True
21:44:20 | DEBUG | Schema {'input': [1], 'key': None}
21:44:20 | DEBUG | Standardizing input single
21:44:20 | DEBUG | Writing standardized input to /tmp/ersilia-6x2vc0s_/standard_input_file.csv
21:44:20 | DEBUG | Reading standard file from /tmp/ersilia-6x2vc0s_/standard_input_file.csv
21:44:21 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:29 | DEBUG | Status code: 200
21:44:29 | DEBUG | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:33 | DEBUG | Status code: 200
21:44:38 | DEBUG | Status code: 200
21:44:43 | DEBUG | Status code: 200
21:44:48 | DEBUG | Status code: 200
21:44:48 | DEBUG | Done with unique posting
21:44:51 | DEBUG | Data: outcome
21:44:51 | DEBUG | Values: [0.049]
21:44:51 | DEBUG | Datatype: numeric_array
output_2.csv
The predictions were output_2.csv
The Rat Liver Microsomal Stability predictions of the original model (screenshot above), obtained using the original code, match the Ersilia predictions (CSV file above). Though the decimal values differ, the class each molecule falls into (stable or unstable) is the same. The same goes for the Human Liver Cytosolic Stability model predictions.
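The comparison above (different decimals, same class) can be checked programmatically. This is a minimal sketch, assuming both outputs are probability-like scores and that a cut-off of 0.5 separates the two classes; the threshold and the names `to_class` and `class_agreement` are assumptions, not something stated by either model.

```python
def to_class(score, threshold=0.5):
    """Convert a score to class 1 (at or above threshold) or class 0."""
    return 1 if score >= threshold else 0


def class_agreement(scores_a, scores_b, threshold=0.5):
    """Return the fraction of molecules assigned the same class by both outputs."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must have the same length")
    if not scores_a:
        raise ValueError("score lists must not be empty")
    matches = sum(
        to_class(a, threshold) == to_class(b, threshold)
        for a, b in zip(scores_a, scores_b)
    )
    return matches / len(scores_a)


if __name__ == "__main__":
    original = [0.049, 0.91, 0.40]   # illustrative values only
    ersilia = [0.050, 0.88, 0.60]    # illustrative values only
    print(class_agreement(original, ersilia))
```

An agreement of 1.0 would correspond to the observation that every molecule falls into the same class in both outputs.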
I was able to
Drug-Drug Interactions (DDIs) occur when a patient takes two or more drugs simultaneously and one of the administered drugs alters the pharmacological or clinical response of another. Such interactions may result in adverse side effects, which can sometimes be life threatening, or may alter drug effectiveness.
KGCN_NFM is a deep learning framework that combines knowledge graph convolutional networks (KGCNs) and neural factorization machines (NFMs) to predict DDIs.
Ersilia's mission to aid drug discovery will lead to new drugs being introduced into the market. Predicting how these drugs will interact with other drugs, or how existing drugs interact, will help prevent the adverse side effects of some Drug-Drug Interactions.
The code and the dataset are readily available.
Binding affinity indicates the strength of drug-target interactions. Successful identification or prediction of drug-target interactions (DTIs) enables researchers to understand and predict how drugs interact with their target proteins or biological molecules. This helps in discovering new drugs or repurposing existing ones by targeting specific proteins.
DeepGS uses a deep neural network to extract local chemical context from amino acid and SMILES sequences, as well as the molecular structure of the drugs, and then predicts the drug-target binding affinity.
The DeepGS framework will aid in predicting how drugs interact with various targets. This will help Ersilia in new drug discovery for neglected diseases by tailoring treatments based on individual patient genes and biological molecules.
Create a new environment
conda create -n deepgs python=3.7.6
source activate deepgs
Install Pytorch, RDKit and pytorch-geometric.
Git clone their repository
git clone https://github.com/jacklin18/DeepGS.git
Move into the cloned repository and install the requirements
cd DeepGS
pip install -r requirements.txt
Provide training data with each row containing a molecule (SMILES string), a protein sequence (amino acids) and a label for the drug-target pair (binding affinity value), for example:
CC1=C2C=C(C=CC2=NN1)C3=CC(=CN=C3)OCC(CC4=CC=CC=C4)N MKKFFDSRREQGGSGLGSGSSGGGGSTSGLGSGYIGRVFGIGRQQVTVDEVLAEGGFAIVFLVRTSNGMKCALKRMFVNNEHDLQVCKREIQIMRDLSGHKNIVGYIDSSINNVSSGDVWEVLILMDFCRGGQVVNLMNQRLQTGFTENEVLQIFCDTCEAVARLHQCKTPIIHRDLKVENILLHDRGHYVLCDFGSATNKFQNPQTEGVNAVEDEIKKYTTLSYRAPEMVNLYSGKIITTKADIWALGCLLYKLCYFTLPFGESQVAICDGNFTIPDNSRYSQDMHCLIRYMLEPDPDKRPDIYQVSYFSFKLLKKECPIPNVQNSPIPAKLPEPVKASEAAAKKTQPKARLTDPIPTTETSIAPRQRPKAGQTQPNPGILPIQPALTPRKRATVQPPPQAAGSSNQPGLLASVPQPKPQAPPSQPLPQTQAKQPQAPPTPQQTPSTQAQGLPAQAQATPQHQQQLFLKQQQQQQQPPPAQQQPAGTFYQQQQAQTQQFQAVHPATQKPAIAQFPVVSQGGSQQQLMQNFYQQQQQQQQQQQQQQLATALHQQQLMTQQAALQQKPTMAAGQQPQPQPAAAPQPAPAQEPAIQAPVRQQPKVQTTPPPAVQGQKVGSLTPPSSPKTQRAGHRRILSDVTHSAVFGVPASKSTQLLQAAAAEASLNKSKSATTTPSGSPRTSQQNVYNPSEGSTWNPFDDDNFSKLTAEELLNKDFAKLGEGKHPEKLGGSAESLIPGFQSTQGDAFATTSFSAGTAEKRKGGQTVDSGLPLLSVSDPFIPLQVPDAPEKLIEGLKSPDTSLLLPDLLPMTDPFGSTSDAVIEKADVAVESLIPGLEPPVPQRLPSQTESVTSNRTDSLTGEDSLLDCSLLSNPTTDLLEEFAPTAISAPVHKAAEDSNLISGFDVPEGSDKVAEDEFDPIPVLITKNPQGGHSRNSSGSSESSLPNLARSLLLVDQLIDL 43.0
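To make the row format concrete, the sketch below parses one training row into its three fields. It assumes the fields are separated by whitespace, as in the example row above; `DTIRecord` and `parse_row` are hypothetical names and are not part of the DeepGS code.

```python
from typing import NamedTuple


class DTIRecord(NamedTuple):
    smiles: str       # drug as a SMILES string
    sequence: str     # target protein as an amino acid sequence
    affinity: float   # binding affinity label


def parse_row(line):
    """Split one whitespace-separated training row into a DTIRecord."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    smiles, sequence, affinity = parts
    return DTIRecord(smiles, sequence, float(affinity))


if __name__ == "__main__":
    # Shortened, made-up row for illustration only.
    print(parse_row("CCO MKKFFDSRREQ 43.0"))
```

A check like this can catch rows with a missing label or stray whitespace before preprocessing starts.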
Usage
cd code
bash preprocess.sh
Train the model
bash run_tranining.sh
After making predictions with PAMPA pH 7.4 while working with NCATS_ADME, I read about the permeability of drugs across cell membranes. While reading, I found a similar model which predicts logP and logD descriptors.
StructGNN predicts logP and logD descriptors by "encoding additional graph information by extracting molecular substructures through adding a set of generalized atomic features of these substructures to an established Direct Message Passing Neural Network (D-MPNN)."
StructGNN is a complementary model to those available in the Ersilia hub for logP and logD prediction of drug permeability across cell membranes, expanding the toolbox of models available for lipophilicity prediction.
git clone https://github.com/VEK239/StructGNN-lipophilicity.git
cd StructGNN-lipophilicity
git checkout SOTA
cd scripts/SOTA/dmpnn
conda env create -f environment.yml
conda activate chemprop
pip install -e .
conda create -n mol_ot python=3.6.8
sudo apt-get install libxrender1
conda install pytorch torchvision -c pytorch
conda install -c rdkit rdkit
conda install -c conda-forge pot
conda install -c anaconda scikit-learn
conda install -c conda-forge matplotlib
conda install -c conda-forge tqdm
conda install -c conda-forge tensorboardx
Hello,
Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application