kellenkinya commented 9 months ago

Week 1 - Get to know the community

[X] Join the communication channels
[X] Open a GitHub issue (this one!)
[x] Install the Ersilia Model Hub and test the simplest model
[x] Write a motivation statement to work at Ersilia
[x] Submit your first contribution to the Outreachy site

Week 2 - Install and run an ML model

[x] Select a model from the suggested list
[x] Install the model in your system
[x] Run predictions for the EML
[x] Compare results with the Ersilia Model Hub implementation!
[x] Install and run Docker!

Week 3 - Propose new models

[x] Suggest a new model and document it (1)
[x] Suggest a new model and document it (2)
[x] Suggest a new model and document it (3)

Week 4 - Prepare your final application

[x] Submit the final application in the Outreachy website

kellenkinya commented 9 months ago

Task 1 : Joined the community and introduced myself to my peers and the mentors. Task 2 : Opened a github issue Task 3 : My operating system is windows 10 so I had to first install WSL in PowerShell running as an administrator , then downloaded and installed ubuntu on windows, the next step was to install all the third part prerequisites needed for the ersilia model to run found in this link .

kellenkinya commented 9 months ago

Task 3: Able to download and install the models locally through the fetch command. Once this step is done the model is ready for use. then I served the model and made both individual and multiple predictions.

kellenkinya commented 9 months ago

What I learnt on the predictions: The higher the score, the more synthetically accessible the molecule is predicted to be. This journal by PubMed central gave me some more information on the SA scores.

kellenkinya commented 9 months ago

Unable to make predictions today, I was running into object of NoneType has no len error as discussed in the slack channel

Steps I took to solve

Uninstalled conda.
switched from python 3.7 to python 3.10 to see if it will fix the issue
Removed the initial ersilia directory I had cloned since I had modified the code in the repo rm -rf ersilia git clone https://github.com/ersilia-os/ersilia.git
Created a new conda environment and activated it. conda create -n ersilia python=3.10

conda activate ersilia

kellenkinya commented 9 months ago

The adjustments worked and I was able to

Check all the available models in the website ersilia catalog
Fetch different models and installed them locally using the fetch command using the slug or ersilia identifier
Serve the model
Make several predictions by either running single models or multiple models in batch mode.

One of the predictions :

kellenkinya commented 9 months ago

Motivation to work in ersilia should have:

Current skills -Your reasons to work in ersilia project -How this would advance my career -What are my plans during and after the internship *

kellenkinya commented 9 months ago

Why I want to work at ersilia After getting an approval than my initial outreachy application has been approved and the next step was to find a project or project that I am interested in. I had to take sometime to go through the list, read what all the organizations stand for and what they do. In the end ersilia was it for me. A non-profit organization with a mission to equip laboratories in Low and Middle Income Countries with state of the art AI/ML tools for infectious and neglected disease research. Their goals, mission and vision resonates with me fully. Getting a chance to intern at ersilia gives me an opportunity to do good for the humanity, that what I do or assist in as an impact in the world, it helps people, especially from lower income countries, I come from a lower middle income income country, Kenya and I hope in the long run ersilia grows and spreads to Kenya.

I graduated last year with a degree in Actuarial Science , then ventured into machine learning and data science using python, most of the projects I have worked on in ML are about analyzing sales, or revenues or prices of items, but to see real application of ML in biomedical and experimental research transcends it all and all I can think of is finally I found my purpose and my path. I want to apply my data science skills in medical research, intern, volunteer or work on medical research institutes, disease and discovery centers, do open source contributions in disease and drugs research and one day when my time comes to leave this world let them that remain and know me say She gave her all in disease and drugs research to help the people and that's how we make living more beautiful and help reduce peoples pain.

That's why I would like to work with ersilia, A opportunity for the internship will:

Allow me to learn, unlearn, relearn and apply in AI/ML medical research.
Contribute new models to ersilia model hub.
Learn from all the mentors and my peers, also establish new friendships and connections. I hope to be given an opportunity, I would love to part of ersilia organization . Thank you

kellenkinya commented 9 months ago

Week 1 rundown

The mentors and peers are very nice and helpful. The YouTube video on the community call call was very informative and educative.
Ersilia model hub has 4 models categories (ranked in order of increasing complexity)

Pre-trained literature models
Re-trained literature models
In-house models trained on relevant datasets
Models built on collaboration

Learn't how to use ubuntu on windows, GIT and GitHub CLI and installing dependencies.
Installed and run several simple models, and made predictions,

kellenkinya commented 9 months ago

Week 2 tasks

Task 1: Select a model

NCATS Rat Liver Microsomal Stability

Why NCATS Rat Liver Microsomal Stability

Modelling methods used are neural networks(deep neural networks, recurrent neural networks and graph convolutional neural networks) which is a field in machine learning and artificial intelligence I am deeply interested in.
It has well documented journals which explain the process, outputs and interpretations (gives more detailed knowledge on how, why and what's happening) hence I can follow through and somehow understand the concepts journal

kellenkinya commented 9 months ago

Was running into a timeout error when trying to run the application on ADMW@NCATS model.

 File "/home/kellen/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

failed

kellenkinya commented 9 months ago

The error was caused by slow internet connection

What I did

Closed the terminal, gave it some time and retried the steps again

carcablop commented 9 months ago

Hi @kellenkinya Welcome to Ersilia!. Thank you very much for all the effort. Regarding the error when you try to get the predictions, we are working on solving the problem, we have found that the isaura package is causing conflict with ersilia, so please do not install it.

carcablop commented 9 months ago

The adjustments worked and I was able to

1. Check all the available models in the website `ersilia catalog`

2. Fetch different models and  installed them locally using the **fetch command**  using the **slug** or **ersilia identifier**

3. Serve the model

4. Make several predictions by either running single models or multiple models in batch mode.

One of the predictions : 2023-10-12 (4)

Please, here you must be within the created environment of 'ersilia'. In the shared image you are working on the base conda environment, not on the ersilia environment that you had created in the previous steps. Follow the steps to run a basic ersilia model, do not attach images, please attach only the output logs when you run a model like this, all within the created ersilia environment: ersilia -v fetch eos3b5e > my.log 2>&1 ersilia -v serve eos3b5e ersilia -v run -i "CCC" > my.log 2>&1

Please make sure you execute the steps correctly before continuing with the following tasks. Thank you

kellenkinya commented 9 months ago

Thank you for the corrections.

kellenkinya commented 9 months ago

I unistalled isaura, then conda activated ersilia and ran the basic ersilia model, attached is my log

my.log

DhanshreeA commented 9 months ago

What I learnt on the predictions: The higher the score, the more synthetically accessible the molecule is predicted to be. This journal by PubMed central gave me some more information on the SA scores.

Hi @kellenkinya normally it is a good practice to mention the model you've used. I'm assuming this is the mode eos2r5a (Retrosynthetic accessibility score)

DhanshreeA commented 9 months ago

Was running into a timeout error when trying to run the application on ADMW@NCATS model.

 File "/home/kellen/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

failed

@kellenkinya Did your restarting of the terminal work, or is this still and issue? Quick piece of advice, it's often helpful to share the entire stack trace for better insights into the error.

kellenkinya commented 9 months ago

What I learnt on the predictions: The higher the score, the more synthetically accessible the molecule is predicted to be. This journal by PubMed central gave me some more information on the SA scores.

Hi @kellenkinya normally it is a good practice to mention the model you've used. I'm assuming this is the mode eos2r5a (Retrosynthetic accessibility score

Yes it was

DhanshreeA commented 9 months ago

Was running into a timeout error when trying to run the application on ADMW@NCATS model.
 File "/home/kellen/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

failed
@kellenkinya Did your restarting of the terminal work, or is this still and issue? Quick piece of advice, it's often helpful to share the entire stack trace for better insights into the error.

Hi @kellenkinya just following up, any updates here?

kellenkinya commented 9 months ago

Sorry for a late update.

First I cloned the repo git clone --recursive https://github.com/ncats/ncats-adme.git
Then I was able to set up my environment by running conda env create --prefix ./env -f environment.yml and then typed `pip install typed-argument-parser'
The next step was to change the working environment into the server directory which is inside the ADME_RLM repo then type python app.py to install the model into the server.

kellenkinya commented 9 months ago

The 3rd step is taking sometime on my end , It ran for many hours yesterday and I had some power issues so it stopped and I have to run it again today, I am hoping to have managed this step before the day end and made predictions

kellenkinya commented 9 months ago

`(C:\Users\Admin\ncats-adme\server\env) C:\Users\Admin\ncats-adme\server>python app.py
Loading RLM graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading RLM model files
Loading PAMPA graph convolutional neural network model
Model File Exists Locally
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 7.4 models
Loading PAMPA graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 5.0 models
Loading PAMPA BBB graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA BBB models
Loading Solubility graph convolutional neural network model
Model File Exists Locally
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Solubility models
Loading human liver cytosol stability random forest models
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 75.50it/s]
Finished loading human liver cytosol stability models
Loading CYP450 random forest models
100%|██████████████████████████████████████████████████████████████████████████████████| 64/64 [00:02<00:00, 28.23it/s]
cyp2c9_subs-model_39: 100%|███████████████████████████████████████████████████████| 18.6M/18.6M [00:00<00:00, 2.16GB/s]
cyp2c9_subs-model_40: 100%|████████████████████████████████████████████████████████| 18.3M/18.3M [00:00<00:00, 641MB/s]
cyp2c9_subs-model_41: 100%|███████████████████████████████████████████████████████| 18.6M/18.6M [00:00<00:00, 2.44GB/s]
cyp2c9_subs-model_42: 100%|████████████████████████████████████████████████████████| 18.3M/18.3M [00:00<00:00, 675MB/s]
cyp2c9_subs-model_43: 100%|███████████████████████████████████████████████████████| 18.4M/18.4M [00:00<00:00, 2.41GB/s]
 69%|████████████████████████████████████████████████████████▍                         | 44/64 [08:42<03:57, 11.88s/it]
 17%|█████████████▊                                                                     | 1/6 [08:44<43:44, 524.98s/it]
Traceback (most recent call last):
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 710, in _error_catcher
    yield
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 835, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(9322106 bytes read, 10090047 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 936, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 907, in read
    data = self._raw_read(amt)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 835, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "C:\Users\Admin\ncats-adme\server\env\lib\contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\urllib3\response.py", line 727, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(9322106 bytes read, 10090047 more expected)', IncompleteRead(9322106 bytes read, 10090047 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "app.py", line 27, in <module>
    from predictors.cyp450.cyp450_predictor import CYP450Predictor
  File "C:\Users\Admin\ncats-adme\server\predictors\cyp450\__init__.py", line 87, in <module>
    cyp450_models_dict = load_models()
  File "C:\Users\Admin\ncats-adme\server\predictors\cyp450\__init__.py", line 81, in load_models
    cyp450_models_dict[model_name][f'model_{model_number}'] = download_file(base_url, model_name, model_number, cyp450_models_dict)
  File "C:\Users\Admin\ncats-adme\server\predictors\cyp450\__init__.py", line 24, in download_file
    cyp450_rf_pkl_file_request = requests.get(cyp450_rf_pkl_url)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\sessions.py", line 747, in send
    r.content
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
  File "C:\Users\Admin\ncats-adme\server\env\lib\site-packages\requests\models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(9322106 bytes read, 10090047 more expected)', IncompleteRead(9322106 bytes read, 10090047 more `expected))`

kellenkinya commented 9 months ago

I Encountered the above error, After some research I found at the error might be due to

Network issue: the script is unable to complete the download. I have closed the terminal and checked my connections, let me now retry python app.py and will document how it goes. Any other help on how to solve is highly appreciated.

kellenkinya commented 9 months ago

After running python app.py overnight and still not able to install all the models(was running for more than 30 hours) into the server I decided to comment on CYP450 isozymes - CYP2C9, CYP2D6, CYP3A4 models which were the ones taking long. Steps I took

Changed the working directory to ncat_adme then into the server
Opened app.py notepad app.py and commented on the importation #from predictors.cyp450.cyp450_predictor import CYP450Predictor and also the def predict_df for the model.

kellenkinya commented 9 months ago

I Encountered the above error, After some research I found at the error might be due to

Network issue: the script is unable to complete the download. I have closed the terminal and checked my connections, let me now retry python app.py and will document how it goes. Any other help on how to solve is highly appreciated.

It worked

kellenkinya commented 9 months ago

Running ADME@NCATS

Task 2: Installing the model

Steps I took

Clone the github repository that contain the codes for the application and the models
```
git clone --recursive https://github.com/ncats/ncats-adme.git
```
Change the working directory to ncats_adme, then within the directory cd into the server
```
cd ncats_adme
cd server
```

Create and environment. (my OS is windows )

conda env create --prefix ./env -f environment.yml
pip install typed-argument-parser

Then type
```
python app.py
```

kellenkinya commented 9 months ago

Task 3 Run Predictions

After all the models have been loaded into NCATS server to assess them I opened my chrome browser and typed http://127.0.0.1:5000/

kellenkinya commented 9 months ago

I then navigated to predict and chose PAMPA pH 7.4 and PAMPA pH 5 models. Then uploaded the Essential Medicine List csv file which I had previously downloaded from ersilia repo and the processed the file.

PAMPA pH 7.4 model prediction

Interpretations.

PAMPA pH 7.4 model is a classification model with predictions of 1 having Low Permeability and prediction of 0 having high or moderate permeability

kellenkinya commented 9 months ago

PAMPA pH 5 model prediction

Interpretations.

PAMPA pH 5.0 model is a classification model with predictions of 1 having Low Permeability and prediction of 0 having high or moderate permeability

kellenkinya commented 9 months ago

Rat Liver Microsomal Stability Predictions

Human Liver Cytosolic Stability Predictions

Interpretations

Both models are classification models with prediction of 0 meaning the molecules are stable and prediction of 1 meaning molecules are unstable

kellenkinya commented 9 months ago

Task 4: Understanding Ersilia backend.

Ersilia runs by

Downloading models from githun(using GIT-LFS)
From S3 buckets(AWS backend)
Downloading models as docker containers. I used docker and followed these sets of instructions Installing docker - followed this steps https://3os.org/devops/docker/docker-install/

kellenkinya commented 8 months ago

Task 5: Compare results with ersilia models.

ON https://ersilia.io/model-hub, I filtered using Microsomal stability and found Human Liver Microsomal Stability. Then from github I found its EOS model ID.

Fetch the model using EOS model ID
```
ersilia -v fetch eos31ve
```

Serve the model

ersilia -v serve  eos31ve

And the output was


🚀 Serving model eos31ve: ncats-hlm

URL: http://0.0.0.0:56169
PID: -1
SRV: pulled_docker

👉 To run model:

run

💁 Information:

info
```
4. Run predictions and store the output in a file
```
ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_2.csv

(ersilia) kellen@DESKTOP-EQ55Q8H:~/ersilia$ ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_1.csv
21:56:57 | DEBUG    | Getting session from /home/kellen/eos/session.json
21:56:57 | DEBUG    | Getting session from /home/kellen/eos/session.json
21:56:57 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
21:56:57 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
21:57:00 | DEBUG    | Is fetched: True
21:57:00 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:00 | DEBUG    | Setting AutoService for eos5505
21:57:00 | INFO     | Service class provided
21:57:00 | DEBUG    | Using port 41927
21:57:00 | DEBUG    | Starting Docker Daemon service
21:57:00 | DEBUG    | Creating temporary folder /tmp/ersilia-fy5bi3ju and mounting as volume in container
21:57:00 | DEBUG    | Image ersiliaos/eos5505:latest is available locally
21:57:00 | DEBUG    | Using port 55271
21:57:00 | DEBUG    | Starting Docker Daemon service
21:57:00 | DEBUG    | Creating temporary folder /tmp/ersilia-heai5g4j and mounting as volume in container
21:57:00 | DEBUG    | Reading card from eos5505
21:57:00 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:02 | DEBUG    | Reading shape from eos5505
21:57:02 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:03 | DEBUG    | Input Shape: Single
21:57:03 | DEBUG    | Input type is: compound
21:57:03 | DEBUG    | Input shape is: Single
21:57:03 | DEBUG    | Importing module: .types.compound
21:57:03 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
21:57:03 | DEBUG    | InputShapeSingle shape: Single
21:57:03 | DEBUG    | Stopping sniffer for finding delimiter
21:57:03 | DEBUG    | Expected number: 1
21:57:03 | DEBUG    | Entity is list: False
21:57:03 | DEBUG    | Resolving columns
21:57:03 | DEBUG    | Stopping sniffer for resolving column types
21:57:03 | DEBUG    | Done with sniffing the file
21:57:03 | DEBUG    | Input: {1: 100, 2: 100}
21:57:03 | DEBUG    | Key: {}
21:57:03 | DEBUG    | Input: [1]
21:57:03 | DEBUG    | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:57:03 | DEBUG    | Matching for input is [1]
21:57:03 | DEBUG    | Has header True
21:57:03 | DEBUG    | Schema {'input': [1], 'key': None}
21:57:03 | DEBUG    | Standardizing input single
21:57:03 | DEBUG    | Writing standardized input to /tmp/ersilia-fahfvvjs/standard_input_file.csv
21:57:03 | DEBUG    | Reading standard file from /tmp/ersilia-fahfvvjs/standard_input_file.csv
21:57:03 | DEBUG    | File has 443 lines
21:57:03 | DEBUG    | No file splitting necessary!
21:57:03 | DEBUG    | Reading card from eos5505
21:57:03 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:05 | DEBUG    | Reading shape from eos5505
21:57:05 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:57:06 | DEBUG    | Input Shape: Single
21:57:06 | DEBUG    | Input type is: compound
21:57:06 | DEBUG    | Input shape is: Single
21:57:06 | DEBUG    | Importing module: .types.compound
21:57:06 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
21:57:06 | DEBUG    | InputShapeSingle shape: Single
21:57:06 | DEBUG    | API eos5505:predict initialized at URL http://0.0.0.0:41705
21:57:06 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:06 | DEBUG    | Posting to predict
21:57:06 | DEBUG    | Batch size 100
21:57:06 | DEBUG    | Stopping sniffer for finding delimiter
21:57:06 | DEBUG    | Expected number: 1
21:57:06 | DEBUG    | Entity is list: False
21:57:06 | DEBUG    | Resolving columns
21:57:06 | DEBUG    | Stopping sniffer for resolving column types
21:57:06 | DEBUG    | Done with sniffing the file
21:57:06 | DEBUG    | Input: {1: 100, 2: 100}
21:57:06 | DEBUG    | Key: {}
21:57:06 | DEBUG    | Input: [1]
21:57:06 | DEBUG    | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:57:06 | DEBUG    | Matching for input is [1]
21:57:06 | DEBUG    | Has header True
21:57:06 | DEBUG    | Schema {'input': [1], 'key': None}
21:57:06 | DEBUG    | Standardizing input single
21:57:06 | DEBUG    | Writing standardized input to /tmp/ersilia-wq5abh_l/standard_input_file.csv
21:57:06 | DEBUG    | Reading standard file from /tmp/ersilia-wq5abh_l/standard_input_file.csv
21:57:06 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:11 | DEBUG    | Status code: 200
21:57:11 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:57:17 | DEBUG    | Status code: 200
21:57:22 | DEBUG    | Status code: 200
21:57:27 | DEBUG    | Status code: 200
21:57:32 | DEBUG    | Status code: 200
21:57:32 | DEBUG    | Done with unique posting
21:57:34 | DEBUG    | Data: outcome
21:57:34 | DEBUG    | Values: [0.049]
21:57:34 | DEBUG    | Datatype: numeric_array
output_1.csv

The predictions were output_1.csv

kellenkinya commented 8 months ago

And for Rat Liver Microsomal Stability

Fetch the model using EOS model ID
```
ersilia -v fetch eos5505
```

Serve the model

ersilia -v serve  eos5505

And the output was


Serving model eos5505: ncats-rlm

URL: http://0.0.0.0:41705
PID: -1
SRV: pulled_docker

👉 To run model:

run

These APIs are also valid:
predict

💁 Information:

info
```
4. Run predictions and store the output in a file
```
ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_2.csv

(ersilia) kellen@DESKTOP-EQ55Q8H:~/ersilia$ ersilia -v api predict -i /mnt/c/Users/Admin/Downloads/eml_canonical.csv -o output_2.csv
21:44:11 | DEBUG    | Getting session from /home/kellen/eos/session.json
21:44:11 | DEBUG    | Getting session from /home/kellen/eos/session.json
21:44:11 | WARNING  | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
21:44:11 | ERROR    | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
21:44:14 | DEBUG    | Is fetched: True
21:44:14 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:14 | DEBUG    | Setting AutoService for eos5505
21:44:14 | INFO     | Service class provided
21:44:14 | DEBUG    | Using port 47395
21:44:14 | DEBUG    | Starting Docker Daemon service
21:44:14 | DEBUG    | Creating temporary folder /tmp/ersilia-fozp9pgv and mounting as volume in container
21:44:14 | DEBUG    | Image ersiliaos/eos5505:latest is available locally
21:44:14 | DEBUG    | Using port 50889
21:44:14 | DEBUG    | Starting Docker Daemon service
21:44:14 | DEBUG    | Creating temporary folder /tmp/ersilia-53hg2jig and mounting as volume in container
21:44:14 | DEBUG    | Reading card from eos5505
21:44:14 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:16 | DEBUG    | Reading shape from eos5505
21:44:16 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:17 | DEBUG    | Input Shape: Single
21:44:17 | DEBUG    | Input type is: compound
21:44:17 | DEBUG    | Input shape is: Single
21:44:17 | DEBUG    | Importing module: .types.compound
21:44:17 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
21:44:17 | DEBUG    | InputShapeSingle shape: Single
21:44:17 | DEBUG    | Stopping sniffer for finding delimiter
21:44:17 | DEBUG    | Expected number: 1
21:44:17 | DEBUG    | Entity is list: False
21:44:17 | DEBUG    | Resolving columns
21:44:17 | DEBUG    | Stopping sniffer for resolving column types
21:44:17 | DEBUG    | Done with sniffing the file
21:44:17 | DEBUG    | Input: {1: 100, 2: 100}
21:44:17 | DEBUG    | Key: {}
21:44:17 | DEBUG    | Input: [1]
21:44:17 | DEBUG    | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:44:17 | DEBUG    | Matching for input is [1]
21:44:17 | DEBUG    | Has header True
21:44:17 | DEBUG    | Schema {'input': [1], 'key': None}
21:44:17 | DEBUG    | Standardizing input single
21:44:17 | DEBUG    | Writing standardized input to /tmp/ersilia-7i7ahwyp/standard_input_file.csv
21:44:17 | DEBUG    | Reading standard file from /tmp/ersilia-7i7ahwyp/standard_input_file.csv
21:44:17 | DEBUG    | File has 443 lines
21:44:17 | DEBUG    | No file splitting necessary!
21:44:17 | DEBUG    | Reading card from eos5505
21:44:17 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:19 | DEBUG    | Reading shape from eos5505
21:44:19 | DEBUG    | Trying to get metadata from: /home/kellen/eos/dest/eos5505
21:44:20 | DEBUG    | Input Shape: Single
21:44:20 | DEBUG    | Input type is: compound
21:44:20 | DEBUG    | Input shape is: Single
21:44:20 | DEBUG    | Importing module: .types.compound
21:44:20 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
21:44:20 | DEBUG    | InputShapeSingle shape: Single
21:44:20 | DEBUG    | API eos5505:predict initialized at URL http://0.0.0.0:41705
21:44:20 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:20 | DEBUG    | Posting to predict
21:44:20 | DEBUG    | Batch size 100
21:44:20 | DEBUG    | Stopping sniffer for finding delimiter
21:44:20 | DEBUG    | Expected number: 1
21:44:20 | DEBUG    | Entity is list: False
21:44:20 | DEBUG    | Resolving columns
21:44:20 | DEBUG    | Stopping sniffer for resolving column types
21:44:20 | DEBUG    | Done with sniffing the file
21:44:20 | DEBUG    | Input: {1: 100, 2: 100}
21:44:20 | DEBUG    | Key: {}
21:44:20 | DEBUG    | Input: [1]
21:44:20 | DEBUG    | Candidate header is ['drugs', 'smiles', 'can_smiles']
21:44:20 | DEBUG    | Matching for input is [1]
21:44:20 | DEBUG    | Has header True
21:44:20 | DEBUG    | Schema {'input': [1], 'key': None}
21:44:20 | DEBUG    | Standardizing input single
21:44:20 | DEBUG    | Writing standardized input to /tmp/ersilia-6x2vc0s_/standard_input_file.csv
21:44:20 | DEBUG    | Reading standard file from /tmp/ersilia-6x2vc0s_/standard_input_file.csv
21:44:21 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:29 | DEBUG    | Status code: 200
21:44:29 | DEBUG    | Schema available in /home/kellen/eos/dest/eos5505/api_schema.json
21:44:33 | DEBUG    | Status code: 200
21:44:38 | DEBUG    | Status code: 200
21:44:43 | DEBUG    | Status code: 200
21:44:48 | DEBUG    | Status code: 200
21:44:48 | DEBUG    | Done with unique posting
21:44:51 | DEBUG    | Data: outcome
21:44:51 | DEBUG    | Values: [0.049]
21:44:51 | DEBUG    | Datatype: numeric_array
output_2.csv

The predictions were output_2.csv

kellenkinya commented 8 months ago

Interpretations

Output of Rat Liver Microsomal Stability Predictions of the original model(screenshot above) using the original code match with Ersilia predictions( csv file above). Though the decimal point differ , the class which each molecule fall into( stable or unstable) is the same. same goes for Human Liver Cytosolic Stability model predictions.

kellenkinya commented 8 months ago

Week 2 Rundown

I was able to

Select a model and install it my local computer
Run predictions of the Essential Medical List provided
Download models as docker containers
Compare results with Ersilia models

kellenkinya commented 8 months ago

Week 3

Model 1:

Name : A Knowledge-Graph-Based Multimodal Deep Learning Framework for Identifying Drug–Drug Interactions

Publication : https://www.mdpi.com/1420-3049/28/3/1490

Source Code : https://github.com/zhangjing9965/KGCN_NFM/tree/main

Motivation behind model suggestion

Drug-Drug Interactions(DDIs) is a situation or a circumstance in which a patient takes two or more drugs simultaneously and one of administered drug alter the Pharmalogical or Clinical Responses of another drug. Such drug reactions may result to adverse negative side effects which sometimes may be life threatening or alter the drug effectiveness.

Description :

KGCN_NFM is a deep learning framework that combines the knowledge of graph convolutional networks (KGCNs) and Neural factorization machines (NFMs) to predict DDIs.

Schematic workflow of KGCN_NFM

Why is it relevant to ersilia

Ersilia mission to aid in drug discovery will lead to new drugs being introduced into the market. Carrying out predictions on how this drugs will interact with other drugs, or how existing drugs interact will prevent adverse side affects of some Drug-Drug Interactions.

How would you implement it

The codes are readily available and also the dataset.

Dependencies

python==3.6.13
ampligraph==1.3.2
deepctr==0.8.4
Keras==2.2.4
numpy==1.16.4
pandas==1.1.5
scikit_learn==0.24.2
tensorflow==1.13.1
tqdm==4.63.0
rdkit==2020.09.1.0

kellenkinya commented 8 months ago

Model 2:

Name : DeepGS: Deep Representation Learning of Graphs and Sequences for Drug-Target Binding Affinity Prediction

Publication : https://paperswithcode.com/paper/deepgs-deep-representation-learning-of-graphs

Source Code : https://github.com/XuanLin1991/DeepGS

Motivation behind model suggestion

Binding affinity indicates the strength of drug-target interactions. Successful identification or prediction of drug-target interactions(DTI) enables researchers to understand and predict how drugs interact with their target protein or biological molecules. This will help in new drug discovery or repurposing existing ones by targeting a specifics proteins.

Description :

DeepGS uses deep neural network to extract local chemical context from amino acids and SMILES sequences, as well as the molecular structure from the drugs and then predict the drug-target Binding affinity.

Schematic workflow of DeepGS

Why is it relevant to ersilia

DeepGS framework will aid in predicting how drugs interact with various targets, this will help aid ersilia in new drug discovery of neglected diseases by tailoring treatments based on individual patient genes and biological molecules. .

How would you implement it

Create a new environment

conda create -n deepgs python=3.7.6
source activate deepgs

Install Pytorch, RDKit and pytorch-geometric.

Git clone their repository

git clone https://github.com/jacklin18/DeepGS.git

move in the cloned repository and install the requirements
```
cd DeepGS  
pip install -r requirements.txt
```

Provide training data with each row containing a molecule(SMILES strings) , a protein sequence(amino acids) and a label between the drug-target pair(binding affinity value) example

CC1=C2C=C(C=CC2=NN1)C3=CC(=CN=C3)OCC(CC4=CC=CC=C4)N MKKFFDSRREQGGSGLGSGSSGGGGSTSGLGSGYIGRVFGIGRQQVTVDEVLAEGGFAIVFLVRTSNGMKCALKRMFVNNEHDLQVCKREIQIMRDLSGHKNIVGYIDSSINNVSSGDVWEVLILMDFCRGGQVVNLMNQRLQTGFTENEVLQIFCDTCEAVARLHQCKTPIIHRDLKVENILLHDRGHYVLCDFGSATNKFQNPQTEGVNAVEDEIKKYTTLSYRAPEMVNLYSGKIITTKADIWALGCLLYKLCYFTLPFGESQVAICDGNFTIPDNSRYSQDMHCLIRYMLEPDPDKRPDIYQVSYFSFKLLKKECPIPNVQNSPIPAKLPEPVKASEAAAKKTQPKARLTDPIPTTETSIAPRQRPKAGQTQPNPGILPIQPALTPRKRATVQPPPQAAGSSNQPGLLASVPQPKPQAPPSQPLPQTQAKQPQAPPTPQQTPSTQAQGLPAQAQATPQHQQQLFLKQQQQQQQPPPAQQQPAGTFYQQQQAQTQQFQAVHPATQKPAIAQFPVVSQGGSQQQLMQNFYQQQQQQQQQQQQQQLATALHQQQLMTQQAALQQKPTMAAGQQPQPQPAAAPQPAPAQEPAIQAPVRQQPKVQTTPPPAVQGQKVGSLTPPSSPKTQRAGHRRILSDVTHSAVFGVPASKSTQLLQAAAAEASLNKSKSATTTPSGSPRTSQQNVYNPSEGSTWNPFDDDNFSKLTAEELLNKDFAKLGEGKHPEKLGGSAESLIPGFQSTQGDAFATTSFSAGTAEKRKGGQTVDSGLPLLSVSDPFIPLQVPDAPEKLIEGLKSPDTSLLLPDLLPMTDPFGSTSDAVIEKADVAVESLIPGLEPPVPQRLPSQTESVTSNRTDSLTGEDSLLDCSLLSNPTTDLLEEFAPTAISAPVHKAAEDSNLISGFDVPEGSDKVAEDEFDPIPVLITKNPQGGHSRNSSGSSESSLPNLARSLLLVDQLIDL 43.0

Usage
- Preprocess data as input
```
cd code
sh/bash preprocess.sh
```
  -train the model
```
sh/bash run_tranining.sh 
```
  Dependencies
- Python 3.4 <=
- Keras 2.x
- Tensorflow 1.x
- numpy
- matplotlib
- scikit-learn

kellenkinya commented 8 months ago

Model 3 :

Name : Lipophilicity Prediction with Graph Convolutions and Molecular Substructures Representation

Publication :https://www.researchgate.net/publication/346302569_Lipophilicity_Prediction_with_Multitask_Learning_and_Molecular_Substructures_Representation

Source Code : https://github.com/VEK239/StructGNN-lipophilicity

Motivation behind model suggestion

After doing predictions on PAMPA pH 7.4 when working with NCATS_ADME I came across(read) on permeability of drugs across cell membranes. And as I was reading I found a similar model which predicts logP and logD descriptors.

Description :

StructGNN predicts logP and logD descriptors by "encoding additional graph information by extracting molecular substructures through adding a set of generalized atomic features of these substructures to an established Direct Message Passing Neural Network (D-MPNN)."

Schematic workflow of StructGNN

Why is it relevant to ersilia

StructGNN is an complementary additional model to the models available in ersilia hub for logP and logD prediction in drugs permeability against cell membranes which expands toolbox of models available for Lipophilicity Prediction

How would you implement it

Installations

git clone https://github.com//VEK239/StructGNN-lipophilicity.git
git checkout SOTA
cd scripts/SOTA/dmpnn
conda env create -f environment.yml
conda activate chemprop
pip install -e .

Model is trained manually or using a dvc pipeline
Jupyter notebook with EDA, data preprocessing and prediction analysis is given also with python scripts for model training
Data files are available

Dependencies

conda create -n mol_ot python=3.6.8
sudo apt-get install libxrender1
conda install pytorch torchvision -c pytorch
conda install -c rdkit rdkit
conda install -c conda-forge pot
conda install -c anaconda scikit-learn
conda install -c conda-forge matplotlib
conda install -c conda-forge tqdm
conda install -c conda-forge tensorboardx

GemmaTuron commented 8 months ago

Hello,

Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!

ersilia-os / ersilia

✍️ Contribution period: <Kellen_Kinya> #867

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

Week 1 rundown

Week 2 tasks

Task 1: Select a model

NCATS Rat Liver Microsomal Stability

Why NCATS Rat Liver Microsomal Stability

Was running into a timeout error when trying to run the application on ADMW@NCATS model.

What I did

Was running into a timeout error when trying to run the application on ADMW@NCATS model.

Was running into a timeout error when trying to run the application on ADMW@NCATS model.

Running ADME@NCATS

Task 2: Installing the model

Steps I took

Task 3 Run Predictions

PAMPA pH 7.4 model prediction

Interpretations.

PAMPA pH 5 model prediction

Interpretations.

Rat Liver Microsomal Stability Predictions

Human Liver Cytosolic Stability Predictions

Interpretations

Task 4: Understanding Ersilia backend.

Task 5: Compare results with ersilia models.

Interpretations

Week 2 Rundown

Week 3

Model 1:

Name : A Knowledge-Graph-Based Multimodal Deep Learning Framework for Identifying Drug–Drug Interactions

Publication : https://www.mdpi.com/1420-3049/28/3/1490

Source Code : https://github.com/zhangjing9965/KGCN_NFM/tree/main

Motivation behind model suggestion

Description :

Schematic workflow of KGCN_NFM

Why is it relevant to ersilia

How would you implement it

Dependencies

Model 2:

Name : DeepGS: Deep Representation Learning of Graphs and Sequences for Drug-Target Binding Affinity Prediction

Publication : https://paperswithcode.com/paper/deepgs-deep-representation-learning-of-graphs

Source Code : https://github.com/XuanLin1991/DeepGS

Motivation behind model suggestion

Description :

Schematic workflow of DeepGS

Why is it relevant to ersilia

How would you implement it

Dependencies

Model 3 :

Name : Lipophilicity Prediction with Graph Convolutions and Molecular Substructures Representation

Publication :https://www.researchgate.net/publication/346302569_Lipophilicity_Prediction_with_Multitask_Learning_and_Molecular_Substructures_Representation

Source Code : https://github.com/VEK239/StructGNN-lipophilicity

Motivation behind model suggestion

Description :

Schematic workflow of StructGNN

Why is it relevant to ersilia

How would you implement it

Dependencies