ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Sarima Chiorlu #636

Sarichii closed this issue 1 year ago

Sarichii commented 1 year ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

Sarichii commented 1 year ago

My motivation

I am a final-year systems engineering student at the University of Lagos in Nigeria. I have long been interested in biomedical engineering research, and in particular in AI research for the medical sciences. To pursue this goal, I have taken courses and taught myself machine learning and deep learning after school hours, including how to apply them in real-world scenarios. I have taken part in Kaggle and Zindi competitions to sharpen my skills and contribute to the AI community. I always knew I wanted a career in AI research but did not know which field to apply it in. Then friends' loved ones, and people I knew personally, lost their lives or essential body parts because of misdiagnosis, and I decided to venture into biomedical AI research. In developing countries, medicine receives little time and few resources, so progress in identifying diseases specific to these regions is considerably slow. Disease research tends to focus on the diseases of the country doing the research (for example, China researches diseases common in China), which leaves a large gap between disease identification and the state of healthcare in these places, leading to lost lives and grieving families. I have volunteered for community projects in slums in Africa, and I can say that the average child in some regions of Africa does not get to enjoy their childhood because of diseases that many people do not recognize as medical conditions and instead attribute to the schemes of relatives. When my application to Outreachy was accepted, I was excited and couldn't wait to contribute to the research projects available.
When I went through the projects and came across Ersilia, I found a place that had everything I had been looking for: a community committed to disease research. I knew I wanted to be part of your vision and make my own small impact on what you are doing here. I know I can gain hands-on experience applying my skills while committing to a project I am passionate about. I can't wait to do my part in developing, building, and testing models that improve disease identification in underprivileged countries and regions of the world. Looking forward to it!

Sarichii commented 1 year ago

I was able to install and run Ersilia successfully on my system, using Ubuntu 20.04:

(ersilia) sarima@Richio:~/ersilia$ ersilia --help
Usage: ersilia [OPTIONS] COMMAND [ARGS]...

  Ersilia CLI

Options:
  --version      Show the version and exit.
  -v, --verbose  Show logging on terminal when running commands.
  -s, --silent   Do not echo any progress message.
  --help         Show this message and exit.

Commands:
  api      Run API on a served model
  auth     Log in to ersilia to enter contributor mode.
  card     Get model info card
  catalog  List a catalog of models
  clear    Clear ersilia
  close    Close model
  delete   Delete model from local computer
  example  Generate input examples for the model of interest
  fetch    Fetch model from Ersilia Model Hub
  info     Get model information
  sample   Sample inputs and model identifiers
  serve    Serve model
  test     Test a model
GemmaTuron commented 1 year ago

Hi @Sarichii

Welcome to the contribution period!

Sarichii commented 1 year ago

Hi @GemmaTuron, thank you!

Sarichii commented 1 year ago

Week 2

Task 1 - Select a model from the suggested list

Brief description of the model

Model name: ncats-adme

ADME@NCATS is a resource developed by NCATS to host in silico prediction models for various ADME (Absorption, Distribution, Metabolism and Excretion) properties. The resource serves as an important tool for the drug discovery community, with potential uses in compound optimization and prioritization. The models were retrospectively validated on a subset of marketed drugs, with very good accuracy.

The input can be in the form of a CSV or text file containing SMILES. Alternatively, the input can be a molecule sketched using the molecule editor provided. For each compound, the predictions from the models are provided as output along with the confidence scores.

To learn more about the model, check it out here

Why did I choose this model? Poor ADME properties account for many failures of drug candidates in clinical trials. By understanding these properties, we are a step closer to improving healthcare and reducing the risks encountered when tackling diseases. I chose this model because it is closely related to what I would love to research and contribute to. Often, the difference between a life saved and a life lost can be a single extra carbon atom in a drug's structure.

Sarichii commented 1 year ago

Task 2: Install the model on your system

I followed the instructions in the official ncats-adme repository. System used: Ubuntu 20.04.

Step 1: I cloned the repository to my local machine with git clone --recursive https://github.com/ncats/ncats-adme.git, then moved into the ncats-adme directory with cd ncats-adme and into its server directory with cd server.

Step 2: I created the environment with conda env create --prefix ./env -f environment.yml. This ran for a long time and then failed with the error below:

(base) sarima@Richio:~/ncats-adme/server$ conda env create --prefix ./env -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: \ Killed

I spent some time trying to work out what was wrong. At first I suspected that conda could not locate the path to the environment file. (A bare "Killed" message usually means the operating system terminated the process, most often because it ran out of memory.) I decided to start again, this time using the development branch. I cloned that branch with git clone --branch development --recursive https://github.com/ncats/ncats-adme.git and again moved into the project folder and then into the server directory.

Step 3: I created the environment again, and this time the conda solve succeeded. However, installing the pip dependencies failed with the error below:

(base) sarima@Richio:~/dev1/ncats-adme/server$ conda env create --prefix ./env -f environment.yml
Retrieving notices: ...working... done
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: / Ran pip subprocess with arguments:
['/home/sarima/dev1/ncats-adme/server/env/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/sarima/dev1/ncats-adme/server/condaenv.k2zq9oc7.requirements.txt', '--exists-action=b']
Pip subprocess output:
Collecting keras-self-attention==0.41.0
  Downloading keras-self-attention-0.41.0.tar.gz (9.3 kB)
Collecting tensorflow==2.2.0
  Downloading tensorflow-2.2.0-cp38-cp38-manylinux2010_x86_64.whl (516.3 MB)

Pip subprocess error:
ERROR: Exception:
Traceback (most recent call last):
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 425, in _error_catcher
    yield
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 507, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 62, in read
    data = self.__fp.read(amt)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/http/client.py", line 459, in read
    n = self.readinto(b)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/http/client.py", line 503, in readinto
    n = self.fp.readinto(b)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 186, in _main
    status = self.run(options, args)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 331, in run
    resolver.resolve(requirement_set)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/legacy_resolve.py", line 177, in resolve
    discovered_reqs.extend(self._resolve_one(requirement_set, req))
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/legacy_resolve.py", line 333, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/legacy_resolve.py", line 282, in _get_abstract_dist_for
    abstract_dist = self.preparer.prepare_linked_requirement(req)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 480, in prepare_linked_requirement
    local_path = unpack_url(
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 282, in unpack_url
    return unpack_http_url(
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 158, in unpack_http_url
    from_path, content_type = _download_http_url(
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/operations/prepare.py", line 303, in _download_http_url
    for chunk in download.chunks:
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/utils/ui.py", line 160, in iter
    for x in it:
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_internal/network/utils.py", line 15, in response_chunks
    for chunk in response.raw.stream(
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 564, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 529, in read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/pip/_vendor/urllib3/response.py", line 430, in _error_catcher
    raise ReadTimeoutError(self._pool, None, "Read timed out.")
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

failed

CondaEnvException: Pip failed

I therefore decided to install the pip dependencies in the environment one at a time. I first activated the environment with conda activate ./env and then installed each package individually:

pip install keras-self-attention==0.41.0
pip install tensorflow==2.2.0
pip install typed-argument-parser==1.5.4
pip install gunicorn==20.0.4
pip install flask_swagger_ui==4.11.1
pip install py-healthcheck==1.10.1

Once all the packages were installed, I moved on to the next step.
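The one-package-at-a-time workaround can also be scripted. The sketch below is a hypothetical helper, not part of ncats-adme: it runs `pip install` for each pinned package with pip's `--timeout` and `--retries` options raised, since the failure above was a read timeout while downloading the 516 MB TensorFlow wheel. The package list is copied from the commands above; the timeout and retry values are assumptions, and the function names are illustrative.

```python
import subprocess
import sys

# Pinned packages copied from the manual pip commands above.
PIP_PACKAGES = [
    "keras-self-attention==0.41.0",
    "tensorflow==2.2.0",
    "typed-argument-parser==1.5.4",
    "gunicorn==20.0.4",
    "flask_swagger_ui==4.11.1",
    "py-healthcheck==1.10.1",
]


def pip_install_command(package, timeout=120, retries=10):
    """Build a pip command for one package using the current interpreter.

    The timeout (seconds) and retry count are assumed values chosen to be
    more forgiving than pip's defaults on a slow connection.
    """
    return [
        sys.executable, "-m", "pip", "install",
        f"--timeout={timeout}",
        f"--retries={retries}",
        package,
    ]


def install_one_at_a_time(packages, dry_run=False):
    """Install packages sequentially so a failure points at one package."""
    for package in packages:
        cmd = pip_install_command(package)
        if dry_run:
            # Only show what would be run.
            print(" ".join(cmd))
            continue
        # check=True stops at the first package that fails to install.
        subprocess.run(cmd, check=True)
```

From inside the activated `./env` (so that `sys.executable` is the environment's Python), `install_one_at_a_time(PIP_PACKAGES, dry_run=True)` prints the commands without running them; drop `dry_run` to install. Because the loop stops at the first failure, it is clear which package to retry.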

Step 4: I ran python app.py. However, I encountered an error as shown below:

(/home/sarima/dev1/ncats-adme/server/env) sarima@Richio:~/dev1/ncats-adme/server$ python app.py
Traceback (most recent call last):
  File "app.py", line 1, in <module>
    import flask
  File "/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/flask/__init__.py", line 14, in <module>
    from jinja2 import escape
ImportError: cannot import name 'escape' from 'jinja2' (/home/sarima/dev1/ncats-adme/server/env/lib/python3.8/site-packages/jinja2/__init__.py)

I had earlier seen a similar issue that had been solved by simply uninstalling and reinstalling Flask, so I tried that: I ran pip uninstall flask and then pip install flask, and it solved the problem. I think this worked because the reinstall brought in a newer version of Flask, but I am not sure :). (A likely explanation: Jinja2 3.1 removed escape, which older Flask releases import from jinja2, while newer Flask releases import it from markupsafe instead.) I reran python app.py and it worked! The output was:

Loading RLM graph convolutional neural network model
gcnn_model.pt: 100%|████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 543MB/s]
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading RLM model files
Loading PAMPA graph convolutional neural network model
Model File Does not Exist. Downloading!
gcnn_model.pt: 100%|███████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 1.05GB/s]
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 7.4 models
Loading PAMPA graph convolutional neural network model
gcnn_model.pt: 100%|████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 333MB/s]
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 5.0 models
Loading Solubility graph convolutional neural network model
Model File Does not Exist. Downloading!
gcnn_model.pt: 100%|████████████████████████████████████████████████████████████████| 1.36M/1.36M [00:00<00:00, 278MB/s]
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading Solubility models
Loading human liver cytosol stability random forest models
model_1: 100%|██████████████████████████████████████████████████████████████████████| 2.21M/2.21M [00:00<00:00, 717MB/s]
model_2: 100%|██████████████████████████████████████████████████████████████████████| 2.26M/2.26M [00:00<00:00, 866MB/s]
model_3: 100%|█████████████████████████████████████████████████████████████████████| 2.36M/2.36M [00:00<00:00, 1.18GB/s]
100%|█████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:00<00:00, 20.18s/it]
Finished loading human liver cytosol stability models
Loading CYP450 random forest models
cyp2c9_inhib-model_0: 100%|█████████████████████████████████████████████████████████| 8.73M/8.73M [00:00<00:00, 302MB/s]
cyp2c9_inhib-model_1: 100%|█████████████████████████████████████████████████████████| 8.39M/8.39M [00:00<00:00, 494MB/s]
cyp2c9_inhib-model_2: 100%|█████████████████████████████████████████████████████████| 8.46M/8.46M [00:00<00:00, 187MB/s]
cyp2c9_inhib-model_3: 100%|█████████████████████████████████████████████████████████| 8.40M/8.40M [00:00<00:00, 619MB/s]
cyp2c9_inhib-model_4: 100%|█████████████████████████████████████████████████████████| 8.75M/8.75M [00:00<00:00, 669MB/s]
cyp2c9_inhib-model_5: 100%|█████████████████████████████████████████████████████████| 8.49M/8.49M [00:00<00:00, 891MB/s]
cyp2c9_inhib-model_6: 100%|█████████████████████████████████████████████████████████| 8.52M/8.52M [00:00<00:00, 864MB/s]
cyp2c9_inhib-model_7: 100%|█████████████████████████████████████████████████████████| 8.62M/8.62M [00:00<00:00, 397MB/s]
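The Flask/Jinja2 clash behind the `ImportError: cannot import name 'escape' from 'jinja2'` error can be checked before starting the server. The sketch below is a hypothetical helper, not part of ncats-adme. It assumes that Flask releases before 2.0 run `from jinja2 import escape` and that Jinja2 3.1 no longer provides `escape`; the version cutoffs encode those assumptions rather than anything read from the project's files.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Return (major, minor) as integers from a string such as '3.1.2'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def flask_jinja2_conflict(flask_version, jinja2_version):
    """True if Flask < 2.0 is paired with Jinja2 >= 3.1.

    Assumption: Flask before 2.0 imports `escape` from jinja2, and
    Jinja2 3.1 removed it.
    """
    return major_minor(flask_version) < (2, 0) and major_minor(jinja2_version) >= (3, 1)


def check_installed():
    """Describe the Flask/Jinja2 pairing in the active environment."""
    try:
        flask_v = version("Flask")
        jinja2_v = version("Jinja2")
    except PackageNotFoundError as exc:
        return f"Not installed: {exc}"
    if flask_jinja2_conflict(flask_v, jinja2_v):
        return (f"Conflict: Flask {flask_v} with Jinja2 {jinja2_v}; "
                "upgrade Flask or pin jinja2<3.1")
    return f"OK: Flask {flask_v} with Jinja2 {jinja2_v}"


print(check_installed())
```

Run it with the ncats-adme environment activated, before python app.py. Reinstalling Flask, as done above, corresponds to the "upgrade Flask" branch; pinning `jinja2<3.1` is an alternative that keeps the older Flask.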
Sarichii commented 1 year ago

Task 3: Run predictions for the EML (Essential Medicines List)

GemmaTuron commented 1 year ago

Hi @Sarichii

We (Outreachy mentors) haven't seen much activity in the last few days. Just a kind reminder that there are ~10 days left to complete the Outreachy contribution period. If you are still interested in applying, we encourage you to continue the work and report it here. We are also available on the Slack channel for further discussions. If you have decided not to apply to the internship with Ersilia, please close this issue to help the mentors keep track of active contributors. Many thanks!

Sarichii commented 1 year ago
Finished loading CYP450 model files
 * Serving Flask app 'app'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.0.103:5000
Press CTRL+C to quit

Run predictions for the EML

I ran app.py and used the pre-trained PAMPA 5.0 graph convolutional neural network model to make predictions for the EML, which contains 442 drugs, each with its SMILES and canonical SMILES. The Parallel Artificial Membrane Permeability Assay (PAMPA) is an in vitro surrogate for measuring the permeability of drugs across cell membranes. In PAMPA 5.0, the number refers to the pH at which permeability was measured, i.e. pH 5.0. Drugs with log Peff (effective permeability) values lower than 2.0 were considered to have low-to-moderate permeability, values greater than 2.5 were considered to have high permeability, and values between 2.0 and 2.5 were omitted because they could not be assigned confidently to either class. Here are the predictions in CSV format: ADME_Predictions_2023-03-23-200520
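The log Peff thresholds described above amount to a simple three-way binning rule. The sketch below is illustrative only: it assumes you already have numeric log Peff values (the app's CSV may instead report a predicted class and a confidence score), and the function names and example values are hypothetical, not part of ncats-adme.

```python
def classify_pampa(log_peff, low_cutoff=2.0, high_cutoff=2.5):
    """Bin a log Peff value using the thresholds described above.

    Returns "low-moderate" below low_cutoff, "high" above high_cutoff,
    and None for the ambiguous band in between (inclusive of both cutoffs),
    which was left out of the labels.
    """
    if log_peff < low_cutoff:
        return "low-moderate"
    if log_peff > high_cutoff:
        return "high"
    return None


def summarise(predictions):
    """Count compounds per permeability class from a {name: log_peff} dict."""
    counts = {"low-moderate": 0, "high": 0, "excluded": 0}
    for value in predictions.values():
        label = classify_pampa(value)
        counts[label if label is not None else "excluded"] += 1
    return counts
```

For example, `summarise({"drug_a": 1.4, "drug_b": 2.9, "drug_c": 2.2})` returns `{"low-moderate": 1, "high": 1, "excluded": 1}`. Keeping the cutoffs as parameters makes it easy to see how the counts shift if a different threshold is used.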

Steps taken to run the predictions

Step 1: With the python app.py server running, I downloaded the Essential Medicines List (available here) from its directory in the ersilia-os repository to my local machine.
Step 2: I chose the PAMPA 5.0 model for the predictions and uploaded the EML file. When setting up the upload, I indicated that the file has a header row and set the SMILES column number to 1, because the drug-name column is at index 0.
Step 3: I ran the predictions; the results are in this CSV file.
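The header and column-index settings from Step 2 map directly onto how the file is parsed. A minimal sketch with Python's standard `csv` module, assuming the EML CSV has a header row with the drug name in column 0 and SMILES in column 1; the function names are hypothetical and not part of ncats-adme or Ersilia.

```python
import csv


def read_smiles(lines, smiles_col=1, has_header=True):
    """Return SMILES strings from one column of CSV data.

    `lines` is any iterable of CSV text lines, such as an open file object.
    smiles_col=1 matches the EML layout (drug name at index 0, SMILES at 1).
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header so it is not read as a SMILES
    # Skip blank or short rows instead of raising IndexError.
    return [row[smiles_col] for row in reader if len(row) > smiles_col]


def read_smiles_file(path, smiles_col=1, has_header=True):
    """Open a CSV file and read its SMILES column."""
    with open(path, newline="") as handle:
        return read_smiles(handle, smiles_col, has_header)
```

For example, `read_smiles_file("path/to/eml.csv")` (placeholder path) returns one SMILES string per drug. Forgetting to skip the header, or using index 0, would feed the column title or the drug names to the model instead of SMILES.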

Task 4: Compare results with the Ersilia Model Hub implementation!

My first approach was to find the model in the Ersilia Model Hub and run predictions one SMILES string at a time, but on reflection that was not efficient. I went through the issues and got ideas on how to work around this problem.

Step 1: I cloned the repository to my local machine and located the model's main Python script. Running it threw errors at first, mainly because of missing modules :), so I installed the required modules.
Step 2: I read through the repository to get a basic understanding of what was going on. The input and output file paths were not set, so I edited the script to point to the EML CSV file and to the prediction output file, respectively. I also changed smiles_list = [r[0] for r in reader] to smiles_list = [r[1] for r in reader], because the SMILES column is at index 1. I then reran main.py and got the output below:

Loading PAMPA graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 5.0 models
Loading PAMPA graph convolutional neural network model
Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
Finished loading PAMPA 5.0 models
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229850>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229A10>
<rdkit.Chem.rdchem.Mol object at 0x00000261E92297E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229930>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229A80>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229B60>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229AF0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9229BD0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91936F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91921F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91937D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9193920>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9193840>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9193990>
<rdkit.Chem.rdchem.Mol object at 0x00000261E9193680>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91938B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC040>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC120>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC190>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC200>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC270>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC2E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC350>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC3C0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC430>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC4A0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC510>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC580>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC5F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC660>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC6D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC740>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC7B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC820>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC890>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC900>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC970>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CC9E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCA50>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCAC0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCB30>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCBA0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCC10>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCC80>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCCF0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCD60>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCE40>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCEB0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CCF90>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD000>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD0E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD150>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD1C0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD2A0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD310>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD3F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD460>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD4D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD5B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD620>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD690>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD700>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD770>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD7E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD850>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD930>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CD9A0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDA10>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDA80>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDAF0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDB60>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDC40>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDCB0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDD90>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDE00>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDE70>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDEE0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDF50>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CDFC0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE030>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE0A0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE180>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE1F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE2D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE340>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE3B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE420>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE500>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE570>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE5E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE650>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE730>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE7A0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE810>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE880>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE8F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE960>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CE9D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEAB0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEB20>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEB90>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEC70>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CECE0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CED50>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEE30>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEEA0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEF10>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEF80>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CEFF0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF0D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF140>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF1B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF220>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF290>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF300>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF370>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF3E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF4C0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF530>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF610>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF6F0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF760>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF7D0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF840>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF8B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF920>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CF990>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFA00>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFA70>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFAE0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFB50>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFBC0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFCA0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFD10>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFD80>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFDF0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFE60>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFED0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91CFF40>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8040>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D80B0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8120>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8190>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8270>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D82E0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8350>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D83C0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8430>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D84A0>
<rdkit.Chem.rdchem.Mol object at 0x00000261E91D8510>
[22:19:58] WARNING: not removing hydrogen atom without neighbors
[22:19:58] WARNING: not removing hydrogen atom without neighbors
[22:19:58] WARNING: not removing hydrogen atom without neighbors
[22:19:58] WARNING: not removing hydrogen atom without neighbors
[22:19:58] WARNING: not removing hydrogen atom without neighbors
[22:19:59] WARNING: not removing hydrogen atom without neighbors
[22:19:59] WARNING: not removing hydrogen atom without neighbors
[22:19:59] WARNING: not removing hydrogen atom without neighbors
[22:19:59] WARNING: not removing hydrogen atom without neighbors
[22:19:59] WARNING: not removing hydrogen atom without neighbors
100%|██████████████████████████████████████████████████| 442/442 [00:04<00:00, 100.64it/s]
PAMPA 5.0: 4.697077512741089 seconds to predict 442 molecules
                                                smiles  ...                     Prediction
0        Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1  ...               low permeability
1    C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(...  ...  moderate or high permeability
2                         CC(=O)Nc1sc(nn1)[S](N)(=O)=O  ...               low permeability
3                                              CC(O)=O  ...               low permeability
4                              CC(=O)N[C@@H](CS)C(O)=O  ...               low permeability
..                                                 ...  ...                            ...
437             CC(=O)CC(c1ccccc1)C2=C(O)Oc3ccccc3C2=O  ...  moderate or high permeability
438                    Cc1cc(cc(C)c1CC2=NCCN2)C(C)(C)C  ...  moderate or high permeability
439  CC1=CN([C@H]2C[C@H](N=[N+]=[N-])[C@@H](CO)O2)C...  ...               low permeability
440                         [Zn++].[O-][S]([O-])(=O)=O  ...  moderate or high permeability
441             O.OC(Cn1ccnc1)([P](O)(O)=O)[P](O)(O)=O  ...               low permeability

[442 rows x 3 columns]

After running both models, I found that the ncats-adme model (using the PAMPA graph convolutional neural network) and the Ersilia model produced the same output: the Ersilia model reproduces the predictions of the NCATS PAMPA graph convolutional neural network, with 152 low-permeability compounds and 292 moderate- or high-permeability compounds.
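A comparison like the one above can be scripted rather than checked by eye. This is a minimal sketch, assuming both runs were saved as CSV files with a `Prediction` column (the column name shown in the output above); the function names are hypothetical and only the Python standard library is used.

```python
import csv
from collections import Counter
from typing import List, Tuple


def load_predictions(path: str, column: str = "Prediction") -> List[str]:
    """Read the predicted labels from one column of a CSV file (the column name is an assumption)."""
    with open(path, newline="") as handle:
        return [row[column] for row in csv.DictReader(handle)]


def compare_predictions(reference: List[str], candidate: List[str]) -> Tuple[int, Counter, Counter]:
    """Count row-wise disagreements between two runs and tally the labels each run produced."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must cover the same molecules in the same order")
    mismatches = sum(1 for ref, cand in zip(reference, candidate) if ref != cand)
    return mismatches, Counter(reference), Counter(candidate)
```

For example, `compare_predictions(load_predictions("ncats_output.csv"), load_predictions("ersilia_output.csv"))` (hypothetical file names) returns the number of disagreeing rows plus the label counts of each run. The label counts of a run should add up to the number of molecules predicted, which is a quick sanity check on the reported totals.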

Reason for choosing NCATS PAMPA5.0

Sarichii commented 1 year ago

Week 3: Propose new models

Model 1:

Model Name: GraphDTA: predicting drug–target binding affinity with graph neural networks

Model Description: This model combines graph neural networks and conventional convolutional neural networks to estimate how strongly a drug binds to a given protein. It takes two inputs: the drug and the protein. Each protein is given as a string of ASCII characters, and several 1D CNN layers are applied over the text to learn a sequence representation vector. Specifically, the protein sequence is first categorically encoded, and an embedding layer then represents each encoded character as a 128-dimensional vector. Each drug is represented as a molecular graph. The drug and protein representations are then passed through further deep learning layers to obtain the binding affinity of the drug–protein pair.
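The protein branch described above (categorical encoding followed by a 128-dimensional embedding) can be sketched in plain Python. This is an illustration under stated assumptions, not GraphDTA's code: the 25-letter alphabet and the 1,000-residue length cap are assumptions, and the random lookup table only stands in for a learned embedding layer.

```python
import random
from typing import List

# Assumed protein alphabet; index 0 is reserved for padding and unknown characters
PROTEIN_ALPHABET = "ABCDEFGHIKLMNOPQRSTUVWXYZ"


def encode_protein(sequence: str, max_len: int = 1000) -> List[int]:
    """Categorically encode residues as integers 1..25, truncating or zero-padding to max_len."""
    index = {ch: i + 1 for i, ch in enumerate(PROTEIN_ALPHABET)}
    codes = [index.get(ch, 0) for ch in sequence.upper()[:max_len]]
    return codes + [0] * (max_len - len(codes))


def make_embedding_table(vocab_size: int, dim: int = 128, seed: int = 0) -> List[List[float]]:
    """Random lookup table standing in for a learned embedding layer (one row per code)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(codes: List[int], table: List[List[float]]) -> List[List[float]]:
    """Map each integer code to its embedding vector."""
    return [table[code] for code in codes]
```

With `table = make_embedding_table(len(PROTEIN_ALPHABET) + 1)`, `embed(encode_protein("MKV"), table)` yields a 1000 x 128 matrix that the 1D CNN layers would then scan.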

Model Summary: The development of new drugs is costly, time-consuming and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. To repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug–target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task; however, they represent drugs as strings, which is not a natural way to represent molecules. GraphDTA instead represents drugs as graphs and uses graph neural networks to predict drug–target affinity. The authors show that graph neural networks not only predict drug–target affinity better than non-deep-learning models, but also outperform competing deep learning methods. The results confirm that deep learning models are appropriate for drug–target binding affinity prediction, and that representing drugs as graphs can lead to further improvements.
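To make "representing drugs as graphs" concrete, here is a minimal sketch of one message-passing step over a molecular graph (atoms as nodes, bonds as edges) followed by a pooled readout. It is a simplified neighbourhood-averaging step with no learned weight matrices, so it shows the data flow of a graph convolution rather than any specific GraphDTA layer; the atom features are placeholders.

```python
from typing import Dict, List, Tuple


def message_passing_step(features: List[List[float]], edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One simplified graph-convolution step: average each atom with its bonded neighbours, then apply ReLU."""
    neighbours: Dict[int, List[int]] = {i: [i] for i in range(len(features))}  # include a self-loop
    for u, v in edges:  # bonds are undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = []
    for i, group in neighbours.items():
        dim = len(features[i])
        mean = [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        updated.append([max(x, 0.0) for x in mean])
    return updated


def graph_readout(features: List[List[float]]) -> List[float]:
    """Global max pooling over atoms turns a variable-size graph into one fixed-size vector."""
    return [max(column) for column in zip(*features)]
```

Stacking several such steps lets information flow along bonds, and the pooled vector is what would be combined with the protein representation.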

Relevance to Ersilia: Drug–target affinity (DTA) prediction is an important step in virtual screening (using computers to quickly search large libraries of chemical structures to identify those most likely to bind to a drug target, typically a protein receptor or enzyme), which can quickly match targets and drugs and speed up drug development. DTA prediction provides information about the binding strength of drugs to target proteins, which indicates whether small molecules are likely to bind to a given protein and therefore which drug to pursue for that target. This is valuable in the search for new drugs, which is one of Ersilia's missions.
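The virtual-screening use case above amounts to scoring every compound in a library against one target and keeping the best candidates. A minimal sketch, assuming a hypothetical `predict_affinity(smiles, protein_sequence)` function (for example, a wrapper around a trained DTA model) where a higher score means stronger predicted binding:

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def screen_library(
    smiles_library: Iterable[str],
    protein_sequence: str,
    predict_affinity: Callable[[str, str], float],
    top_k: int = 10,
) -> List[Tuple[float, str]]:
    """Score every compound against one target and keep the top_k highest predicted affinities."""
    scored = ((predict_affinity(smiles, protein_sequence), smiles) for smiles in smiles_library)
    return heapq.nlargest(top_k, scored)  # sorted from strongest to weakest predicted binder
```

Using `heapq.nlargest` keeps only `top_k` items in memory, which matters when the library holds millions of compounds.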

Slug: Drug-target-affinity

Input: Compound
Input Shape: Molecular Graph
Output: Probability
Output Type: Float
Output Shape: Single
Interpretation: How strongly the drug and protein bind. Binding affinity is measured in terms of the dissociation constant, which reflects the ratio of unbound (dissociated) to bound ligand at equilibrium. The lower the dissociation constant, the stronger the binding affinity.
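The interpretation above can be written out explicitly. For a protein P, a ligand L and their complex PL at equilibrium, Kd = [P][L] / [PL]. DTA benchmarks often work with pKd = -log10(Kd) instead, so that larger numbers mean tighter binding. A small sketch, assuming affinities are reported in nanomolar:

```python
import math


def dissociation_constant(free_protein: float, free_ligand: float, bound_complex: float) -> float:
    """Kd = [P][L] / [PL], using equilibrium concentrations in mol/L."""
    return free_protein * free_ligand / bound_complex


def pkd_from_nanomolar(kd_nm: float) -> float:
    """pKd = -log10(Kd in M) = 9 - log10(Kd in nM); a higher pKd means tighter binding."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 9.0 - math.log10(kd_nm)
```

For example, a Kd of 10,000 nM corresponds to a pKd of 5, while a Kd of 1 nM corresponds to a pKd of 9.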

Tag: Drug discovery, convolutional neural networks

Language: Python 3.8.0

Package Dependencies: Python 3.10, PyTorch

References: Publication, Source Code

License: The model is freely available but was not licensed at the time of writing this proposal.

GemmaTuron commented 1 year ago

Hi @Sarichii

Let me give feedback point by point:

  1. Good job on the PAMPA model. For the EMH, as explained in the instructions, the input files must be modified so that only a .csv with a single column (for single input) is passed.
  2. For the model suggestions, please revise and edit your comments, since I think the last two comments mix different models.
    • AMPL: AMPL is a pipeline for training models, not a model itself, so it cannot be incorporated into the EMH as is.
    • DeepPurpose: good catch, this is actually already in our database to be incorporated, thanks! Please revise the other model suggestions and make sure the links are correct.
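Following point 1, the input file can be trimmed to a single column before it is passed to the model. This is a minimal sketch, assuming the original CSV has a header row with a column named `smiles`; the function name is hypothetical, and whether the model expects a header line should be checked against the instructions.

```python
import csv


def to_single_column_csv(src_path: str, dst_path: str, column: str = "smiles") -> int:
    """Copy one column (assumed to hold SMILES) into a new single-column CSV; return rows written."""
    count = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow([column])  # header row (an assumption about the expected format)
        for row in reader:
            value = (row.get(column) or "").strip()
            if value:  # skip rows with an empty SMILES field
                writer.writerow([value])
                count += 1
    return count
```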
Sarichii commented 1 year ago

@GemmaTuron Thanks for the feedback, I'm currently working on the comments for the models.

Sarichii commented 1 year ago

Model 2:

Model Name: MolTrans: Molecular Interaction Transformer for drug–target interaction prediction

Model Summary: Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is otherwise costly and time-consuming because of the need for experimental search over a large drug compound space. Deep learning has made promising progress in DTI prediction in recent years. However, the following challenges remain open: (1) existing molecular representation learning approaches ignore the sub-structural nature of DTI, and thus produce results that are less accurate and difficult to explain; (2) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. MolTrans addresses these limitations with (1) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction, and (2) an augmented transformer encoder that better extracts and captures the semantic relations among substructures extracted from massive unlabeled biomedical data.
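The idea of mining frequent sub-structures from sequences can be illustrated with a byte-pair-encoding-style merge, the technique behind the subword-nmt dependency listed for this model. This sketch is not the MolTrans algorithm: it starts from single characters (so it would split tokens such as "Cl") and simply merges the most frequent adjacent pair until no pair reaches `min_frequency`.

```python
from collections import Counter
from typing import List, Tuple


def merge_pair(tokens: List[str], first: str, second: str) -> List[str]:
    """Replace every adjacent (first, second) occurrence with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == first and tokens[i + 1] == second:
            merged.append(first + second)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def mine_substructures(
    sequences: List[str], num_merges: int, min_frequency: int = 2
) -> Tuple[List[str], List[List[str]]]:
    """Greedily merge the most frequent adjacent token pair, BPE-style, starting from characters."""
    corpus = [list(seq) for seq in sequences]
    vocabulary: List[str] = []
    for _ in range(num_merges):
        pair_counts = Counter((a, b) for tokens in corpus for a, b in zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (first, second), count = pair_counts.most_common(1)[0]
        if count < min_frequency:  # stop once no pair is frequent enough
            break
        vocabulary.append(first + second)
        corpus = [merge_pair(tokens, first, second) for tokens in corpus]
    return vocabulary, corpus
```

The learned vocabulary of frequent substrings then serves as the token set on which an encoder such as a transformer operates.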

Slug: Drug-target-Interaction

Input: Compound
Input Shape: Single
Task: Classification
Output: Probability
Output Type: Float
Output Shape: Single

Tag: Drug-target interaction

Language: Python 3.8.0

Package Dependencies: numpy, pandas, tqdm, scikit-learn, torch, subword-nmt

References: Publication, Source Code

License: This model is licensed under the BSD-3-Clause license.

Sarichii commented 1 year ago

Model 1:

Model Name: DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences

Model Description: This model applies a convolutional neural network to raw protein sequences to capture the local residue patterns of proteins that participate in DTIs. It performs convolutions over amino acid subsequences of various lengths to capture local residue patterns of generalized protein classes.
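Convolving over subsequences of different lengths can be sketched without a deep learning framework: each filter spans a fixed window of residues, its response is computed at every position, and global max pooling keeps the strongest match. In DeepConv-DTI the filters and residue embeddings are learned; here the one-hot encoding over 20 standard amino acids and the hand-set filters are assumptions made for illustration.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def one_hot(sequence: str) -> List[List[float]]:
    """One-hot encode a protein sequence; unknown residues become all-zero rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1.0
        rows.append(row)
    return rows


def conv1d_max_pool(encoded: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide one filter (window x channels) along the sequence, apply ReLU, and keep the maximum response."""
    window = len(kernel)
    best = 0.0
    for start in range(len(encoded) - window + 1):
        response = sum(
            encoded[start + k][c] * kernel[k][c]
            for k in range(window)
            for c in range(len(AMINO_ACIDS))
        )
        best = max(best, response)  # ReLU plus max-pooling, since best starts at 0
    return best


def local_residue_features(sequence: str, kernels: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Apply filters of different window lengths and return one pooled activation per filter."""
    encoded = one_hot(sequence)
    return {name: conv1d_max_pool(encoded, kernel) for name, kernel in kernels.items()}
```

A two-residue filter that weights glycine at both positions, for instance, responds most strongly to sequences containing "GG", which is the kind of local pattern the model is designed to pick up.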

Model Summary: Drugs work by interacting with target proteins to activate or inhibit a target's biological process, so identifying DTIs is a crucial step in drug discovery. However, identifying drug candidates through biological assays is very time- and cost-consuming, which creates a need for computational approaches to DTI prediction. This model extracts local residue patterns from target protein sequences using a CNN-based deep learning approach. The detected local features of protein sequences perform better than other protein descriptors for DTI prediction, and the model outperforms previous models on independent PubChem test datasets. In other words, the CNN captures local residue patterns and builds an enriched protein representation directly from the raw sequence.

Slug: Drug-target-interaction

Input: Compound
Input Shape: Single
Task: Classification
Output: Probability
Output Type: Float
Output Shape: Single
Interpretation: The output is a score between 0 and 1; values close to 1 indicate a predicted drug–target interaction (positive) and values close to 0 indicate no interaction (negative).

Tag: Drug-target interaction

Language: Python 3.8.0

Data Availability

Package Dependencies: tensorflow (>1.0, <2.0), keras (>2.0), numpy, pandas, scikit-learn

References: Publication, Source Code

If you wish to cite this model, please cite: Lee I, Keum J, Nam H (2019) DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol 15(6): e1007129. https://doi.org/10.1371/journal.pcbi.1007129

License: This model is licensed under the GPL-3.0 license.

Sarichii commented 1 year ago

Hi @GemmaTuron, I decided to look for a new set of models, and my comments have been updated. Waiting for your feedback, and I would love some extra tasks to work on before Monday.

GemmaTuron commented 1 year ago

Hi @Sarichii

Thanks for these models, they are already in our list!

Sarichii commented 1 year ago

Model 1:

Model Name: SAMPN: A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility

Model Description: For rational compound design in the chemical and pharmaceutical industries, it is highly desirable to predict molecular properties such as lipophilicity and solubility efficiently and accurately. To investigate the relationship between chemical properties and structures in an interpretable way, SAMPN is built on a graph neural network (GNN). SAMPN's main benefits are that it breaks away from the black-box nature of many machine/deep learning methods and works directly on chemical graphs. Its attention mechanism shows how much each atom in the molecule contributes to the target property, and the results are easy to visualize. Additionally, whereas other models predict a single chemical property, Multi-SAMPN, a variant of SAMPN, can predict multiple chemical properties concurrently with better accuracy and efficiency. Moreover, SAMPN produces chemically meaningful, interpretable results that can help researchers discover new pharmaceuticals and materials.
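The per-atom contributions mentioned above come from attention weights. This sketch uses a simplified form, attention pooling against a single query vector, rather than SAMPN's exact self-attention formulation: each atom gets a score, a softmax turns the scores into weights that sum to 1, and the molecule vector is the weighted sum of atom vectors. The atom features and the query are hypothetical inputs; the returned weights are what one would colour atoms by when visualising contributions.

```python
import math
from typing import List, Tuple


def attention_readout(atom_features: List[List[float]], query: List[float]) -> Tuple[List[float], List[float]]:
    """Score atoms against a query vector, softmax the scores, and return (weights, weighted sum)."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in atom_features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(atom_features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, atom_features)) for d in range(dim)]
    return weights, pooled
```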

Slug: Drug-discovery

Input: Compound
Input Shape: Single
Task: Regression
Output: Predicted lipophilicity and aqueous solubility values
Output Type: Float
Output Shape: Matrix
Interpretation: Continuous estimates of the molecule's lipophilicity and aqueous solubility; Multi-SAMPN predicts both properties at once.

Tag: Aqueous solubility, Lipophilicity

Language: Python 3.6.5

Package Dependencies: Python 3.6.5, PyTorch 1.0, RDKit 2018.03.4, Autograd 1.2, NumPy 1.14.2, Pandas 0.23.4, tqdm 3.7.1

References: Publication, Source Code

License: No license

Sarichii commented 1 year ago

Model 2:

Model Name: DeepFrag: a deep convolutional neural network for fragment-based lead optimization

Model Description: In recent years, computer-aided drug discovery has increasingly used machine learning, leading to significant advances in binding-affinity prediction, virtual screening, and QSAR. Surprisingly, machine learning is used less often for lead optimization, the process of finding molecular fragments that can be added to a known ligand to increase its binding affinity. DeepFrag is a deep convolutional neural network that, given the structure of a receptor/ligand complex, predicts appropriate fragments to add. On an independent benchmark of known ligands with missing (deleted) fragments, DeepFrag selected the known (correct) fragment from a set of over 6,500 about 58% of the time. Even when the known/correct fragment was not chosen, the top-ranked fragment was often chemically similar and could be a reasonable replacement.
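The "58% of the time" figure is a top-1 accuracy: for each ligand with a deleted fragment, the highest-ranked candidate either is or is not the known fragment. This sketch assumes the network outputs a fingerprint-like vector that is matched against a fragment library by cosine similarity; the fragment names and vectors in any usage are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_fragment(predicted: Sequence[float], library: Dict[str, Sequence[float]]) -> str:
    """Return the library fragment whose fingerprint is most similar to the prediction."""
    return max(library, key=lambda name: cosine_similarity(predicted, library[name]))


def top1_accuracy(
    predictions: List[Sequence[float]], true_fragments: List[str], library: Dict[str, Sequence[float]]
) -> float:
    """Fraction of cases where the top-ranked fragment is the known (deleted) one."""
    hits = sum(1 for pred, truth in zip(predictions, true_fragments) if best_fragment(pred, library) == truth)
    return hits / len(true_fragments)
```

Ranking the whole library, rather than keeping only the best match, is also what makes it possible to inspect near-misses that are chemically similar to the correct fragment.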

Slug: Chemistry

Input: Compound
Input Shape: Single
Task: Classification
Output: Probability
Output Type: Float
Output Shape: Single

Language: Python 3.8.0

Package Dependencies: Python 3.10

References: Publication, Source Code

License: Apache License, Version 2.0

Sarichii commented 1 year ago

Model 3:

Model Name: MolGAN: An implicit generative model for small molecular graphs

Model Description: Deep generative models for graph-structured data offer a fresh perspective on the problem of chemical synthesis. By optimizing differentiable models that generate molecular graphs directly, they avoid costly search procedures in the discrete and vast space of chemical structures. MolGAN is an implicit, likelihood-free generative model for small molecular graphs that avoids the expensive graph matching procedures or node ordering heuristics required by earlier likelihood-based approaches. It adapts generative adversarial networks (GANs) to operate directly on graph-structured data, and combines this with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, MolGAN generates nearly 100% valid compounds.
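Generating a graph "directly" means the generator outputs, for every atom slot, a probability over atom types and, for every pair of atoms, a probability over bond types. These must be turned into a discrete molecular graph. This sketch uses argmax discretization and assumes bond-type index 0 means "no bond"; it is one simple option, not necessarily the sampling scheme MolGAN itself uses.

```python
from typing import List, Tuple


def discretize_graph(
    node_probs: List[List[float]], edge_probs: List[List[List[float]]]
) -> Tuple[List[int], List[List[int]]]:
    """Pick the most likely atom type per node and bond type per atom pair, keeping bonds symmetric."""

    def argmax(values: List[float]) -> int:
        return max(range(len(values)), key=values.__getitem__)

    atoms = [argmax(probs) for probs in node_probs]
    n = len(atoms)
    bonds = [[0] * n for _ in range(n)]  # diagonal stays 0: no self-bonds
    for i in range(n):
        for j in range(i + 1, n):
            # average both directions so (i, j) and (j, i) get the same bond type
            averaged = [(a + b) / 2 for a, b in zip(edge_probs[i][j], edge_probs[j][i])]
            bonds[i][j] = bonds[j][i] = argmax(averaged)
    return atoms, bonds
```

The resulting atom-type list and bond matrix can then be converted into a molecule and checked for chemical validity, which is how a "percentage of valid compounds" is measured.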

Slug: Chemistry

Input: Compound
Input Shape: Single
Task: Classification
Output: Probability
Output Type: Float
Output Shape: Single

Language: Python 3.6.0

Data Availability

Package Dependencies: python>=3.6, tensorflow>=1.7.0 (https://tensorflow.org/), rdkit (https://www.rdkit.org/), numpy, scikit-learn

References: Publication, Source Code

License: MIT License

GemmaTuron commented 1 year ago

Hi @Sarichii !

Good job on the model search! Please delete the repeated comments so that it is easier to follow the conversation, and let's focus on the final application this week.

Sarichii commented 1 year ago

Hi @GemmaTuron, the comments have been deleted. I'm on it!

Sarichii commented 1 year ago

Thanks for the feedback!