ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Girisha Sahdev #622

Closed girishatechie closed 1 year ago

girishatechie commented 1 year ago

Week 1 - Get to know the community

Week 2 - Install and run an ML model

Week 3 - Propose new models

Week 4 - Prepare your final application

girishatechie commented 1 year ago

My motivation to work at Ersilia: To introduce myself, I am an engineering sophomore at IGDTUW, a reputed technical university in India. I am a web developer and an ML enthusiast, well versed in HTML, CSS, JavaScript, C++, Unity, MERN and AI/ML. While going through the organisations listed on the Outreachy website, I came across Ersilia and immediately knew that I wanted to work here! I read more about the Ersilia Open Source Initiative, and the fact that AI and ML are used for drug discovery against infectious and neglected diseases, which is the need of the hour, clicked with me instantly. Having been an ML enthusiast since my freshman year, I would cherish the opportunity to apply ML to a cause that creates an impact in the medical and scientific community. Reading the project description and researching the cause further left me looking forward to contributing to, and enhancing, the research capacity for such diseases at a global level. I have experience with Python, ML, data visualisation using Matplotlib, and training ML models with the best-performing algorithms. I am a Harvard WECode Scholar and the Vice-President of the AI Club at our university. In my freshman year, I did a research internship on 'Applications of ML in Cybersecurity' and published a project report on it. For that project, I mainly worked with NumPy, Matplotlib, scikit-learn, pandas and pickle, and the internship gave me a good head start on my ML journey. I then started researching how ML can be used for a good cause, and this seems to be the perfect opportunity to bring these ideas to fruition together with everyone in the Ersilia community.
Making ML accessible all over the world has always been on my wish list, and I would be grateful for an opportunity to help the medical field with this, which is certainly one of the best use cases of ML and AI. I also have a great interest in molecular biology and biomedical research. While going through Ersilia's GitBook, I came across the Event Fund Workshop section on ML for drug discovery, which caught my attention the most. I read about how chemistry datasets are used and how ML models are trained on them, which fascinated me and sparked my interest in the way things work at Ersilia! I completely believe in Ersilia's open source initiative, and collaborating with others in this community, using AI/ML for good, would enhance both my experience and my skills. Apart from this, my interaction with the Ersilia community on the public discussion channel was very welcoming and motivating. Working in such a healthy and warm environment will be a great learning experience and will bring out the best in all of us!

Info about my Week 1 Tasks:

For the Week 1 tasks, I've installed all the prerequisites, including GitHub CLI, Git LFS and the Isaura data lake. I've installed Ersilia in a Conda environment, fetched the simplest model, eos3b5e, using the Ersilia CLI, and calculated the molecular weight.

Fetching eos3b5e done in time: 0:01:48.019464s
04:29:54 | INFO | Fetching eos3b5e done successfully: 0:01:48.019464
👍 Model eos3b5e fetched successfully!

I've performed operations on it using the Ersilia CLI. I encountered a segmentation fault while fetching the model on my Mac M1, but I was able to resolve it by referring to GitHub issues #591 and #610. I fetched the model in verbose mode and saved the log files (to make errors easier to read). I was then able to retrieve information about the model (command: ersilia card eos3b5e), as well as serve and test it. Finally, I ran the calculate API on this model and obtained the molecular weight successfully.

I could list all the available models with the following command: ersilia catalog

How I fetched the model:

ersilia -v fetch eos3b5e > testingmodel.log 2>&1
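
The same capture can be scripted. The following is a minimal sketch, assuming only the Python standard library: `run_with_log` is a hypothetical helper (not part of Ersilia) that sends stdout and stderr to one file, like `> testingmodel.log 2>&1` does in the shell. A stand-in command is used so the sketch runs without Ersilia installed.

```python
import subprocess
import sys
from pathlib import Path


def run_with_log(command, log_path):
    """Run a command, writing stdout and stderr to one log file (like `> file 2>&1`)."""
    log_file = Path(log_path)
    with log_file.open("w") as handle:
        result = subprocess.run(command, stdout=handle, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere.
    # With Ersilia installed, the command from this comment would be:
    #   ["ersilia", "-v", "fetch", "eos3b5e"]
    demo = [sys.executable, "-c", "import sys; print('out'); print('err', file=sys.stderr)"]
    exit_code = run_with_log(demo, "testingmodel.log")
    print("exit code:", exit_code)
```

A non-zero return code is a quick signal that the log file is worth reading before moving on to serving the model.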

GemmaTuron commented 1 year ago

Hi @girishatechie

It's great to have you here. As you will see in issue #368, this model seems to present some issues at prediction time. Could you please test it using both the CLI and the Google Colab template (use the template provided in /notebooks), and report whether it works in either system, together with the log files? When fetching the model, please collect the log files and try to identify the source of any errors.

Thanks!

girishatechie commented 1 year ago

Hi @GemmaTuron Ma'am! Thank you so much! Sure, I am on it! 👍🏻

girishatechie commented 1 year ago

test.log

Update: I am not able to fetch the model eos9be7 through the CLI. It shows the following connection error, even after disabling the firewall, trying multiple times and switching to a different internet connection:

🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

If this error message is not helpful, open an issue at:

If you haven't, try to run your command in verbose mode (-v in the CLI)

I've also tried running this in verbose mode. I am working on macOS with an Apple M1 chip. I am navigating through the log file to find possible errors. My first guess is a network connection error, and I am trying to resolve it!

I have attached the log file above, in this comment. @GemmaTuron @DhanshreeA

Update: I went through the log file and identified the source of the error. I am getting the following warning message:

WARNING - Python 3.10.9 found in current environment is not officially supported by BentoML. The docker base image used is'bentoml/model-server:0.11.0' which will use conda to install Python 3.10.9 in the build process. Supported Python versions are: f3.6, 3.7, 3.8

My system has Python 3.10.9 installed, and it seems to be incompatible with this model, as mentioned in the log file:

UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your environment:

Specifications:

Your python: defaults/osx-64::python==3.10.9=h218abb5_2

This has even caused a segmentation fault, as mentioned in issue #610. This is why I am not able to fetch this particular model, eos9be7, which requires a lower version of Python, using the CLI. I'll try fetching it using Google Colab and keep working on this CLI issue as well!
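
A quick way to see the mismatch is to compare the interpreter version with the versions named in the warning. This is a minimal sketch: the supported set is copied from the BentoML warning quoted above, and `is_supported` is a hypothetical helper, not a BentoML or Ersilia function.

```python
import sys

# Versions listed as supported in the BentoML warning quoted above.
SUPPORTED = {(3, 6), (3, 7), (3, 8)}


def is_supported(version_info=None):
    """Return True if the (major, minor) version is in the supported set."""
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED


print(is_supported((3, 10, 9)))  # False: the interpreter reported in the log
print(is_supported((3, 7, 16)))  # True
```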

girishatechie commented 1 year ago

For the Google Colab implementation, I found the template in /notebooks, downloaded the eml_canonical.csv file to my PC and added it to MyDrive, from where it was imported into Colab. I installed Ersilia on Colab, connected it to my Google Drive and installed all the dependencies. I specified the paths to the CSV files and fetched the eos9be7 model successfully! Initially, I was facing a File Not Found error, which I resolved by giving the full path to the file after saving the CSV file from the repository to MyDrive. I extracted the SMILES to a list, and the output showed that my dataset contains 442 SMILES. Now I will run predictions, test the model further and report back!

Output of the Cell:

Fetching eos9be7 done in time: 0:15:39.025712s
👍 Model eos9be7 fetched successfully!
Time taken: 941.91 seconds

Finally, after a lot of attempts, I was able to fetch the model eos9be7 successfully on Colab!
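
The extraction step mentioned above (reading the SMILES column into a list) can be sketched with the standard library. The column names `smiles` and `can_smiles` are the ones mentioned later in this thread for the EML file; the two sample rows are illustrative stand-ins for the real file, and `read_smiles` is a hypothetical helper rather than code from the notebook template.

```python
import csv
import io

# Tiny stand-in for eml_canonical.csv; the real file has 442 rows.
SAMPLE = """smiles,can_smiles
CC(=O)OC1=CC=CC=C1C(=O)O,CC(=O)Oc1ccccc1C(=O)O
C1=CN=CC=C1C(=O)NN,NNC(=O)c1ccncc1
"""


def read_smiles(handle, column="can_smiles"):
    """Collect one column of a CSV file into a list, skipping empty cells."""
    return [row[column] for row in csv.DictReader(handle) if row.get(column)]


smiles = read_smiles(io.StringIO(SAMPLE))
print(f"My dataset contains {len(smiles)} SMILES")
```

For a file on disk, passing `open(path, newline="")` instead of the `io.StringIO` object should work the same way.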

girishatechie commented 1 year ago

I navigated through the Ersilia Model Hub website and have studied the model eos9be7. I found that the output is of type float and that the input has the shape of a pair of lists. The pair of lists is given as a single input, since the model computes the ChemNet distance between two sets of molecules, and a single output is returned for that input. Now I will run the predictions, report the results, and identify and resolve any errors that arise.

girishatechie commented 1 year ago

I have served the model correctly, as follows:

PID: 50962
SRV: conda

👉 Available APIs:

💁 Information:

girishatechie commented 1 year ago

Hi @GemmaTuron @DhanshreeA ! I fetched and served the model successfully using Google Colab. However, while running it using the calculate API, I am getting the following AssertionError:

AssertionError Traceback (most recent call last)
in
7 model = ErsiliaModel(model_name)
8 begin = time.time()
----> 9 output = model.api(input=smiles, output="pandas")
10 end = time.time()
11

11 frames
/usr/local/lib/python3.7/site-packages/ersilia/io/readers/pyinput.py in is_single_input(self)
43 if type(one_element) is tuple:
44 one_element = list(one_element)
---> 45 assert type(one_element) is list
46 one_inner_element = one_element[0]
47 if type(one_inner_element) is tuple:

AssertionError:

After various attempts, I have figured out the source of the error. As I mentioned in the comment above, this model's characteristics (as described in the model's repository) require the input to be of type Compound and the input shape to be a pair of lists of molecules, in order to measure the ChemNet distance. However, the eml_canonical.csv file that I downloaded from the notebook template repository provides only one input list, i.e. smiles.

Since this model takes two sets of molecules and returns a single output (one float number) for the two sets, the error should be resolved if we use a pair of lists of molecules in CSV format, i.e. a CSV file with two columns, as in this example from the GitBook:

smiles_1,smiles_2
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O,CC(=O)OC1=CC=CC=C1C(=O)O
C1=CN=CC=C1C(=O)NN,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O,CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
,COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
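
To make the "pair of lists" shape concrete, here is a minimal sketch that splits the two-column example above into two Python lists. `read_pair_of_lists` is a hypothetical helper; whether the Ersilia API expects exactly this in-memory shape is an assumption, not something confirmed in this thread.

```python
import csv
import io

PAIRS_CSV = """smiles_1,smiles_2
CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O,CC(=O)OC1=CC=CC=C1C(=O)O
C1=CN=CC=C1C(=O)NN,CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O,CC1(OC2C(OC(C2O1)(C#N)C3=CC=C4N3N=CN=C4N)CO)C
,COC1=CC23CCCN2CCC4=CC5=C(C=C4C3C1O)OCO5
"""


def read_pair_of_lists(handle):
    """Split a two-column CSV into two lists; empty cells are skipped,
    so the two lists may have different lengths."""
    first, second = [], []
    for row in csv.DictReader(handle):
        if row["smiles_1"]:
            first.append(row["smiles_1"])
        if row["smiles_2"]:
            second.append(row["smiles_2"])
    return first, second


list_1, list_2 = read_pair_of_lists(io.StringIO(PAIRS_CSV))
print(len(list_1), len(list_2))  # 3 4
```

The last row has an empty first cell, which is why the two sets end up with three and four molecules respectively.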

girishatechie commented 1 year ago

Update on Fetching and Testing the Model via CLI:

I was able to gain much more clarity and identify the sources of the error that I was facing! :))

I went through the Dockerfile in the repo of the model eos9be7 to check its dependencies, and found that it requires Python 3.7, which is incompatible with the current Python version on my system. As stated in the Dockerfile:

FROM bentoml/model-server:0.11.0-py37
MAINTAINER ersilia

RUN conda install -c conda-forge rdkit=2021.03.4
RUN pip install fcd

WORKDIR /repo
COPY . /repo

That's why I was getting errors while trying to fetch the model using the CLI on my M1 Mac, even though I was able to fetch, serve and test it on Colab. To resolve this, I will follow the pointers in issue #610, which I have gone through and understood. For models whose Dockerfile specifies a Python version below 3.10, I have to change the string py37 to py310, and Ersilia will then install the model in a Conda environment based on Python 3.10 instead of 3.7.

AdedejiAdewole commented 1 year ago

Hello, I was having the same issues and I realised that the models were developed using Python 3.7, which is incompatible with my Intel-chip Mac. Have you been able to overwrite the string? If you have, can you show me how, please?

girishatechie commented 1 year ago

Hey @AdedejiAdewole ! Yes, sure! :) When you install the model you're working on, a folder named after the model's identifier is created in a particular directory on your system, containing all the files required for the model. You need to open the model's Dockerfile (I found it by searching for my model's identifier on the Ersilia Model Hub website, which led me to the model's repo containing the Dockerfile), change py37 (in the FROM line of the Dockerfile shown in my comment above) to py310, save it, and then it should work!
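
As a rough illustration of that edit, the sketch below rewrites the tag on the FROM line of the Dockerfile text quoted earlier in this thread. `retag_base_image` is a hypothetical helper, not an Ersilia function, and whether Ersilia then builds the model on Python 3.10 depends on the behaviour described in issue #610, which this sketch simply assumes.

```python
import re

DOCKERFILE = """FROM bentoml/model-server:0.11.0-py37
MAINTAINER ersilia

RUN conda install -c conda-forge rdkit=2021.03.4
RUN pip install fcd

WORKDIR /repo
COPY . /repo
"""


def retag_base_image(text, old="py37", new="py310"):
    """Swap the Python tag suffix on FROM lines only, leaving other lines untouched."""
    pattern = re.compile(rf"^(FROM\s+\S+-){re.escape(old)}[ \t]*$", re.MULTILINE)
    return pattern.sub(rf"\g<1>{new}", text)


print(retag_base_image(DOCKERFILE).splitlines()[0])
# FROM bentoml/model-server:0.11.0-py310
```

Restricting the substitution to FROM lines avoids touching any other place where the string py37 might appear.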

AdedejiAdewole commented 1 year ago

Okay, I'll try that. Thank you.

AdedejiAdewole commented 1 year ago

Hello, is this the dockerfile you're referring to? Got this from the Eos folder.

[Screenshot 2023-03-12 at 18 59 18]

girishatechie commented 1 year ago

Yes this one 👍🏻 all the dependencies should be listed there 👍🏻

GemmaTuron commented 1 year ago

Hi @AdedejiAdewole, as far as I know, the Intel chips do not have Py3.7 incompatibilities. It is only the Apple Silicon chips (M1 and M2).

GemmaTuron commented 1 year ago

Hi @girishatechie

This is great, thanks for the explanations. Did you successfully run the model with the updated py3.10?

girishatechie commented 1 year ago

Hi @GemmaTuron ! Yes, I was able to run the model both using the CLI with the updated version and on Google Colab! I've listed my findings from the test in the comments above. Thank you so much! :)

girishatechie commented 1 year ago

Update on the Week 2's Tasks:

Hi @GemmaTuron :)

I have selected the STOUT (SMILES to IUPAC) model from the suggested list. I found all the models very interesting, but I selected this one for the following reasons:

  1. It uses a deep-learning neural machine translation approach, as stated in the publication. I have a strong interest in deep learning and some experience working with it. The model can generate the IUPAC name of a molecule from its SMILES string and vice versa, which is extremely useful, since the SMILES representation is designed to be interpreted by machines, while IUPAC nomenclature is more widely accepted and readable by humans.
  2. I was fascinated by both the application and the approach used in this model. It makes IUPAC name generation for molecules much easier and more convenient. Neural machine translation is one of the best approaches for this use case, and it interests me the most.
  3. I have a deep interest in IUPAC nomenclature, how it works and the chemistry behind it. STOUT automates most of the IUPAC name generation process and produces high-quality data, with models trained in less time than with conventional approaches. This is a clear advantage, and it is what fascinated me the most.

Working on this model will definitely be a great learning experience for me!

I am really excited to work on this model further! :)

girishatechie commented 1 year ago

I have successfully installed the model on my system!

Steps followed:

At first, I tried installing STOUT from PyPI, but the installation failed due to a few incompatibilities and dependency resolution errors. Hence, I installed it straight from the repository using:

pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git

This successfully installed STOUT on my system. I was also able to set up the Conda environment, but in the end I used the installation straight from the repository.

  1. The TensorFlow version on my macOS was a bit outdated and was causing issues, so I checked the model's requirements and installed TensorFlow 2.10.1 to be able to run the application.
  2. This model requires a Python version greater than 3.8, which wasn't an issue, as it was compatible with the Python version on my system.
  3. I had to install pytest manually, which was needed to run the application. Command used: pip3 install pytest (it's only compatible with updated versions of python3 and pip3)
  4. pystow, unicodedata2 and jpype1 were installed successfully. These are required to run the application correctly, as mentioned in the repo.

I will now run predictions on this model, for the Essential Medicines List (EML) :)

girishatechie commented 1 year ago

Update:

Hi @GemmaTuron !

I had successfully installed STOUT, but when I try to run predictions for the EML using the original source code (from the original repo), I constantly get the following error:

zsh: illegal hardware instruction python3

I have been attempting this for many hours. I searched for more information and found that it might be due to a TensorFlow package version incompatibility on my M1 Mac, which seems to be a very common problem on Macs. I tried to resolve it by re-installing TensorFlow in a new Python virtual environment, and I even tried to re-install it via MacPorts, but neither worked. Later, I tried re-installing TensorFlow in a Conda environment, but it conflicted with the version installed earlier on my system (2.10.1) and I wasn't able to resolve the error.

I tried running the Python script containing the original source code many times, but it repeatedly showed the same error, mainly at the step where I import from STOUT, which was already installed. I am guessing this is due to some version incompatibility of the package. I've been trying to resolve it for a long time, but the error persists.

Could you please help me resolve this? It would be a great help! Thank you!

GemmaTuron commented 1 year ago

Hi @girishatechie

Can you try to run it in bash instead of zsh? Can you provide more information about the versions that are incompatible with M1? Also, it would be great to know whether you are able to run this model from the Ersilia CLI, to see if you run into the same issues.

girishatechie commented 1 year ago

Hi @GemmaTuron ! Yes, definitely, I will try running it in bash now. Thank you!

About the incompatibilities: I went through the official TensorFlow repository and found that most of its versions don't support the M1 chip. Apple has suggested several alternatives for installing and using TensorFlow, such as the Metal plug-in, MacPorts or a Conda environment, all of which should be compatible with Python versions > 3.8. I tried them all, but I was still getting the same error. This TensorFlow issue is generally reported only on M1 Macs, and there haven't been any code changes or updates for it yet. The error essentially means that the installed binaries contain instructions that my machine isn't able to execute.

I will try running it from the Ersilia CLI and post updates! So far, I have been able to fetch and serve it (eos4se9) using the Ersilia CLI. I am now moving on to running the model!

GemmaTuron commented 1 year ago

Ah, Mac M1 chips are a bit of a nightmare since they don't support many versions that work in Linux. Let me know if the Ersilia implementation of the model works, thanks

AdedejiAdewole commented 1 year ago

Yes, even on Intel chips, using PyPI to install STOUT didn't work, but installing with "pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git" did. The Ersilia implementation still doesn't work, though, so I had to use Google Colab to run the Ersilia models, as you suggested. I hope the issue is fixed in the future.

girishatechie commented 1 year ago

Hi @GemmaTuron ! :)

Update:

I tried running it in bash instead of zsh, but I got a similar error: Illegal instruction: 4. To resolve it, I tried re-installing TensorFlow in a Conda environment, using:

conda install -c conda-forge tensorflow

It took a few hours to execute, checking for system incompatibilities and conflicts, but didn't give the desired result.

For the Ersilia implementation, I used the Ersilia CLI. Initially, the model appeared to be fetched and served successfully, but when I later navigated through the log file I found that it hadn't been fetched properly. Upon retrying, it raised the following error:

ERROR | Ersilia exception class: ModelPackageInstallError

Detailed error: Error occured while installing package by running "bash /var/folders/w8/p6bb0jx94jn68dfqp202spxw0000gn/T/ersilia-xtk3d0q8/script.sh > /var/folders/w8/p6bb0jx94jn68dfqp202spxw0000gn/T/ersilia-nb96cct0/command_outputs.log 2>&1" command

I tried again by first activating the Ersilia environment manually and then iterating over all the SMILES in the given .csv file (containing smiles and can_smiles), using a for loop in a bash shell script to pass multiple inputs. It took almost 4-5 hours to finish executing and, upon completion, showed the same error, indicating that the model hadn't been fetched or served successfully.

The Google Colab implementation (model: eos4se9), which took 9 hours to run the predictions, works perfectly fine for both translations, and I am able to fetch and run the model without any errors or exceptions.

I am looking for more solutions to the above issues. Thank you!

GemmaTuron commented 1 year ago

Thanks @girishatechie ! Can you provide the full log file of the above error when running Ersilia?

Meanwhile, let's focus on testing the NCATS Human Liver Cytosol Stability model if you have time left this week

Thanks

girishatechie commented 1 year ago

Sure, I am on it! @GemmaTuron

I've attached the log file here. Thank you!

newstouttesting.log

girishatechie commented 1 year ago

Update:

I read about the NCATS Human Liver Cytosol Stability model and found it really interesting, especially because it is a classification model based on a consensus of random forest models built with the scikit-learn library. Both of these areas genuinely interest me! :)

I have installed the model on my system. I referred to the development branch of the repository and downloaded the models manually, placing them in the directory myself, since the links in the main branch seemed to be outdated. I cloned the repository using:

git clone --recursive https://github.com/ncats/ncats-adme.git

Next, I changed the working directory to where I have ADME-RLM on my system:

cd ncats-adme

Then I switched to the server directory:

cd server

Further, I created the environment by following:

conda env create --prefix ./env -f environment_mac.yml

Next, I'll check for any additional packages to be installed and run the HLC Stability model for the EML.

girishatechie commented 1 year ago

Hi @GemmaTuron !

Update:

I tried to run the model (using the original source code) by running the app.py script from the command line. I got a connection error first, so I tried disabling the firewall and changing my internet connection. That error got resolved, but then another error appeared, indicating a problem with the TensorFlow backend integration.

I searched for more information about such errors, particularly on my ARM64 macOS system. I went through a few resources and have finally been able to identify the issue. The same issue was causing errors while running the previous model as well.

Going through a number of resources, I've made a note of the steps to follow to resolve this issue. I've installed Miniforge on my system and activated a new virtual environment using Conda. I referred to Apple's official repository, apple/tensorflow_macos, which provides a zipped tar file in its releases. I've downloaded it, and I still need to set up the variables in the ARM64 working directory and follow the next few steps to solve the TensorFlow issue. I've set the variables so far. Once I am done, I'll list the steps here in detail, along with the errors faced in between, and then test the NCATS model. Thank you! :)

GemmaTuron commented 1 year ago

Thanks @girishatechie , looking forward to seeing the explanations, this will be very helpful to improve our usability in MacOS systems!

girishatechie commented 1 year ago

I'm on it! @GemmaTuron Thank you! :))

girishatechie commented 1 year ago

Hi @GemmaTuron ! I am extremely sorry for the delay in updating my issue. This won't happen again! Thank you!

Update: I have resolved the issue that was causing errors while running the model. It was due to a TensorFlow backend issue, as stated in the previous comments, and it repeatedly gave the same error for most of the models, i.e. Illegal instruction: 4

I went through several articles and discussions on the internet, along with other GitHub issues describing the same error. I followed many of them, but I was still getting a similar error. It was mainly caused by TensorFlow not being installed and imported correctly on my M1 Mac. Finally, after a lot of attempts, I solved it by installing Apple's GPU-accelerated TensorFlow build, which has the added benefit that processing ML models on the GPU implies shorter training time. I followed these steps:

  1. I verified that the Xcode Command Line Tools were installed on my Mac.
  2. I downloaded the latest Miniconda3 macOS Apple M1 ARM 64-bit pkg and installed it with Python 3.9, using the default configuration.
  3. I created a new, clean virtual environment with: conda create --name tf python=3.9 and then activated it with: conda activate tf
  4. In the same tf environment, I installed the tensorflow-deps package: conda install -c apple tensorflow-deps
  5. I installed base TensorFlow: pip install tensorflow-macos
  6. I installed the TensorFlow Metal plug-in: pip install tensorflow-metal
  7. I upgraded my NumPy package using pip.
  8. I verified the TensorFlow installation by importing it: import tensorflow as tf

I then tried to import and use it in the same tf environment, activated using Conda. Initially, I got a Module Not Found error, but I resolved it by upgrading my Mac from Monterey to macOS Ventura 13.2.1, which took a few hours. After that, I no longer got the Illegal Instruction error and I was able to run the model! :)
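
Before importing TensorFlow, it can help to confirm which interpreter is actually running. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper. One commonly reported cause of "illegal instruction" crashes on Apple Silicon is an x86_64 Python running under Rosetta, but that is an assumption here, since this thread does not confirm it as the cause.

```python
import importlib.util
import platform
import sys


def describe_environment():
    """Report interpreter details relevant to TensorFlow on Apple Silicon."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "machine": platform.machine(),  # "arm64" for a native Apple Silicon interpreter
        "system": platform.system(),  # "Darwin" on macOS
        # find_spec only checks that the package can be located; it does not import it.
        "tensorflow_found": importlib.util.find_spec("tensorflow") is not None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Because `find_spec` does not execute the package, this check should not itself trigger the crash, which makes it a safe first diagnostic inside the tf environment.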

girishatechie commented 1 year ago

When running python app.py for the NCATS HLC Stability model, I no longer got the above error, because the TensorFlow backend issue was resolved. However, I was getting Module Not Found errors for a few modules, such as Flask, Keras, Flask-Cors, rdkit and ipython. So I created and activated a new Python 3 virtual environment and installed all the required dependencies manually with pip, referring to the environment_mac.yml file in the model's GitHub repo. I also downloaded the model manually, following the model's repository.

I followed the given steps to run the application, and I got this output:

HLC_Stability_Predictions.csv

girishatechie commented 1 year ago

What I inferred from the output of the Human Liver Cytosolic Stability model and from the model's documentation: this is a classification model that provides a predicted class for a given compound. A predicted class of 1 means that the compound is predicted to be unstable, and a predicted class of 0 means that it is predicted to be stable. In the above output file, all the compounds, corresponding to the can_smiles input, are predicted to be stable, with a probability score ~ 0, i.e. their in vitro half-life is predicted to be > 30 minutes.

GemmaTuron commented 1 year ago

Hi @girishatechie !

Good job, thanks! Can you try to run the model from the Ersilia Model Hub and check if the output correlates with the original NCATS model? The model is eos9yy1. Then we'll be ready to move on to week 3 contributions!

girishatechie commented 1 year ago

Sure, I am on it! :) @GemmaTuron Thank you!

girishatechie commented 1 year ago

Hi @GemmaTuron ! :)

Update:

I tried to run the eos9yy1 model from the Ersilia Model Hub. While fetching it with the CLI, I got a Model Package Install error. I navigated through the entire log file and, after some debugging, figured out the source of the error that prevents installation via the CLI. While running the model package install script, the following error is raised, which is what blocks installing, fetching and running this model from my CLI:

Looking in links: https://download.pytorch.org/whl/torch_stable.html
ERROR: Could not find a version that satisfies the requirement torch==1.6.0+cpu (from versions: 0.4.1, 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.1.0.post2, 1.2.0, 1.3.0, 1.3.0.post2, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1)
ERROR: No matching distribution found for torch==1.6.0+cpu

I searched for possible solutions and realised that this is a common error. On PyTorch's official website, I found the commands for installing previous versions of PyTorch manually, and I ran: pip install torch==1.6.0 torchvision==0.7.0

This was the command given for OSX. It installed torch==1.6.0 on my system, but the Ersilia model installation still gave the same error as above, indicating that the requested PyTorch version couldn't be matched to any version available for my Mac, mainly because it is an older version. I even tried downgrading to Python 3.7 in a new virtual environment, since some GitHub issues pointed out that this older version of PyTorch is not compatible with Python versions > 3.8. Even so, I got the same error when installing the model. pip is not able to install the torch==1.6.0+cpu version that the model requires when installed via the CLI.
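
The log above lists plain `1.6.0` as available but not `1.6.0+cpu`. A pin that carries a local version label such as `+cpu` is matched strictly, so only a build with that exact label satisfies it. The sketch below is a simplified stand-in for that rule, not pip's implementation; `split_local` and `exact_match` are hypothetical helpers, and the likely explanation (that no `+cpu` build is published for macOS on that index) is an assumption.

```python
# Versions pip reported as available in the log above (abridged).
AVAILABLE = ["1.5.0", "1.5.1", "1.6.0", "1.7.0", "1.7.1"]


def split_local(version):
    """Split 'public+local' into (public, local); local is None when absent."""
    public, _, local = version.partition("+")
    return public, local or None


def exact_match(requested, available):
    """Simplified strict rule: a pin with a local label matches only that exact string."""
    return requested in available


requested = "1.6.0+cpu"
public, local = split_local(requested)
print(exact_match(requested, AVAILABLE))  # False: no '1.6.0+cpu' build is listed
print(public in AVAILABLE)                # True: plain '1.6.0' is listed
```

This is consistent with the observation that installing torch==1.6.0 by hand succeeds while the model's install script, which pins torch==1.6.0+cpu, still fails.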

girishatechie commented 1 year ago

However, I am able to successfully execute the Ersilia Model Implementation of eos9yy1, on Colab. I tried to run it for a .csv file with 5 smiles as input, initially and later on, executed it for the whole eml_canonical.csv file as input, for the entire EML.

Comparing this output with the output of the original NCATS Model, I found that both outputs agree. For the original NCATS Model, the Predicted Class for all the inputs was approximately 0, indicating that they were stable. For the Ersilia Model Implementation, the Predicted Class was likewise approximately 0 for all the inputs. The Probability Scores of the two implementations differ slightly in their decimal places, but the final Predicted Class is approximately 0 for every input in both.

Hence, the output of the eos9yy1 Model from the Ersilia Model Hub agrees with that of the original NCATS Model: both give similar outputs.
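A comparison of this kind can be sketched as follows. The probability values are made-up placeholders, not the real outputs, and the 0.5 class threshold and the 0.05 tolerance are assumptions chosen only for illustration.

```python
def predicted_class(probability: float, threshold: float = 0.5) -> int:
    """Turn a probability score into a 0/1 class (threshold is an assumption)."""
    return 1 if probability >= threshold else 0


def compare_runs(reference, candidate, abs_tol=0.05):
    """Compare two lists of probability scores given in the same input order."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must score the same number of inputs")
    pairs = list(zip(reference, candidate))
    same_class = all(predicted_class(r) == predicted_class(c) for r, c in pairs)
    max_abs_diff = max((abs(r - c) for r, c in pairs), default=0.0)
    return {
        "same_class": same_class,
        "max_abs_diff": max_abs_diff,
        "within_tol": max_abs_diff <= abs_tol,
    }


# Hypothetical scores for five molecules (placeholders, not real results).
ncats_scores = [0.021, 0.104, 0.057, 0.009, 0.230]
ersilia_scores = [0.024, 0.101, 0.060, 0.011, 0.226]

result = compare_runs(ncats_scores, ersilia_scores)
print(result)
```

Checking the class agreement separately from the score difference keeps the two questions apart: whether the predictions agree, and how far the underlying probabilities drift.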

GemmaTuron commented 1 year ago

Hi @girishatechie

Thanks, as you might read in the issues of the repo for eos9yy1, someone else also faced the macOS compatibility problems with PyTorch... we'll have a deeper look. Let's move on to week 3 tasks!

girishatechie commented 1 year ago

Hi @GemmaTuron ! Thank you so much! Onto Week 3's Tasks :)

girishatechie commented 1 year ago

MODEL

AGMI: Attention-Guided Multi-omics Integration for Drug Response Prediction with Graph Neural Networks

SLUG

agmi-drp

PUBLICATION

https://arxiv.org/abs/2112.08366v2

SOURCE CODE

https://github.com/yivan-wyygdsg/agmi

DESCRIPTION

This paper presents a novel Attention-Guided Multi-omics Integration (AGMI) approach for Drug Response Prediction (DRP), which first constructs a Multiedge Graph (MeG) for each cell line, and then aggregates multi-omics features to predict drug response using a novel structure, called Graph edge-aware Network (GeNet).

REQUIREMENTS

Conda env, Python 3.8, Matplotlib, Jinja2, TensorBoardX, MarkupSafe, NumPy, Pandas, Protobuf, PyTorch=1.11.0, mmcv-full==1.3.10, CUDA 11.3

SUMMARY AND RELEVANCE TO ERSILIA'S MISSION

Precision medicine aims to tailor more effective diagnosis and anti-cancer therapy to each individual patient. Accurate Drug Response Prediction (DRP), that is, predicting how a patient will respond to a drug, is an important yet challenging task in precision medicine. The AGMI approach explores gene-constraint-based multi-omics integration for DRP across the whole genome using GNNs. AGMI outperforms state-of-the-art DRP methods by 8.3%–34.2% on four metrics. In addition, adding more omics data improves the model's predictive power, so an effective multi-omics integration is essential for more accurate drug response prediction. AGMI integrates multi-omics data by modeling a cell line as a graph with multiple types of edges (the MeG), and adapts MPNNs by introducing a node-level GRU and a graph-level GRU to capture the complex features of the MeG, in particular the features of its multiple edge types. In addition, GeNet adopts a Graph Isomorphism Network (GIN) to generate a drug feature vector and concatenates it with a cell line feature vector for the final prediction.

TASK

Regression

LICENSE

None

TAGS

IC50, Cancer

GemmaTuron commented 1 year ago

Hi @girishatechie !

Thanks for finding this publication. The issue is that, as you have seen, it is focused on cancer, which is currently out of scope for our mission. We'd love to have more -omics models in the infectious disease space, but the data is much sparser and there are very few large-scale assays such as the screenings done in cancer.

girishatechie commented 1 year ago

Noted! @GemmaTuron Thank you so much for the feedback! Searching for better and more relevant Models!

girishatechie commented 1 year ago

MODEL

DyScore: A Boosting Scoring Method with Dynamic Properties for Identifying True Binders and Non-binders in Structure-based Drug Discovery

SLUG

dyscore

PUBLICATION

https://pubs.acs.org/doi/10.1021/acs.jcim.2c00926

SOURCE CODE

https://github.com/YanjunLi-CS/dyscore

LICENSE

None

DESCRIPTION

DyScore is a Boosting Scoring Method with Dynamic Properties for Identifying True Binders and Non-binders in Structure-Based Drug Discovery.

SUMMARY

The accurate prediction of protein–ligand binding affinity is critical for the success of computer-aided drug discovery. However, the accuracy of current scoring functions is usually unsatisfactory because they approximate roughly, or sometimes even omit, many factors involved in protein–ligand binding. In this study, two novel features were proposed for characterizing the dynamic properties of protein–ligand binding based on the static structure of the complex, which are expected to be a valuable complement to the current scoring functions. The two features describe the geometry-shape matching between a protein and a ligand as well as the dynamic stability of protein–ligand binding. These two features were further combined with several classical scoring functions to develop a binary classification model called DyScore, which uses the Extreme Gradient Boosting algorithm to classify compound poses as binders or non-binders. The interaction-based DyScore model performs better on EF1% than all other methods compared. The similarity-based DyScore-MF model, trained with additional fingerprint information, shows even higher performance than DyScore.
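For readers unfamiliar with the EF1% metric mentioned above, here is a minimal sketch of the usual enrichment factor definition: the fraction of true binders among the top-scoring 1% of compounds, divided by the fraction of true binders in the whole set. The function name and the toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def enrichment_factor(scores, labels, percent=1):
    """Enrichment factor at the top `percent` % of a ranked compound list.

    scores: higher means more likely to be a binder.
    labels: 1 for a true binder, 0 for a non-binder.
    """
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one true binder")
    # Integer arithmetic avoids float rounding; keep at least one compound.
    n_top = max(1, n_total * percent // 100)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    return (actives_in_top / n_top) / (n_actives / n_total)


# Toy data: 200 compounds, 10 true binders, two of them ranked at the top.
toy_scores = [1 - i / 200 for i in range(200)]
toy_labels = [1] * 2 + [0] * 190 + [1] * 8

print(enrichment_factor(toy_scores, toy_labels, percent=1))
```

An enrichment factor of 1 corresponds to random ranking, so larger values mean the scoring method concentrates true binders at the top of the list.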

TAGS

Similarity, Fingerprint

TASK

Classification

REQUIREMENTS

Docker

WORKING

DyScore consists of several steps:

  1. Molecular Docking: [Optional] Dock a ligand to a target protein to identify the ligand binding conformation.
  2. Data Processing: Generate the static and dynamic features for the protein-ligand complex.
  3. Model Prediction: Predict the likelihood of whether a given compound is a true binder.

GemmaTuron commented 1 year ago

Hi @girishatechie !

I really like how the authors accurately describe the steps to run their model. Docking is typically complex, but this seems well implemented. I've referred it to @lynden95 to check for his project.

Unfortunately, we cannot add it to the Hub at this moment: due to its complexity (it requires Docker images, etc.), we are not ready for it! Thanks!

girishatechie commented 1 year ago

Hi @GemmaTuron ! Noted! Thank you so much! :)

girishatechie commented 1 year ago

MODEL

DeepDrug: A general graph-based deep learning framework for drug-drug interactions and drug-target interactions prediction

SLUG

deepdrug-ddi-dti

PUBLICATION

https://www.biorxiv.org/content/10.1101/2020.11.09.375626v2

SOURCE CODE

https://github.com/wanwenzeng/deepdrug

LICENSE

None

DESCRIPTION

DeepDrug is a deep learning framework that uses residual graph convolutional networks (RGCNs) and convolutional networks (CNNs) to learn comprehensive structural and sequential representations of drugs and proteins, in order to boost the prediction accuracy of drug-drug interactions (DDIs) and drug-target interactions (DTIs).

SUMMARY

DeepDrug is a deep learning framework that uses residual graph convolutional networks (RGCNs) and convolutional networks (CNNs) to learn comprehensive structural and sequential representations of drugs and proteins, in order to boost DDI and DTI prediction accuracy. Exploring the biomedical interactions between chemical compounds (drugs, molecules) and protein targets is of great significance for drug discovery. Drugs interact with biological systems by binding to protein targets and affecting their downstream activity, so predicting drug-target interactions (DTIs) is important for identifying therapeutic targets or the characteristics of drug targets. Drug-drug interactions (DDIs) can sometimes reveal potential synergies in drug combinations that improve the therapeutic efficacy of individual drugs, while negative DDIs are major causes of adverse drug reactions. Computational approaches for accurately predicting DDIs and DTIs are in high demand among biochemical researchers because of their efficiency and cost-effectiveness. DeepDrug outperforms state-of-the-art methods in both accuracy and robustness when predicting DDIs and DTIs under multiple experimental settings. The methods are evaluated in a series of systematic experiments, including binary-class DDI, multi-class/multi-label DDI, binary-class DTI classification and DTI regression tasks, using several datasets. Biochemical interactions are primarily determined by both the sequence and the structure of the participating entities. Therefore, the performance of the predictive model ultimately depends on accurately characterising the sequential and structural information.

REQUIREMENTS

PyTorch

TAGS

Similarity, SARS-CoV-2, DrugBank

TASKS

Classification/Regression

GemmaTuron commented 1 year ago

Hi @girishatechie !

Good model. It won't be ready to incorporate in the Hub yet, since it requires quite complex input files (protein structures), but can we add it to the list so that we can have it in the future? Can you share more about the tags selected? Is there a pretrained model for covid?

girishatechie commented 1 year ago

Hi @GemmaTuron ! Noted! Thank you! Do I need to add it to the list of Model Suggestions?

About the tags selected: DeepDrug was ultimately applied to perform drug repositioning on the whole DrugBank database to discover potential drug candidates against SARS-CoV-2, where 3 out of the 5 top-ranked drugs were reported to be repurposed to potentially treat COVID-19. These drug repositioning results for SARS-CoV-2 suggest that DeepDrug can be a useful tool for predicting DDIs and DTIs and can greatly facilitate the drug discovery process. This is one of the applications of DeepDrug, and it indicates its predictive power.

Drug Similarity was measured based on the calculated topological fingerprints of two drugs and, similarly, Protein Similarity was measured for each protein in the BindingDB dataset. Based on these results, a precise training dataset for predicting potential SARS-CoV-2 drugs was built. DeepDrug DTI models pretrained on the BindingDB dataset were then used to predict the binding affinity of the constructed drug-target pairs. The pretrained models from five-fold cross-validation are used to make the binding affinity predictions, and the final prediction for each pair is obtained by taking the mean and maximum values of the predictions of these 5 models. The performance of DeepDrug in each cross-validation fold is recorded for the final affinity scores. DeepDrug was able to correctly identify the interactions of SARS-CoV-2 proteins. Based on these results, DeepDrug may provide therapeutic opportunities against newly found proteins such as those of SARS-CoV-2.
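The aggregation over the five cross-validation models described above can be sketched roughly as follows. The function name, data layout and affinity values are hypothetical placeholders, not taken from the DeepDrug code.

```python
from statistics import mean


def aggregate_folds(fold_predictions):
    """Combine per-fold predictions into a mean and a max per drug-target pair.

    fold_predictions: one list per fold model, each holding one predicted
    affinity per drug-target pair, in the same pair order.
    """
    if not fold_predictions:
        raise ValueError("need predictions from at least one fold")
    n_pairs = len(fold_predictions[0])
    if any(len(fold) != n_pairs for fold in fold_predictions):
        raise ValueError("every fold must score the same pairs")
    # zip(*...) regroups the values so each tuple holds one pair's predictions.
    return [
        {"mean": mean(values), "max": max(values)}
        for values in zip(*fold_predictions)
    ]


# Hypothetical affinities from 5 fold models for 2 drug-target pairs.
folds = [
    [6.0, 4.0],
    [7.0, 5.0],
    [5.0, 4.5],
    [6.5, 4.0],
    [5.5, 5.5],
]

for pair_index, summary in enumerate(aggregate_folds(folds)):
    print(pair_index, summary)
```

Reporting both values gives a consensus estimate (the mean) alongside the most optimistic one (the max) for each pair.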

GemmaTuron commented 1 year ago

Hi @girishatechie !

Thanks, let's add it to the model list! I'm interested to understand if the pretrained covid model would be easy to implement. Could you give it a try and let me know what inputs the model requires and what it gives as output?