Kadeniyi23 commented 11 months ago

Week 1 - Get to know the community

[X] Join the communication channels
[X] Open a GitHub issue (this one!)
[x] Install the Ersilia Model Hub and test the simplest model
[x] Write a motivation statement to work at Ersilia
[x] Submit your first contribution to the Outreachy site

Week 2 - Install and run an ML model

[x] Select a model from the suggested list
[x] Install the model in your system
[x] Run predictions for the EML
[x] Compare results with the Ersilia Model Hub implementation!
[x] Install and run Docker!

Week 3 - Propose new models

[x] Suggest a new model and document it (1)
[x] Suggest a new model and document it (2)
[x] Suggest a new model and document it (3)

Week 4 - Prepare your final application

[x] Submit the final application in the Outreachy website

carcablop commented 11 months ago

Hello @Kadeniyi23 Welcome to Ersilia. Be sure to complete the installation steps and run a model. The complete guide can be found here: https://ersilia.gitbook.io/ersilia-book/ersilia-model-hub/installation and report here your progress. Thanks.

Kadeniyi23 commented 11 months ago

Week 1

Whew! Just figured out we're meant to comment the milestone.

1st Task: Joining the communication channels

On the 3rd of October I introduced myself in the #general channel on Slack.

Kadeniyi23 commented 11 months ago

2nd Task: Creating an issue

I successfully created an issue on the 3rd of October 💪

Kadeniyi23 commented 11 months ago

3rd Task: Install the Ersilia Model Hub and test the simplest model

The instructions for the third task are listed here

For the first stage of installation I installed WSL, as I am using a Windows operating system and not Linux. This ran smoothly.
Next I installed the GCC compiler using the following command: sudo apt install build-essential. I used the Windows Subsystem for Linux terminal instead of the Ubuntu terminal, as my Ubuntu terminal was not working.

After multiple tries I used the WSL terminal instead.

I installed Miniconda on the WSL terminal and it was successful.
I successfully installed the GitHub CLI.
I successfully installed Git LFS from Conda.
Git LFS was installed and initialized.

To activate the environment: conda activate ersilia
To install the Ersilia package from GitHub: git clone https://github.com/ersilia-os/ersilia.git, then cd ersilia, then pip install -e .
Next I installed the Isaura data lake using the following command: python -m pip install isaura==0.1

Installing Ersilia

To set up the Ersilia environment I used the following command: conda create -n ersilia
Yippee, Ersilia installed! I viewed the CLI options with ersilia --help

To test a model with Ersilia

Fetching the model: eos3b5e (molecular weight). The output of the command ersilia -v fetch eos3b5e > princed.log 2>&1 is princed.log
Serving the model: to serve the model eos3b5e, I used the command ersilia serve eos3b5e > serving_molecular_weight.log 2>&1, which produced serving_molecular_weight.log
Running the model: to run the model, I used the command ersilia -v run -i "CCCC" > running_molecular_weight.log 2>&1

The output: running_molecular_weight.log

It yielded the error TypeError: object of type 'NoneType' has no len()

This also aligns with similar problems being faced here

Also, using the command ersilia -v api calculate -i "CCCC" yielded a KeyError as shown below

Following this suggestion here, I changed the base code in the file file.py from if len(h) == 1: to if h is not None and len(h) == 1:.
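For context, this is the error Python raises whenever len() is called on None, and the suggested change simply guards against that case. The sketch below is independent of Ersilia's actual code: the variable name h comes from the line quoted above, while has_single_header and unguarded_check are hypothetical names used only for illustration.

```python
def has_single_header(h):
    """Return True only when h is a sequence with exactly one element.

    Hypothetical helper that mirrors the guarded condition
    `if h is not None and len(h) == 1`.
    """
    return h is not None and len(h) == 1


def unguarded_check(h):
    """Mirror the original condition `if len(h) == 1`, which fails on None."""
    return len(h) == 1


if __name__ == "__main__":
    print(has_single_header(["smiles"]))  # a one-element list passes the check
    print(has_single_header(None))        # None is handled without an exception
    try:
        unguarded_check(None)
    except TypeError as err:
        # Same message as in the log: object of type 'NoneType' has no len()
        print(f"TypeError: {err}")
```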

After following the suggestion and running the command

ersilia -v run -i "CCCC" > running_molecular_weight_2.log 2>&1

I got the expected output logged

running_molecular_weight_2 (1).log

Following the suggestion of @DhanshreeA here I set up the Conda environment again using python 3.7 and reinstalled Ersilia.

Fetching the model eos3b5e with the following code ersilia -v fetch eos3b5e > fetching_molecular_weight.log 2>&1 yielded the following output fetching_molecular_weight.log
Serving the model with the following code ersilia serve eos3b5e > serving_molecular_weight2.log 2>&1 yielded the following output serving_molecular_weight2.log
Running the model with the following code ersilia -v run -i "CCCC" > running_molecular_weight3.log 2>&1 yielded the following output running_molecular_weight3.log

I was able to successfully get the expected output after reinstalling Ersilia and running the model eos3b5e
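The fetch, serve, and run steps above can also be scripted. This is a minimal sketch, assuming the ersilia CLI is on PATH; the commands are the ones quoted above, while build_commands, run_all, and the log file names are hypothetical.

```python
import shutil
import subprocess


def build_commands(model_id, smiles):
    """Return (command, log file name) pairs for fetch, serve, and run."""
    return [
        (["ersilia", "-v", "fetch", model_id], f"fetch_{model_id}.log"),
        (["ersilia", "serve", model_id], f"serve_{model_id}.log"),
        (["ersilia", "-v", "run", "-i", smiles], f"run_{model_id}.log"),
    ]


def run_all(model_id, smiles):
    """Run each command in order, sending stdout and stderr to its log file."""
    if shutil.which("ersilia") is None:
        raise RuntimeError("The ersilia CLI was not found on PATH.")
    for command, log_name in build_commands(model_id, smiles):
        with open(log_name, "w") as log:
            # stderr=subprocess.STDOUT plays the role of `2>&1` in the shell.
            subprocess.run(command, stdout=log, stderr=subprocess.STDOUT, check=True)


# Show what would be executed without actually calling the CLI.
for command, log_name in build_commands("eos3b5e", "CCCC"):
    print(" ".join(command), "->", log_name)
```

Calling run_all("eos3b5e", "CCCC") would then execute the three steps and stop at the first one that fails.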

DhanshreeA commented 11 months ago

Thank you for the updates @Kadeniyi23. A quick feedback: You do not need to modify the code within ersilia repository if you run into this error. The correct command to run an api is as you have mentioned above ersilia -v run -i <input>. It is because of this command that you got the correct output (and not because of updating the ersilia code).

Kadeniyi23 commented 11 months ago

Yes I believe so too. I tried it after modifying the code but it came out with an error, but after I reinstalled Ersilia with python 3.7 I was able to get the expected output. Thanks for your feedback

Kadeniyi23 commented 11 months ago

Fourth task:

Motivation Statement: Why I Joined Outreachy and wish to work at Ersilia 🚀

My name is Adeniyi Kabirat. As a data scientist and an aspiring AI/ML engineer, I worked with a few data models in the past, ranging from building a highly intricate recommendation system to building machine learning models in hackathons, but I joined Outreachy to be able to contribute to open source. Working with open source has not particularly been a dream of mine since I started my journey in data science in 2020, but along the way, I came to learn that a lot of open source programs and companies truly help change the world. And I really wanted to be a part of that. Hence, I applied for the Outreachy program. When picking the programs to contribute to after being picked as an applicant, Ersilia was the one program that stood out to me. A company that creates AI/ML models for biomedical research. Sign me up! Given my background—a bachelor's degree in microbiology—and my history of data science, I believe I would be a true asset to the Ersilia team. My current skills include proficiency in Python, R, and Conda. While I haven't had much experience with Docker, I have been involved with a few side projects that have utilized the platform.

Joining the Ersilia community provides me with an avenue to join a meaningful program that aims to bridge gaps that should not exist. A particular goal that aligns with mine is Ersilia's support of research on infectious and neglected diseases in low-income countries. Being from a low-income country myself, I have seen the effects of infectious disease in a community, and a company that makes that a goal is one I will be delighted to work with.

Being picked as an applicant and eventually as an intern provides me with an opportunity to contribute to a community and workspace that prioritizes growth and provides easy and open access to AI/ML models and research. My time spent as an intern would be one spent growing and learning, building and budding an experience with Python, Docker, and Conda, and contributing to a team that seeks to provide medical solutions worldwide. I would be fully immersed in an AI/ML project while collaborating with minds worldwide to seek a solution to a problem. Post-internship, I hope to be able to come out the other side with more well-rounded knowledge in AI and ML, adding to the Ersilia team as a whole and contributing more to open-source programs. Furthermore, I am eager to gain hands-on experience in implementing AI and ML algorithms and techniques and understand how they can be applied in the healthcare industry. This internship would also provide me with the opportunity to enhance my problem-solving skills and learn from experienced professionals in the field, ultimately preparing me for a successful career in AI/ML research and development.

Kadeniyi23 commented 11 months ago

Fifth Task: Submit your first contribution to the Outreachy site

I have submitted my initial contribution to the Outreachy website

Kadeniyi23 commented 10 months ago

Week 2

First Task: Selecting a model from the suggested list.

After going through the suggested models, I selected the STOUT (SMILES to IUPAC) model. I selected it after reading the publication here

I selected the model for two major reasons:

The methodology used was NMT (neural machine translation), an interesting approach that I have always found fascinating. NMT is a cutting-edge approach that has revolutionized language translation. Its ability to learn and adapt from vast amounts of data sets it apart from traditional translation methods. With its complex neural networks, NMT has the potential to greatly improve the accuracy and fluency of translations, making it an exciting field of research in the realm of artificial intelligence.
I always had problems deriving the IUPAC name for a chemical compound in secondary school, a problem that I figured a lot of students share. An ML algorithm that can do that is fascinating. STOUT makes it easy to translate SMILES to IUPAC names, rendering the process much simpler and more accessible for students and researchers alike. This advancement in machine learning not only saves time and effort but also supports accuracy and consistency in chemical nomenclature. 👍

Kadeniyi23 commented 10 months ago

Second Task: Installing the model in system.

Step 1: Following the instructions on the GitHub page, I downloaded Miniconda3 on my Linux system with the command wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Step 2: To install Miniconda, I ran the following command: bash Miniconda3-latest-Linux-x86_64.sh

Step 3: To activate Miniconda and check its version, I ran source ~/.bashrc followed by conda --version

Step 4: Installing STOUT

To create an environment called STOUT: conda create --name STOUT python=3.8
To activate the environment: conda activate STOUT
To install the stout-pypi package: conda install -c decimer stout-pypi

Installation Error ❌

When I ran conda install -c decimer stout-pypi, it produced an error. The error log is installation_error.log


Output in format: Requested package -> Available versions The following specifications were found to be incompatible with your system:

  - feature:/linux-64::__glibc==2.36=0
  - feature:|@/linux-64::__glibc==2.36=0

Your installed version is: 2.3'
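The message above concerns the glibc version that conda detected on the system. As a quick cross-check, the standard library can report the C library the current Python interpreter is linked against. This is only a sketch: glibc_version and version_tuple are hypothetical helper names, and the result may differ from what conda reports.

```python
import platform


def glibc_version():
    """Return the glibc version string for this interpreter, or None.

    platform.libc_ver() returns empty strings when it cannot determine
    the C library, for example on non-glibc systems.
    """
    lib, version = platform.libc_ver()
    return version if lib == "glibc" and version else None


def version_tuple(version):
    """Turn a dotted version such as '2.36' into a comparable tuple (2, 36)."""
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    print(glibc_version())
```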

Trying Another Method

Using the GitHub repository directly, I attempted to install the package with the following command

pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git

STOUT-pip was successfully installed

Step 5: Simple usage. After saving the example to a Python file and running it on the WSL command line, I encountered an error: OSError: [Errno 0] JVM DLL not found: Define/path/or/set/JAVA_HOME/variable/properly

Following this error, I installed a default version of Java using the code

sudo apt update
sudo apt install openjdk-11-jre

After that, I set the JAVA_HOME variable using the following command: export JAVA_HOME=/usr/bin

Then I ran the python file to get the desired output.
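JAVA_HOME conventionally points at the root of the Java installation, the directory whose bin/ subfolder holds the java executable. The sketch below shows one way to locate that directory from Python before setting the variable. It assumes java is on PATH; guess_java_home is a hypothetical helper and the resulting path differs between systems.

```python
import os
import shutil


def guess_java_home(java_path=None):
    """Guess the Java installation root from the location of the java binary.

    Returns None when no java executable can be found.
    """
    java_path = java_path or shutil.which("java")
    if java_path is None:
        return None
    # Follow symlinks such as /usr/bin/java to the real executable.
    real_path = os.path.realpath(java_path)
    # The executable lives in <java home>/bin/java, so go up two levels.
    return os.path.dirname(os.path.dirname(real_path))


if __name__ == "__main__":
    print(guess_java_home())
```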

DhanshreeA commented 10 months ago

Good job! @Kadeniyi23 I have a small question, why did you need to install conda again? Did you not have conda on your system from having installed ersilia before?

Kadeniyi23 commented 10 months ago

Thank you for the feedback, @DhanshreeA. I changed my approach, shifting from the Windows Subsystem for Linux (WSL) command-line interface (CLI) to Visual Studio Code. After integrating the Visual Studio Code interface with WSL, I reinstalled Miniconda to ensure it worked properly.

Kadeniyi23 commented 10 months ago

Third Task: Run predictions for the EML

To run predictions for the EML, I first attempted to run the following code through the STOUT model. It yielded a Java implementation error.

import csv
from STOUT import translate_forward

# Define a function to translate SMILES to IUPAC name
def smiles_to_iupac(smiles):
    try:
        iupac_name = translate_forward(smiles)
        return iupac_name
    except Exception as e:
        return str(e)  # Return an error message if translation fails

# Path to the input CSV file
input_csv_path = '/root/miniconda3/envs/eos4f95/bin/eml_canonical.csv'

# Path to the output CSV file
output_csv_path = '/root/miniconda3/envs/eos4f95/bin/translated_results.csv'

# Open the input CSV file for reading and the output CSV file for writing
with open(input_csv_path, 'r') as input_csvfile, open(output_csv_path, 'w', newline='') as output_csvfile:
    csvreader = csv.reader(input_csvfile)

    # Skip the header row if it exists
    header = next(csvreader, None)

    # Create a CSV writer for the output file
    csvwriter = csv.writer(output_csvfile)

    # Write the header to the output CSV file
    if header:
        csvwriter.writerow(header + ["IUPAC Name"])  # Add a new column header

    # Iterate through each row of the input CSV file
    for row in csvreader:
        # Assuming the SMILES strings are in the second column (index 1)
        smiles = row[1]

        # Translate the SMILES to IUPAC name
        iupac_name = smiles_to_iupac(smiles)

        # Write the row to the output CSV file, including the new IUPAC name
        csvwriter.writerow(row + [iupac_name])

print(f"Results have been written to {output_csv_path}.")

The error is detailed here hs_err_pid16667.log

I made various attempts to debug this, including searching for the error on Stack Overflow and asking for help in the Slack group.

leilayesufu commented 10 months ago

https://github.com/ersilia-os/ersilia/issues/823#issuecomment-1751671814

Try importing translate_reverse too

Kadeniyi23 commented 10 months ago

Thank you. The python file you shared also gave the same error. I did it in a couple of ways,

using the python file you shared here for prediction
converting all the relevant SMILES to a list to be translated
using the code I ran above

All shared the same error 😞

leilayesufu commented 10 months ago

https://github.com/ersilia-os/ersilia/issues/823#issuecomment-1751694319

But you could run predictions earlier with it, when testing? I think it might have been an issue with the jdk you installed

Kadeniyi23 commented 10 months ago

When I looked into it, I saw that the fix involved downloading an earlier version of Java (13.0), as JPype was only tested with versions 1 to 13.0. I was using version 17.1, and installing an earlier version was not recommended for production on the Java website. I figured this was because every update comes with a lot of bug fixes.
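A version mismatch like this can be detected before running the model. The sketch below parses the text printed by java -version and compares the major version against an upper bound. The bound of 13 is taken from the comment above and is not verified against JPype's documentation; parse_java_major and is_supported are hypothetical helper names.

```python
import re


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output.

    Returns None when no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy numbering: "1.8.0_292" means Java 8.
    if first == 1 and second is not None:
        return int(second)
    return first


def is_supported(version_output, max_major=13):
    """Check the major version against an upper bound (an assumption here)."""
    major = parse_java_major(version_output)
    return major is not None and major <= max_major


if __name__ == "__main__":
    print(is_supported('openjdk version "17.0.1" 2021-10-19'))
```

Note that java -version writes to stderr, so stderr has to be captured when the command is called through subprocess.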

Kadeniyi23 commented 10 months ago

Week 2

First Task: Selecting a model from the suggested list.

After many attempts to debug the STOUT (SMILES to IUPAC) model I picked, I have decided to switch to the NCATS Rat Liver Microsomal Stability model. According to the documentation, NCATS-ADME contains several models that would be useful to pharmacy and pharmacology as a whole. The different models have different capabilities. One example is the RLM Stability model, which helps predict the stability of a compound. This allows researchers to estimate the potential stability and lifespan of a compound in the body. Another example is the PAMPA pH 7.4 model, which gauges the permeability of drugs across cellular membranes. With this, researchers are able to predict the likelihood of a drug being easily absorbed in the body.

But the main reason I chose this model is that it comprises more than one AI/ML model, which gives me a front-seat look at the different machine learning models implemented. The PAMPA pH 7.4 model uses Chemprop, a model built by MIT.

Kadeniyi23 commented 10 months ago

Second Task: Installing the model in system.

I followed these steps to install the NCATS Rat Liver Microsomal Stability model on my system.

Opening the Ubuntu CLI on my Windows system, I cloned the repository using the recursive flag: git clone --recursive https://github.com/ncats/ncats-adme.git
I changed directory to where the ncats-adme server is, using the command cd /home/kabirat/ncats-adme
Next, I entered the following command to set up my environment: conda env create --prefix ./env -f environment.yml
After the environment was created, I ran the command python app.py
After several hours the models were successfully installed

Kadeniyi23 commented 10 months ago

Third Task : Run predictions for the EML

I downloaded the Essential Medicines List file from here.

I used the following code to extract the second column of the csv to take only the SMILES notation to be used for prediction with the NCATS-ADME model
```
import csv

# Input CSV file name
input_file = 'eml_canonical.csv'

# Output CSV file name
output_file = 'SMILES.csv'

# Function to extract the second column from the input CSV and save it to the output CSV
def extract_second_column(input_file, output_file):
    try:
        with open(input_file, 'r', newline='') as infile, open(output_file, 'w', newline='') as outfile:
            # Create CSV reader and writer objects
            reader = csv.reader(infile)
            writer = csv.writer(outfile)

            for row in reader:
                if len(row) >= 2:  # Check if the row has at least two columns
                    second_column = row[1]  # Index 1 is the second column (0-based index)
                    writer.writerow([second_column])

        print(f"Second column extracted from '{input_file}' and saved to '{output_file}'.")

    except FileNotFoundError:
        print(f"File '{input_file}' not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# Call the function to extract the second column
extract_second_column(input_file, output_file)
```

Kadeniyi23 commented 10 months ago

Task 4: Run predictions for the EML

After starting the app on my system, I opened it in Chrome here and ran the CSV with the SMILES notation through it. I got the following results:

RLM (Rat Liver Microsomal Stability): ADME_Predictions_2023-10-11-132525.csv
Pion’s patented µSOL assay (Solubility): ADME_Predictions_2023-10-11-132606.csv
Parallel artificial membrane permeability assay (PAMPA) (Assay pH=7.4): ADME_Predictions_2023-10-11-132710.csv
Parallel artificial membrane permeability assay (PAMPA) (Assay pH=5.0): ADME_Predictions_2023-10-11-132657.csv
Human Liver Cytosolic Stability: ADME_Predictions_2023-10-11-132911.csv

DhanshreeA commented 10 months ago

Hi @Kadeniyi23 It is unfortunate that JRE kept giving you issues while trying to run STOUT, and it is good that you could get NCATS to run on your system. As a bonus task, could you try and get the NCATS model to run not as a server but as a simple python script? Let me know if you need any clarifications.

Kadeniyi23 commented 10 months ago

Hi @DhanshreeA. Thank you for your feedback. I need some further clarification. Do you mean running the app.py Python script independently, in a separate environment created for running the model?

Kadeniyi23 commented 10 months ago

Task 4: Compare results with the Ersilia Model Hub implementation!

To compare the results obtained from the Parallel artificial membrane permeability assay (PAMPA) (Assay pH=7.4) in the csv file here with the model implemented in the Ersilia Model Hub:

I searched for Parallel artificial membrane permeability assay and picked the model Parallel Artificial Membrane Permeability Assay (PAMPA) 7.
I navigated to the GitHub repository here.
I proceeded to download the model eos9tyg and fetched it successfully using ersilia -v fetch eos9tyg 😄
Running the model eos9tyg, I was able to generate the output and save it into a csv file, SMILES_ersilia.csv

Comparing the two models.

The Parallel Artificial Membrane Permeability Assay is an in vitro surrogate for determining the permeability of drugs across cellular membranes. It measures how easily substances pass through a synthetic membrane that mimics the lining of the human gastrointestinal tract. The original NCATS-ADME model predicts whether a compound has low or high permeability. If the predicted class is '1', the compound is predicted to have 'low or moderate permeability' (i.e., log Peff < 2.0), and if the predicted class is '0', the compound is predicted to have 'high permeability' (i.e., log Peff > 2.5). In the interpretation of the eos9tyg model given here, the output is a float denoting the probability of the compound being poorly permeable. The higher the number, the more likely the compound is poorly permeable.

Taking the first ten values and comparing the two predictions:

Compound	Ersilia model eos9tyg prediction	Permeability	NCATS model	Permeability
Nc1nc(NC2CC2)c3ncn([C@@H]4CC@HC=C4)c3n1	1	poor permeability	1 (0.9)	low permeability
C[C@]12CCC@HCC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5	0.156	medium to high permeability	0 (1.0)	moderate or high permeability
CC(=O)Nc1sc(nn1)S(=O)	1	poor permeability	1 (0.97)	low permeability	1 (1.0)	low permeability
CC(O)=O	1	poor permeability	1 (0.99)	low permeability
CC(=O)NC@@HC(O)=O	1	poor permeability	0 (0.96)	moderate or high permeability
CC(=O)Oc1ccccc1C(O)=O	1	poor permeability	1 (0.99)	low permeability
NC1=NC(=O)c2ncn(COCCO)c2N1	1	poor permeability	1 (0.99)	low permeability
OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1	0.034	medium to high permeability	0 (0.99)	moderate or high permeability
CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1	0.248	medium to high permeability	0 (0.99)	moderate or high permeability
CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1	0.268	medium to high permeability	0 (0.99)	moderate or high permeability

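Nine of the ten compounds listed above received matching predictions from the two models. The exception is the fifth compound (CC(=O)NC@@HC(O)=O), where the Ersilia model predicts poor permeability and the NCATS model predicts moderate or high permeability.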
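The comparison can be checked with a few lines of Python. This is a sketch using the values copied from the ten rows listed above. Converting the Ersilia probability to a class with a 0.5 threshold is an assumption made here for illustration, and to_class and agreement are hypothetical helper names.

```python
# Values copied from the ten rows listed above, in order.
ersilia_probabilities = [1, 0.156, 1, 1, 1, 1, 1, 0.034, 0.248, 0.268]
ncats_classes = [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]


def to_class(probability, threshold=0.5):
    """Map a probability of poor permeability to class 1 (poor) or 0.

    The 0.5 threshold is an assumption made for this comparison.
    """
    return 1 if probability >= threshold else 0


def agreement(probabilities, classes, threshold=0.5):
    """Return the fraction of matching rows and the indices that disagree."""
    predicted = [to_class(p, threshold) for p in probabilities]
    mismatches = [i for i, (a, b) in enumerate(zip(predicted, classes)) if a != b]
    return (len(classes) - len(mismatches)) / len(classes), mismatches


fraction, mismatches = agreement(ersilia_probabilities, ncats_classes)
print(f"Agreement: {fraction:.0%}, mismatching row indices: {mismatches}")
```

Row indices are zero-based, so index 4 refers to the fifth compound in the list.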

Kadeniyi23 commented 10 months ago

Task 5: Install and run Docker!

I was able to install and run Docker successfully. I was also able to run the model eos3b5e from Docker Desktop 👍

Kadeniyi23 commented 10 months ago

Week 3: Suggest a new model and document it (1)

CalcAMP Model

Model Name

CalcAMP

Description

In this model, the authors seek to predict the activity of antimicrobial peptides. Antimicrobial peptides (AMPs) can be quite effective in fighting the worldwide multi-drug resistance pandemic. Finding effective and potent AMPs is an arduous process, so a machine learning model that can accurately predict whether a peptide possesses antimicrobial properties would be useful and would save time. The model predicts the antimicrobial activity of peptides by analyzing various features, including general physicochemical properties and sequence composition.
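To make "sequence composition" and "physicochemical properties" concrete, here is a small sketch of two such features computed from a peptide sequence. These are generic illustrations and not CalcAMP's actual feature set; composition and approximate_net_charge are hypothetical helper names, and the charge estimate is deliberately coarse.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    length = len(sequence)
    return {aa: counts[aa] / length for aa in AMINO_ACIDS}


def approximate_net_charge(sequence):
    """Rough net charge: +1 per K or R, -1 per D or E.

    Ignores histidine, the termini, and pH, so it is only a coarse estimate.
    """
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


if __name__ == "__main__":
    peptide = "KKLLKKLL"  # made-up example sequence
    print(composition(peptide)["K"], approximate_net_charge(peptide))
```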

Publication details

Title: CalcAMP: A New Machine Learning Model for the Accurate Prediction of Antimicrobial Activity of Peptides
Authors: Bournez C, Riool M, de Boer L, Cordfunke RA, de Best L, van Leeuwen R, Drijfhout JW, Zaat SAJ, van Westen GJP
URL: CalcAMP: A New Machine Learning Model for the Accurate Prediction of Antimicrobial Activity of Peptides

Model Overview

The dataset of peptides was collated from publicly available data in five different databases. Different ML algorithms were compared using the package PyCaret 2.3.6 to develop a classification model that separates AMPs from non-AMPs. Additionally, a Multi-layer Perceptron model created with Scikit-Learn 0.23.2 was included in the comparison. The final models were built for the retained algorithms: LightGBM, XGBoost, CatBoost, Random Forest (RF), and Extra Trees (ET) classifiers.

Relevance to Ersilia

With Ersilia's goal of democratising access to AI/ML models relating to biomedical research, the CalcAMP model, which predicts the antimicrobial activity of peptides, is an added boon when it comes to research on multi-drug resistance. It enables us to assess the qualities of different AMPs, as well as detect which ones would be active against a plethora of Gram-positive and Gram-negative bacteria.

Model Implementation

The link to the model: CalcAMP. Although the model has not been published and released yet, an example is given in Simple prediction.ipynb, where a sample prediction is shown. The different models are also saved in the models folder. The dataset used is linked here.

Kadeniyi23 commented 10 months ago

Week 3: Suggest a new model and document it (2)

AquaPred Model

Model Name

AquaPred

Description

This model seeks to accurately predict the molecular solubility of compounds using an attention-based graph neural network. It plays a significant role in predicting the aqueous solubility of compounds in drug discovery. During drug discovery, active pharmaceutical ingredients (APIs) are key to high drug efficacy. With this model, the authors aim to predict the aqueous solubility of compounds, which is a key physicochemical attribute required for API characterization.

Publication Details

Title: Attention-Based Graph Neural Network for Molecular Solubility Prediction
Authors: Waqar Ahmad, Hilal Tayara, and Kil To Chong
URL: Attention-Based Graph Neural Network for Molecular Solubility Prediction

Model Overview

The model uses the dataset contained here, as compiled by a separate study referenced here. The data was fitted to four different graph neural networks, namely SGConv, GIN, GAT, and AttentiveFP, to identify the most effective model for predicting solubility. The study shows that AttentiveFP was the best model. It uses SMILES as the input for molecular representation and captures both intermolecular and intramolecular properties through information propagation and gated recurrent units (GRU).

Relevance to Ersilia

In-silico prediction of water solubility could lead to higher drug efficacy while speeding up the drug development timeline. One of Ersilia's goals is to support research in many low- and middle-income countries. With machine learning models like this, we get to bypass weeks or maybe months of research, tapping into the power of artificial intelligence to accelerate the drug discovery process.

Model Implementation

The code for the model can be found here. No recent releases have been published, but the code looks ready to go. The AttentiveFP model here, in the models folder, could be tested further to scan for bugs.

Kadeniyi23 commented 10 months ago

Week 3: Suggest a new model and document it (3)

PrankWeb 3

Model Name

P2Rank

Description

This model seeks to predict the ligand binding sites (LBS) of proteins. Identifying these sites and the interactions that ensue is needed for elucidating the molecular mechanisms of enzymes, regulating protein oligomerization, or designing new drugs in cases where drug resistance has occurred, and it can be a time-consuming process when performed experimentally. With this model, a protein's ligand binding sites are predicted from its 3-dimensional structure. The model comprises not only the CLI app (P2Rank) but also a web app, PrankWeb 3. PrankWeb accepts a protein structure as its input, computes evolutionary conservation, and predicts binding sites, which are then mapped onto the structure and can be viewed.

Publication Details

Title: PrankWeb 3: accelerated ligand-binding site predictions for experimental and modelled protein structures
Authors: David Jakubec, Petr Skoda, Radoslav Krivak, Marian Novotny, David Hoksza
URL: https://academic.oup.com/nar/article/50/W1/W593/6591527

Model Overview

The model has two implementations: the CLI app, P2Rank, and the web app, PrankWeb 3. P2Rank uses not only machine learning but also a combination of geometric, energetic, and evolution-based knowledge, a combination also seen in the experimental methods used for ligand-binding site prediction of proteins. It applies different characteristics (the protein's structure, physico-chemical properties, and evolutionary information) to a mesh and then constructs a machine learning model using this representation. The ML model is then used to identify points on the protein's surface that can potentially bind to ligands, and the identified points are grouped into a list of surface patches that correspond to the predicted Ligand Binding Sites (LBSs).
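The final grouping step can be pictured with a small sketch. How P2Rank actually groups its points is not described here, so the code below only shows one simple way to do it: keep points whose score passes a threshold and join points that lie within a distance cutoff of each other (single-linkage grouping with a union-find structure). The function name, both threshold values, and the sample data are hypothetical.

```python
import math


def group_points(points, scores, score_threshold=0.5, distance_cutoff=3.0):
    """Group high-scoring points into patches by single-linkage grouping.

    points: list of (x, y, z) tuples; scores: list of floats in the same order.
    Both thresholds are illustrative values, not P2Rank's settings.
    Returns a sorted list of patches, each a sorted list of point indices.
    """
    kept = [i for i, s in enumerate(scores) if s >= score_threshold]
    parent = {i: i for i in kept}

    def find(i):
        # Walk up to the representative of i, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of kept points that lie within the cutoff.
    for pos, a in enumerate(kept):
        for b in kept[pos + 1:]:
            if math.dist(points[a], points[b]) <= distance_cutoff:
                parent[find(a)] = find(b)

    patches = {}
    for i in kept:
        patches.setdefault(find(i), []).append(i)
    return sorted(sorted(patch) for patch in patches.values())


# Made-up coordinates and scores for illustration.
sample_points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (50, 0, 0)]
sample_scores = [0.9, 0.8, 0.7, 0.9, 0.6, 0.1]
print(group_points(sample_points, sample_scores))
```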

Relevance to Ersilia

One of the core reasons for implementing this model is the design of new drugs when drug resistance suddenly emerges. In low- and middle-income countries, where drug-resistant strains may arise, the rapid design and production of drugs may save millions of lives.

Model Implementation

Following the installation steps, the requirements for P2Rank are Java and PyMOL, which is used to view the visualizations. It is recommended to run it in bash, as the model is a command-line program. The model looks implementable, with the link to the code found here. No installation is required, as the package is downloaded from the GitHub releases. The latest version (version 2.4.1) can be downloaded as a compressed file. With various commands, the input is entered as a pdb file and predicted values are generated as follows:

a file with "_predictions.csv" in the name has a list of predicted pockets, their scores, and their coordinates. It also includes a list of nearby residues and protein surface atoms.
a file "_residues.csv" with a list of all the residues of the protein, their scores, and which pocket they belong to.
a visualization folder with a .pml file, which can be viewed with the PyMOL software
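The pocket list can then be read with Python's csv module. This is a minimal sketch: the column names name, rank, and score and the sample rows are assumptions made for illustration, so they should be checked against a real "_predictions.csv" file. read_pockets is a hypothetical helper.

```python
import csv
import io


def read_pockets(csv_file):
    """Read pocket rows from a predictions CSV into a list of dicts.

    Header names and values are stripped of surrounding whitespace.
    """
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader)]
    return [
        dict(zip(header, (value.strip() for value in row)))
        for row in reader
        if row  # skip empty lines
    ]


# Made-up sample in the assumed layout; a real file has more columns.
sample = io.StringIO(
    "name, rank, score\n"
    "pocket1, 1, 12.5\n"
    "pocket2, 2, 3.1\n"
)
pockets = read_pockets(sample)
best = max(pockets, key=lambda pocket: float(pocket["score"]))
print(best["name"])
```

For a file on disk, the same function can be called as read_pockets(open(path, newline="")).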

The web app is available here. The system can be run in three modes:

P2Rank using Docker.
Prankweb using Docker
P2Rank without Docker with limited functionality

Kadeniyi23 commented 10 months ago

Side Task: Running the NCATS model as a single python script

Hi @Kadeniyi23 It is unfortunate that JRE kept giving you issues while trying to run STOUT, and it is good that you could get NCATS to run on your system. As a bonus task, could you try and get the NCATS model to run not as a server but as a simple python script? Let me know if you need any clarifications.

DhanshreeA commented 10 months ago

Hi @Kadeniyi23 many thanks for the updates and sincerest apologies for responding late. Please look at this comment for further clarification. https://github.com/ersilia-os/ersilia/issues/849#issuecomment-1768229150 Also it is a bonus task, please don't feel pressured.

GemmaTuron commented 10 months ago

Hello,

Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!

ersilia-os / ersilia

Contribution Period: Kabirat Adeniyi #823
