Ajoke23 commented 9 months ago

Week 1 - Get to know the community

[X] Join the communication channels
[X] Open a GitHub issue (this one!)
[x] Install the Ersilia Model Hub and test the simplest model
[x] Write a motivation statement to work at Ersilia
[x] Submit your first contribution to the Outreachy site

Week 2 - Install and run an ML model

[x] Select a model from the suggested list
[x] Install the model in your system
[x] Run predictions for the EML
[x] Compare results with the Ersilia Model Hub implementation!
[x] Install and run Docker!

Week 3 - Propose new models

[x] Suggest a new model and document it (1)
[x] Suggest a new model and document it (2)
[x] Suggest a new model and document it (3)

Week 4 - Prepare your final application

[x] Submit the final application in the Outreachy website

Ajoke23 commented 9 months ago

WEEK 1 DAY 1 (3rd October 2023)

I joined the Slack communication channel on 3rd October 2023 to express my interest in contributing to the success of Ersilia's project.
I read the code of conduct.
I followed Ersilia's page on GitHub, and starred and forked the repository because I like the work Ersilia is doing.
I also checked various issues to see what the community has been working on.
I then visited Ersilia's website to get a full picture of their goals, which align with my interest as an SDG 3 advocate. This ignited my zeal to contribute while learning, re-learning, and unlearning.
I introduced myself in the #general channel on Slack which can be found here
I understand that being new to an open-source project can be overwhelming, so I welcomed the new members to make them comfortable and to show that Ersilia's community is a collaborative one. I also tagged the message posted by @GemmaTuron that contains all the important guidelines on how to make the most of the experience during the Outreachy contribution period.

DAY 2 (4th October, 2023)

I started by reading Ersilia's documentation on the installation process of the Ersilia Model Hub
While reading the documentation, I noticed a bug and immediately opened an issue, which can be found here
I have been getting amazing feedback from new members about how welcome they feel in such an amazing community. This is a link that shows the comment from a new member after the welcome message I sent.
I started by installing the Ubuntu terminal on Windows 10 Pro using the command below in PowerShell
```
wsl --install
```
To install Ersilia on Ubuntu, certain requirements have to be met, which involve installing the following:
1. I proceeded to install the gcc compiler using:
```
sudo apt install build-essential
```
2. Afterwards, I installed Miniconda in Ubuntu.
3. Furthermore, I went ahead to install GitHub CLI using:
```
conda install gh -c conda-forge
```
  and I used GitHub CLI to log in using the command below:
```
gh auth login
```
4. I installed Git LFS (Large File Storage) because some pre-trained models contain numerous parameters and require large storage. To install Git LFS using conda, use:
```
conda install git-lfs -c conda-forge
```
5. After installing Git LFS, I went ahead to activate it using:
```
git-lfs install
```
6. I installed the Isaura data lake for caching model predictions using the commands below:
```
conda activate ersilia
python -m pip install isaura==0.1
```
7. To be sure I had successfully installed Ersilia on Ubuntu, I ran the commands `ersilia --help` and `ersilia catalog`, which gave the outputs output2.txt & output3.txt. These outputs show that Ersilia was successfully installed on Ubuntu.
When I got to fetching model eos3b5e with `ersilia -v fetch eos3b5e`, I encountered an error which says: connection aborted, TimeoutError(110, 'Connection timed out'). I tried debugging the error by searching online, on Stack Overflow, and in previous issues raised in the Ersilia repository, but none solved my problem.
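The prerequisites from the numbered steps above can be checked in one go before attempting a fetch. The sketch below is only an illustration: the list of executable names is taken from the commands in this log, `missing_tools` is a hypothetical helper, and `ersilia` will only be found when the `ersilia` conda environment is active.

```python
import shutil

# Executable names taken from the installation steps in this log;
# adjust the list if your setup differs (assumption, not an official list).
REQUIRED_TOOLS = ["gcc", "conda", "gh", "git-lfs", "ersilia"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prerequisite tools were found on PATH.")
```

A tool being on PATH does not prove it is configured correctly (for example, `gh` may still need `gh auth login`), so this only rules out the simplest failure mode.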

DAY 3 (5th October 2023)

The issue I opened regarding the bug was successfully fixed and closed as completed by one of the mentors @miquelduranfrigola as seen
Since I was unable to resolve the problem I had yesterday (4th October 2023), I asked for help in the #ersilia-install channel on Slack. The link to my message can be found here
Various suggestions were given by some members of the community but none solved my problem

DAY 4 (6th October 2023)

I continued looking for information on how to fetch the model successfully. A suggestion came in from a member, which I tried, and I was able to successfully fetch the model eos3b5e
The cause was that my network provider blocked GitHub's raw content host (raw.githubusercontent.com), so I installed Proton VPN on my system and voila, I was able to successfully fetch and serve the model.
After following the process from the Ersilia documentation, I successfully installed Ersilia and the Ersilia Python package and activated it.
Now that I was sure Ersilia is recognized in Ubuntu, I tested it by fetching and serving model eos3b5e and calculating the molecular weight as required in the task, using the following code:
```
ersilia -v fetch eos3b5e
ersilia -v serve eos3b5e
ersilia -v api run -i "CCCC"
```
and the outputs were fetch.txt, serve.txt & molecular_weight.txt
I attended the onboarding call held by the Ersilia team regarding the Outreachy contribution period
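A connection timeout like the one I hit while fetching can be narrowed down by checking whether the relevant hosts are reachable from the current network. The snippet below is a minimal sketch, not part of Ersilia: `is_reachable` and `report` are hypothetical helper names, and the hosts in the usage comment are an assumption about what a fetch needs.

```python
import socket


def is_reachable(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def report(hosts, check=is_reachable):
    """Map each host to True/False using `check` (injectable for testing)."""
    return {host: check(host) for host in hosts}


# Example usage (performs real network calls, so it is left commented out):
# print(report(["github.com", "raw.githubusercontent.com"]))
```

If one host comes back `False` while the others are `True`, a provider-level block is a plausible explanation, and a VPN or a different network is worth trying. A successful TCP connection does not guarantee that HTTPS traffic gets through, so treat a `True` result as a hint only.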

DAY 5 (7th October 2023)

After completing the first three tasks in Week 1, I moved on to the fourth task. I wrote my motivation statement, covering a brief introduction about myself, my goals and objectives, the skills I have, what I will do during the internship, and my post-internship plan, which is to make research sustainable in Nigeria, sub-Saharan Africa, and eventually, globally.
My motivation statement for wanting to be part of the internship phase can be found here

DAY 6 (8th October 2023)

I recorded my contribution for the Week 1 tasks on the Outreachy website
The Week 1 tasks have been successfully completed

DhanshreeA commented 9 months ago

Hi @Ajoke23 thank you for the updates. I see that some items from the week 1 tasks are still pending. Please tell us if you'd like any support in completing them.

Ajoke23 commented 9 months ago

> Hi @Ajoke23 thank you for the updates. I see that some items from the week 1 tasks are still pending. Please tell us if you'd like any support in completing them.

Yes, I need support. I am finding it hard to fetch model eos3b5e. I am getting connection errors. I have asked on the Slack channels and tried all the suggestions people gave, but it isn't working yet. I also found some related posts in the issues on Ersilia's repository, but none of the suggestions has worked. I am still trying my best to figure it out, and I will appreciate any help from you.

leilayesufu commented 9 months ago

Hi, have you been able to fetch it?

Ajoke23 commented 9 months ago

MOTIVATION STATEMENT

I'm Ajoke Yusuf, a Data Scientist, Machine Learning enthusiast, and SDG 3 advocate. I'm a resourceful, goal-oriented individual with strong analytical and problem-solving skills and an unending quest for knowledge. I pride myself on being a fast learner, and I have honed strong research skills. After receiving the Outreachy email, my aim before choosing a project was to find one whose aim and mission align with my goals and career objectives as an impact maker and an SDG 3 advocate.

I went through each of the projects and I came across Ersilia's project whose mission statement is:

"To equip laboratories in Low and Middle Income Countries with state of the art AI/ML tools for infectious and neglected disease research."

As an Engineering graduate living in Nigeria, I developed an interest in the biomedical field due to the increasing mortality rate from infectious diseases in Nigeria and sub-Saharan Africa. According to statistics from UNICEF (United Nations International Children's Emergency Fund):

infectious disease is the major cause of the mortality rate in children ≤ 5 years

This was cited from here.

Research from NIH (National Institutes of Health) & NCBI (National Center for Biotechnology Information) confirms that:

"The infrastructure and level of support for surveillance, research, and training on emerging infectious diseases in Africa are extremely limited".

Link here

As a Data Scientist skilled in Python and Machine Learning, I possess strong analytical and research skills. I believe that contributing to this project will help me gain knowledge and technical skills that will help advance and improve health research in Nigeria.

If accepted for the 3-month internship, I'll commit myself to bringing suggestions, undertaking research, and collaborating with the Ersilia team while learning and honing skills in Artificial Intelligence and Machine Learning. This period of internship will strengthen my research and problem-solving skills, which will be useful in the long run for advancing technology in the health sector and making a sustainable impact on health research in Nigeria.

As a young lady living in Nigeria, an underdeveloped and low-income country, I have experienced the challenges of accessing tools for prevalent infectious diseases in my community and in Nigeria at large.

After the internship, I plan to use the skills gained to improve and sustain health research tools, solve prevalent health issues in Nigeria, and reduce the mortality rate caused by infectious diseases, thus promoting sustainable research skills that will leave a long-lasting impact on the health sector in my community, Nigeria, sub-Saharan Africa, and eventually, globally. I'll also continue to contribute my quota to the further success of Ersilia's project.

Ajoke23 commented 9 months ago

WEEK 2

Task 1 - SELECT A MODEL FROM THE SUGGESTED LIST

Day 7 (9th October, 2023)

I selected the STOUT (SMILES to IUPAC) model. My reasons for choosing this model for implementation are the following:

A. INTEREST IN THE APPLICATION: I vividly remember that as a science student in high school, I always had difficulty with the nomenclature of chemical compounds, so seeing a Machine Learning model that can name them ignited my interest. In the health sector, IUPAC names are useful for communicating the structure and properties of potential drugs, aid drug development, and help in understanding the mechanism of action and metabolism of drugs in the body. As an SDG 3 advocate, this made me want to delve deeper into how such a model is built. As a problem solver, I would love to apply the knowledge gained from working with the model to help tackle infectious diseases and sustain research in the health sector in Nigeria, which would eventually support sustainable scientific leadership among researchers.

B. ML ALGORITHMS USED: The journal article provided in the repository states that the model uses a deep learning method, specifically NMT (Neural Machine Translation), which follows the implementation of Google's NMT models, for SMILES to IUPAC name translation. I want to understand the knowledge and thought process behind the implementation.

C. GOAL TO BE ACHIEVED: As a machine learning enthusiast, data scientist, problem solver, and SDG 3 advocate, this will give me a deeper understanding and the technical knowledge to execute tasks and solve problems easily. This knowledge of NMT will make it easier to collaborate and build a model that serves as a tool for researchers working on infectious disease problems in Nigeria and globally, thus reducing the mortality rate of infectious diseases in Nigeria (a low-income country).

Task 2 - INSTALL THE MODEL TO YOUR SYSTEM

Day 8 (10th October, 2023)

I followed the installation instructions on the STOUT model GitHub repository.

I started with the first method of installation, which uses PyPI, and I encountered this error.log
While debugging, I upgraded pip using pip install --upgrade pip, and this output7.log shows that the upgrade was successful.
I tried the PyPI method again and got the same error: output8.log
I then decided to try the second method of installation, which uses a conda environment. I ran the code below and still got an error:
```
conda create --name STOUT python=3.8
conda activate STOUT
conda install -c decimer stout-pypi
```
I proceeded to the third method of installation, which installs straight from the repository using pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git, and I got the same error.log as with the first two methods of installation.
I then searched Stack Overflow for a solution.
Since I was unable to debug the error by myself, I asked in the #ersilia-install channel on Slack. The link to the message can be found here

Day 9 (11th October, 2023) - Day 15 (17th October, 2023)

I had issues with my system. I thought it was something minor, but I found out that the drive had crashed, so it took me a while to repair it.
During this period, I researched the Week 3 tasks and documented them in Google Docs on my phone.
I also kept interacting with the members on Slack, gave suggestions, and helped some contributors who faced challenges with some of the tasks.
As soon as my laptop was repaired, I continued where I had stopped in the Week 2 tasks.

Day 16 (18th October, 2023)

I received a comment on my post in the #ersilia-install channel on Slack from people who had faced similar errors. One of the members shared a suggestion that had worked for him, so I decided to try it.
The solution that worked for me was running pip install STOUT-pypi in Google Colab.
This installation_output.txt shows that STOUT was successfully installed
I proceeded to test the STOUT model with the following code:
```
from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation
SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print("IUPAC name of " + SMILES + " is: " + IUPAC_name)

# IUPAC name to SMILES translation
IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print("SMILES of " + IUPAC_name + " is: " + SMILES)
```

which gave the following [output.txt](https://github.com/ersilia-os/ersilia/files/13169158/output.txt)

### **TASK 3 - RUN PREDICTION FOR THE EML PROVIDED**
DAY 17 (19th October, 2023)
- I opened this [link](https://raw.githubusercontent.com/ersilia-os/ersilia/master/notebooks/eml_canonical.csv), right-clicked and selected 'Save as', which downloaded the file. The downloaded file is named [eml_canonical.csv](https://github.com/ersilia-os/ersilia/files/13170404/eml_canonical.csv).
- To run the predictions for the Essential Medicines List, I used Python as follows:

```
from google.colab import files
uploaded = files.upload()

# importing necessary libraries
import io
import pandas as pd
from STOUT import translate_forward

# reading the file into a dataframe
df = pd.read_csv(io.BytesIO(uploaded['eml_canonical.csv']))
print(df)

# selecting the first 40 rows (copied so new columns can be added safely)
df = df.head(40).copy()

# function to translate SMILES to IUPAC
def smiles_to_iupac(smiles):
    iupac = translate_forward(smiles)
    return iupac

# function to translate CAN-SMILES to IUPAC
def can_smiles_to_iupac(can_smiles):
    iupac = translate_forward(can_smiles)
    return iupac

# creating new columns for the IUPAC names with .apply
# (each column is translated once; the repeated .loc assignment was redundant)
df['smiles_iupac'] = df['smiles'].apply(smiles_to_iupac)
df['can_smiles_iupac'] = df['can_smiles'].apply(can_smiles_to_iupac)

# creating the smiles_iupac dataframe
smiles_iupac = df[['drugs', 'smiles', 'smiles_iupac']].copy()
smiles_iupac

# creating the can_smiles_iupac dataframe
can_smiles_iupac = df[['drugs', 'can_smiles', 'can_smiles_iupac']].copy()
can_smiles_iupac

# saving these dataframes to separate CSV files to show the output
smiles_iupac.to_csv('smiles_iupac.csv', index=False)
can_smiles_iupac.to_csv('can_smiles_iupac.csv', index=False)
```

Because running a large volume of data on Google Colab takes a long time, I decided to limit the prediction to the first 40 molecules of the eml_canonical dataset.
The output of the code is:
[smiles_iupac.csv](https://github.com/ersilia-os/ersilia/files/13170854/smiles_iupac.csv)
[can_smiles_iupac.csv](https://github.com/ersilia-os/ersilia/files/13170855/can_smiles_iupac.csv)
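Since running time was the limiting factor here, one option is to translate each distinct SMILES string only once and to keep going when a single molecule fails. The sketch below is an illustration under stated assumptions: `translate_all` is a hypothetical helper, and `translate` stands for any callable such as STOUT's `translate_forward`.

```python
def translate_all(smiles_list, translate):
    """Translate each SMILES once, reusing results for repeated inputs.

    `translate` is any callable mapping a SMILES string to a name,
    for example STOUT's translate_forward. Failures are recorded as None
    so that one bad molecule does not stop the whole batch.
    """
    cache = {}
    results = []
    for smiles in smiles_list:
        if smiles not in cache:
            try:
                cache[smiles] = translate(smiles)
            except Exception:
                cache[smiles] = None
        results.append(cache[smiles])
    return results
```

With the dataframe from the code above, the call would look like `df['smiles_iupac'] = translate_all(df['smiles'].tolist(), translate_forward)`. How much time this saves depends on how many repeated strings the input contains.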

### **Task 4- Compare results with the Ersilia Model Hub implementation!**
Day 18 (20th October, 2023)

- I started the process by searching for the STOUT model identifier on [Ersilia Model Hub](https://www.ersilia.io/model-hub)
- After finding the STOUT: SMILES to IUPAC name translator on the [Ersilia Model Hub](https://www.ersilia.io/model-hub), I clicked on the [GitHub](https://github.com/ersilia-os/eos4se9) button
- The STOUT model has the EOS model ID `eos4se9` and the slug `smiles2iupac`
- I used the following commands to fetch and serve the model and run the prediction:

```
ersilia -v fetch eos4se9
ersilia -v serve eos4se9
ersilia -v api run -i smiles_iupac.csv -o smilesoutput.csv
```

I successfully fetched and [served](https://github.com/ersilia-os/ersilia/files/13184630/modelserve.log) the model, but after running the prediction I noticed that the iupacs_names column was empty, i.e. there was no IUPAC name translation for the SMILES input.
Output file: [smilesoutput.csv](https://github.com/ersilia-os/ersilia/files/13184672/smilesoutput.csv)
- Trying to debug where the problem came from, I decided to run the model with an input string using this command `ersilia -v api run -i "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"`. Output shown below:

```
{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
```

I was expecting to get an iupac_name but I got a null value as outcome.
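Empty predictions like these are easy to miss in a larger run, so they can be flagged programmatically. The sketch below assumes the JSON record shape and the `iupacs_names` column name seen in my outputs; `null_outcome_inputs` and `empty_cell_count` are hypothetical helper names, not part of Ersilia.

```python
import csv
import io
import json


def null_outcome_inputs(records):
    """Return the input strings whose outcome list is empty or all null."""
    flagged = []
    for record in records:
        outcome = record.get("output", {}).get("outcome") or []
        if all(value is None for value in outcome):
            flagged.append(record["input"]["input"])
    return flagged


def empty_cell_count(csv_text, column):
    """Count rows in `csv_text` whose `column` is missing or blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(1 for row in reader if not (row.get(column) or "").strip())


# The record below mirrors the null result shown above.
sample = json.loads("""
{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
    },
    "output": {"outcome": [null]}
}
""")
print(null_outcome_inputs([sample]))
```

If every input is flagged, the problem is more likely with the fetched model than with individual molecules, which is what pointed me towards fetching the model again.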

Day 19 (21st October, 2023)

- In the process of debugging, I noticed that another contributor had faced the same challenge, and I saw @HellenNamulinda's [suggestion](https://github.com/ersilia-os/ersilia/issues/821#issuecomment-1759694897) regarding the issue, so I tried fetching the model from GitHub by adding the `--from_github` flag to the command.
- Command used: `ersilia -v fetch eos4se9 --from_github > eos4se9_fetch_github.log 2>&1`. The output can be found in [eos4se9_fetch_github.log](https://github.com/ersilia-os/ersilia/files/13187659/eos4se9_fetch_github.log). This shows that I successfully fetched the model.
- I ran the model prediction again with the input string I previously used: `ersilia -v api run -i 'Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1'`. The output is shown below, and the log file can be found in [input_outcome.log](https://github.com/ersilia-os/ersilia/files/13187950/input_outcome.log)

```
{
    "input": {
        "key": "MCGSCOLBFJQGHM-SCZZXKLOSA-N",
        "input": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1",
        "text": "Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1"
    },
    "output": {
        "outcome": [
            "[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol"
        ]
    }
}
```

- I decided to try another input using this command: `ersilia -v api run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"`
Output:

```
{
    "input": {
        "key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
        "input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
        "text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
    },
    "output": {
        "outcome": [
            "5-[(5-nitro-1,3-thiazol-2-yl)sulfanyl]-1,3,4-thiadiazol-2-amine"
        ]
    }
}
```

**Summary** 
I used the commands below to run the Ersilia STOUT prediction on the SMILES of the first 40 molecules in the EML dataset:

```
ersilia -v fetch eos4se9 --from_github > eos4se9_fetch_github.log 2>&1
ersilia -v serve eos4se9 > eos4se9_serve_model.log 2>&1
ersilia -v api run -i smiles_iupac.csv -o smiles_output.csv
```

[ersilia_smiles_output](https://github.com/ersilia-os/ersilia/files/13189450/smiles_output.csv) - Ersilia STOUT prediction of SMILES to IUPAC

### **COMPARISON OF SMILES OUTPUT PREDICTION USING ERSILIA AND STOUT PREDICTION**

ERSILIA prediction output: [smiles_output.csv](https://github.com/ersilia-os/ersilia/files/13189683/smiles_output.csv)
STOUT prediction output: [smiles_iupac.csv](https://github.com/ersilia-os/ersilia/files/13189687/smiles_iupac.csv)

Since I had two different CSV files containing the Ersilia prediction output and the STOUT prediction output, I decided to merge the two datasets using Python, selecting the necessary columns to be shown as output, with the code below:

```
# importing necessary libraries
import pandas as pd

# loading the two datasets
# smiles_iupac is the STOUT prediction while smiles_output is the Ersilia prediction
smiles_iupac = pd.read_csv(r"\\wsl.localhost\Ubuntu\home\ajoke\smiles_iupac.csv")
smiles_output = pd.read_csv(r"\\wsl.localhost\Ubuntu\home\ajoke\smiles_output.csv")

# merging the datasets on the 'smiles' and 'input' columns
merged_dataset = smiles_iupac.merge(smiles_output, left_on='smiles', right_on='input')

# renaming the columns as specified
merged_dataset.rename(columns={'input': 'input/smiles', 'iupacs_names': 'Ersilia STOUT prediction', 'smiles_iupac': 'STOUT prediction'}, inplace=True)

# keeping only the desired columns
merged_dataset = merged_dataset[['Ersilia STOUT prediction', 'STOUT prediction', 'input/smiles', 'drugs']]
merged_dataset

# saving the dataset to a new CSV file
merged_dataset.to_csv('comparison_dataset.csv', index=False)
```
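With the two prediction columns side by side, a simple agreement count gives a first impression of how close the implementations are. This is only a sketch: `agreement` is a hypothetical helper, the two sample rows are copied from my merged output, and exact string equality is a crude measure because two different names can describe the same structure.

```python
import csv
import io


def agreement(rows, col_a, col_b):
    """Return (matches, total) for exact string equality between two columns."""
    rows = list(rows)
    matches = sum(1 for row in rows if row[col_a].strip() == row[col_b].strip())
    return matches, len(rows)


# Two rows copied from the merged output; in practice, read the full
# comparison_dataset.csv with csv.DictReader instead of this sample.
sample_csv = """Ersilia STOUT prediction,STOUT prediction,drugs
aceticacid,aceticacid,acetic acid
arsorosooxy(oxo)arsane,oxoarsanyloxyarsenic,arsenic trioxide
"""
rows = csv.DictReader(io.StringIO(sample_csv))
matches, total = agreement(rows, "Ersilia STOUT prediction", "STOUT prediction")
print(f"{matches}/{total} predictions agree exactly")
```

A stricter comparison would convert both names back to structures and compare those, which is outside the scope of this sketch.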

[merged_output](https://github.com/ersilia-os/ersilia/files/13196354/comparison_dataset.csv) - Output of the merged dataset in CSV format. Below is the table format
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">

<body link="#0563C1" vlink="#954F72">

Ersilia STOUT prediction | STOUT prediction | input/smiles | drugs
-- | -- | -- | --
[(1R,4R)-4-[2-amino-4-(cyclopropylamino)-4H-purin-9-yl]cyclopent-2-en-1-yl]methanol | [(1S,4R)-4-[2-amino-6-(cyclopropylamino)purin-9-yl]cyclopent-2-en-1-yl]methanol | Nc1nc(NC2CC2)c3ncn([C@@H]4C[C@H](CO)C=C4)c3n1 | abacavir
(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol | (3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol | C[C@]12CC[C@H](O)CC1=CC[C@@H]3[C@@H]2CC[C@@]4(C)[C@H]3CC=C4c5cccnc5 | abiraterone
N-[5-[amino(dioxo)-λ6-thia-3,4-diazacyclopent-2-en-2-yl]acetamide | N-(5-sulfamoyl-1,3,4-thiadiazol-2-yl)acetamide | CC(=O)Nc1sc(nn1)[S](N)(=O)=O | acetazolamide
aceticacid | aceticacid | CC(O)=O | acetic acid
(2R)-2-acetamido-3-sulfanylpropanoicacid | (2R)-2-acetamido-3-sulfanylpropanoicacid | CC(=O)N[C@@H](CS)C(O)=O | acetylcysteine
2-acetyloxybenzoicacid | 2-acetyloxybenzoicacid | CC(=O)Oc1ccccc1C(O)=O | acetylsalicylic acid
2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one | 2-amino-9-(2-hydroxyethoxymethyl)-3H-purin-6-one | NC1=NC(=O)c2ncn(COCCO)c2N1 | aciclovir
2-[(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]oxy-1,1-dithiophen-2-ylethanol | [(3R)-1-(3-phenoxypropyl)-1-azoniabicyclo[2.2.2]octan-3-yl]2-hydroxy-2,2-dithiophen-2-ylacetate | OC(C(=O)O[C@H]1C[N+]2(CCCOC3=CC=CC=C3)CCC1CC2)(C1=CC=CS1)C1=CC=CS1 | aclidinium
(E)-N-[6-[[(3-chloro-4-fluorocyclohexa-1,4-dien-1-yl)amino]methylidene]-3-[(3S)-oxolan-3-yl]oxycyclopenta[d]pyrimidin-2-yl]-4-(dimethylamino)but-2-enamide | (E)-N-[4-(3-chloro-4-fluoroanilino)-7-[(3S)-oxolan-3-yl]oxyquinazolin-6-yl]-4-(dimethylamino)but-2-enamide | CN(C)C\C=C\C(=O)NC1=C(O[C@H]2CCOC2)C=C2N=CN=C(NC3=CC(Cl)=C(F)C=C3)C2=C1 | afatinib
methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate | methylN-(6-propylsulfanyl-1H-benzimidazol-2-yl)carbamate | CCCSc1ccc2nc(NC(=O)OC)[nH]c2c1 | albendazole
1,2-dihydropyrazolo[3,4-d]pyrimidin-4-one | 1,2-dihydropyrazolo[3,4-d]pyrimidin-4-one | O=C1N=CN=C2NNC=C12 | allopurinol
5-acetamido-2,4,6-triiodo-3-(1-oxoethylamino)cyclohexa-4,6-diene-1-carboxylicacid | 3,5-diacetamido-2,4,6-triiodobenzoicacid | CC(=O)Nc1c(I)c(NC(C)=O)c(I)c(C(O)=O)c1I | amidotrizoate
(2S)-N-[(1R,2R,3R,5S,6R)-5-amino-2-[(2R,3R,4R,5R,6R)-3-amino-4,5,6-trihydroxyoxan-2-yl]oxy-3-[(2R,3S,4R,5R)-5-amino-1,3,4,6-tetrahydroxyhexan-2-yl]oxy-1-hydroxyoxetan-6-yl]-2-hydroxy-4-(methylamino)butanamide | (2S)-4-amino-N-[(1R,2S,3S,4R,5S)-5-amino-2-[(2S,3R,4S,5S,6R)-4-amino-3,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-[(2R,3R,4S,5S,6R)-6-(aminomethyl)-3,4,5-trihydroxyoxan-2-yl]oxy-3-hydroxycyclohexyl]-2-hydroxybutanamide | NCC[C@H](O)C(=O)N[C@@H]1C[C@H](N)[C@@H](O[C@H]2O[C@H](CN)[C@@H](O)[C@H](O)[C@H]2O)[C@H](O)[C@H]1O[C@H]3O[C@H](CO)[C@@H](O)[C@H](N)[C@H]3O | amikacin
3,5-diamino-2-chloro-N-(diaminomethylidene)-2H-pyrazine-6-carboxamide | 3,5-diamino-6-chloro-N-(diaminomethylidene)pyrazine-2-carboxamide | NC(N)=NC(=O)c1nc(Cl)c(N)nc1N | amiloride
2-butyl-3-[4-[2-(diethylamino)ethoxy]-3,5-diiodocyclohexa-1,4-dien-1-yl]chromen-4-one | (2-butyl-1-benzofuran-3-yl)-[4-[2-(diethylamino)ethoxy]-3,5-diiodophenyl]methanone | CCCCc1oc2ccccc2c1C(=O)c3cc(I)c(OCCN(CC)CC)c(I)c3 | amiodarone
N,N-dimethyl-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylidene)propan-1-amine | N,N-dimethyl-3-(2-tricyclo[9.4.0.03,8]pentadeca-1(15),3,5,7,11,13-hexaenylidene)propan-1-amine | CN(C)CCC=C1c2ccccc2CCc3ccccc13 | amitriptyline
ethyl2-(2-aminoethoxymethyl)-4-[[3-(2-chlorophenyl)-4-methoxy-4-oxobut-2-en-2-yl]amino]cyclopenta-1,3-diene-1-carboxylate | 3-O-ethyl5-O-methyl2-(2-aminoethoxymethyl)-4-(2-chlorophenyl)-6-methyl-1,4-dihydropyridine-3,5-dicarboxylate | CCOC(=O)C1=C(COCCN)NC(=C(C1c2ccccc2Cl)C(=O)OC)C | amlodipine
12-chloro-7-(diethylaminomethyl)-2,9-diazatricyclo[8.4.0.03,8]tetradeca-1(14),4,6,9,10,13-hexaen-6-ol | 4-[(7-chloroquinolin-4-yl)amino]-2-(diethylaminomethyl)phenol | CCN(CC)Cc1cc(Nc2ccnc3cc(Cl)ccc23)ccc1O | amodiaquine
(2S,5R,6R)-5-[[(2R)-2-amino-2-(4-hydroxycyclohexa-1,3,5-trien-1-yl)acetyl]amino]-3,3-dimethyl-8-oxo-4-thia-1,7-diazabicyclo[4.3.0]nonane-2-carboxylicacid;tetrahydrate | (2S,5R,6R)-6-[[(2R)-2-amino-2-(4-hydroxyphenyl)acetyl]amino]-3,3-dimethyl-7-oxo-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylicacid;trihydrate | O.O.O.CC1(C)S[C@@H]2[C@H](NC(=O)[C@H](N)c3ccc(O)cc3)C(=O)N2[C@H]1C(O)=O | amoxicillin
(1S,3S,5S,7S,9R,10R,13R,18S,19R,20R,21S,22Z,24Z,26Z,28Z,30Z,32Z,34Z,36Z,38Z,40S,41R)-1-[(2S,3S,4R,5S,6R)-4-amino-3,5-dihydroxy-6-[(2R,3S,4R,5S,6R)-5-amino-3,4-dihydroxyoxan-2-yl]oxan-2-yl]oxy-3,5,7,9,10,13,18,41-octahydroxy-19,20,21-trimethyl-15-oxo-4,16,42-trioxatricyclo[37.2.1.03,5]dotetraconta-22,24,26,28,30,32,34,36,38-nonaene-40-carboxylicacid | (1R,3S,5R,6R,9R,11R,15S,16R,17R,18S,19Z,21Z,23Z,25Z,27Z,29Z,31Z,33R,35S,36R,37S)-33-[(2R,3S,4S,5S,6R)-4-amino-3,5-dihydroxy-6-methyloxan-2-yl]oxy-1,3,5,6,9,11,17,37-octahydroxy-15,16,18-trimethyl-13-oxo-14,39-dioxabicyclo[33.3.1]nonatriaconta-19,21,23,25,27,29,31-heptaene-36-carboxylicacid | C[C@H]1O[C@@H](O[C@@H]\2C[C@@H]3O[C@](O)(C[C@@H](O)C[C@@H](O)[C@H](O)CC[C@@H](O)C[C@@H](O)CC(=O)O[C@@H](C)[C@H](C)[C@H](O)[C@@H](C)\C=C/C=C\C=C/C=C\C=C/C=C\C=C2)C[C@H](O)[C@H]3C(O)=O)[C@@H](O)[C@@H](N)[C@@H]1O | amphotericin B
(2S,5R,6R)-7-[[(2R)-2-amino-2-phenylacetyl]amino]-3,3-dimethyl-8-oxo-4-thia-1,7-diazabicyclo[3.3.0]octane-2-carboxylicacid | (2S,5R,6R)-6-[[(2R)-2-amino-2-phenylacetyl]amino]-3,3-dimethyl-7-oxo-4-thia-1-azabicyclo[3.2.0]heptane-2-carboxylicacid | CC1(C)S[C@@H]2[C@H](NC(=O)[C@H](N)c3ccccc3)C(=O)N2[C@H]1C(O)=O | ampicillin
5-[3-(2-cyanopropan-2-yl)-6-(1,2,4-triazol-1-ylmethyl)cyclohexa-2,4-dien-1-yl]-2,2-dimethylbutanenitrile | 2-[3-(2-cyanopropan-2-yl)-5-(1,2,4-triazol-1-ylmethyl)phenyl]-2-methylpropanenitrile | CC(C)(C#N)c1cc(Cn2cncn2)cc(c1)C(C)(C)C#N | anastrozole
(4S,6R,7S,10S,11S,14S,15S,16S,20S,23R,26S)-16,17,23,26-tetrahydroxy-7-(4-hydroxycyclohexa-1,3,5-trien-1-yl)-11-(4-pentoxycyclohexa-2,4,6-trien-1-ylidene)-2-[[(2S,3S,4S)-3,4-dihydroxy-4-(4-hydroxycyclohexa-1,3,5-trien-1-yl)-2-[[(3S,4S,6R)-4-hydroxy-1-[(2S,3S)-3-hydroxybutan-2-yl]-2,6-dioxopiperazine-3-carbonyl]amino]butanoyl]amino]-14-methyl-2,5,12,17,24-hexazapentacyclo[24.2.2.218,21.04,10.06,14]dotriaconta-1(29),18(30),19,21(31),27,32-hexaene-3,11,13-trione | N-[(3S,6S,9S,11R,15S,18S,20R,21R,24S,25S,26S)-6-[(1S,2S)-1,2-dihydroxy-2-(4-hydroxyphenyl)ethyl]-11,20,21,25-tetrahydroxy-3,15-bis[(1S)-1-hydroxyethyl]-26-methyl-2,5,8,14,17,23-hexaoxo-1,4,7,13,16,22-hexazatricyclo[22.3.0.09,13]heptacosan-18-yl]-4-[4-(4-pentoxyphenyl)phenyl]benzamide | CCCCCOc1ccc(cc1)c2ccc(cc2)c3ccc(cc3)C(=O)N[C@H]4C[C@@H](O)[C@@H](O)NC(=O)[C@@H]5[C@@H](O)[C@@H](C)CN5C(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@@H]6C[C@@H](O)CN6C(=O)[C@@H](NC4=O)[C@H](C)O)[C@H](O)[C@@H](O)c7ccc(O)cc7)[C@H](C)O | anidulafungin
14-[amino(oxo)methyl]-12-(4-methoxycyclohepta-2,4,6-trien-1-ylidene)-5-(2-oxopiperidin-1-yl)-4,11,12-triazatricyclo[7.3.2.14,8]pentadeca-1(13),6,8(15),10-tetraen-15-one | 1-(4-methoxyphenyl)-7-oxo-6-[4-(2-oxopiperidin-1-yl)phenyl]-4,5-dihydropyrazolo[3,4-c]pyridine-3-carboxamide | COc1ccc(cc1)n2nc(C(N)=O)c3CCN(C(=O)c23)c4ccc(cc4)N5CCCCC5=O | apixaban
(5R,6S)-5-(4-fluorocyclohepta-1,3,6-trien-1-yl)-6-[(1R)-1-[5,5,5-trifluoro-4-(trifluoromethyl)penta-1,3-dienyl]ethoxy]-1,2,5,6-tetrahydro-1,4,7-oxadiazocin-3-one | 5-[[(2S,3R)-2-[(1R)-1-[3,5-bis(trifluoromethyl)phenyl]ethoxy]-3-(4-fluorophenyl)morpholin-4-yl]methyl]-1,2-dihydro-1,2,4-triazol-3-one | C[C@@H](O[C@@H]1OCCN(CC2=NC(=O)NN2)[C@@H]1c3ccc(F)cc3)c4cc(cc(c4)C(F)(F)F)C(F)(F)F | aprepitant
arsorosooxy(oxo)arsane | oxoarsanyloxyarsenic | O=[As]O[As]=O | arsenic trioxide
(1R,4S,5R,8S,9R,10S,12S,13S)-10-methoxy-5,9-dimethyl-11,14,15,16-tetraoxatetracyclo[10.3.1.04,13.08,13]hexadecane | (1R,4S,5R,8S,9R,10S,12R,13R)-10-methoxy-1,5,9-trimethyl-11,14,15,16-tetraoxatetracyclo[10.3.1.04,13.08,13]hexadecane | CO[C@H]1O[C@@H]2O[C@@]3(C)CC[C@H]4[C@H](C)CC[C@@H]([C@H]1C)[C@@]24OO3 | artemether
4-oxo-4-[(1S,4R,5S,8S,9R,10S,15S)-4,9,12-trimethyl-11,16,17,18-tetraoxatetracyclo[10.3.2.05,15.08,15]heptadecan-10-yl]butanoicacid | 4-oxo-4-[[(4S,5R,8S,9R,10R,12R,13R)-1,5,9-trimethyl-11,14,15,16-tetraoxatetracyclo[10.3.1.04,13.08,13]hexadecan-10-yl]oxy]butanoicacid | C[C@@H]1CC[C@H]2[C@@H](C)[C@H](O[C@@H]3OC4(C)CC[C@@H]1[C@@]23OO4)OC(=O)CCC(O)=O | artesunate
5-(1,2-dihydroxyethyl)-4-methylidenefuran-2,3-diol | 2-(1,2-dihydroxyethyl)-4,5-dihydroxyfuran-3-one | OCC(O)C1OC(=C(O)C1=O)O | ascorbic acid
methylN-[(2S)-1-[2-[(2S,3S)-2-hydroxy-3-[[(2S)-2-(methoxycarbonylamino)-3,3-dimethylbutanoyl]amino]-4-phenylbutyl]-2-[(4-pyridin-2-ylcyclohexa-2,5-dien-1-yl)methyl]hydrazinyl]-3,3-dimethyl-1-oxobutan-2-yl]carbamate | methylN-[(2S)-1-[2-[(2S,3S)-2-hydroxy-3-[[(2S)-2-(methoxycarbonylamino)-3,3-dimethylbutanoyl]amino]-4-phenylbutyl]-2-[(4-pyridin-2-ylphenyl)methyl]hydrazinyl]-3,3-dimethyl-1-oxobutan-2-yl]carbamate | COC(=O)N[C@H](C(=O)N[C@@H](Cc1ccccc1)[C@@H](O)CN(Cc2ccc(cc2)c3ccccn3)NC(=O)[C@@H](NC(=O)OC)C(C)(C)C)C(C)(C)C | atazanavir
(3R,5R)-7-[2-(4-fluorocyclohepta-2,4,6-trien-1-ylidene)-3-phenyl-4-(phenylcarbamoyl)-5-propan-2-yl-3H-pyrrol-1-yl]-3,5-dihydroxyheptanoicacid | (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-propan-2-ylpyrrol-1-yl]-3,5-dihydroxyheptanoicacid | CC(C)c1n(CC[C@@H](O)C[C@@H](O)CC(O)=O)c(c2ccc(F)cc2)c(c3ccccc3)c1C(=O)Nc4ccccc4 | atorvastatin
5-[3-[1-[(4,5-dimethoxycyclohexa-1,3,5-trien-1-yl)methyl]-7,8-dimethoxy-2-methyl-1,3,4,6-tetrahydroisoquinolin-2-ium-2-yl]propanoyloxy]pentyl13-[4-[2-[4,5,6-trimethoxy-10-(4,5-dimethoxycyclohexa-2,4-dien-1-ylidene)cyclobut-2-en-1-yl]ethyl]-4-methyl-7-oxo-1-oxa-4-azoniacyclononan-1-yl]propanoate | 5-[3-[1-[(3,4-dimethoxyphenyl)methyl]-6,7-dimethoxy-2-methyl-3,4-dihydro-1H-isoquinolin-2-ium-2-yl]propanoyloxy]pentyl3-[1-[(3,4-dimethoxyphenyl)methyl]-6,7-dimethoxy-2-methyl-3,4-dihydro-1H-isoquinolin-2-ium-2-yl]propanoate | COc1ccc(CC2c3cc(OC)c(OC)cc3CC[N+]2(C)CCC(=O)OCCCCCOC(=O)CC[N+]4(C)CCc5cc(OC)c(OC)cc5C4Cc6ccc(OC)c(OC)c6)cc1OC | atracurium
(9-methyl-4-oxa-9-azabicyclo[4.2.1]nonan-5-yl)3-hydroxy-2-phenylpropanoate | (8-methyl-8-azabicyclo[3.2.1]octan-3-yl)3-hydroxy-2-phenylpropanoate | CN1C2CCC1CC(C2)OC(=O)C(CO)c3ccccc3 | atropine
[(2S,5R)-2-(carbamoyl)-7-oxo-1,6-diazabicyclo[3.2.1]octan-6-yl]hydrogensulfate | [(2S,5R)-2-carbamoyl-7-oxo-1,6-diazabicyclo[3.2.1]octan-6-yl]hydrogensulfate | NC(=O)[C@@H]1CC[C@@H]2CN1C(=O)N2OS(O)(=O)=O | avibactam
1-methyl-4-nitro-5-(7H-purin-6-ylsulfanyl)-4H-pyrimidine | 6-(3-methyl-5-nitroimidazol-4-yl)sulfanyl-7H-purine | Cn1cnc(c1Sc2ncnc3nc[nH]c23)[N+]([O-])=O | azathioprine
(2R,3S,5R,6S,7R,9S)-7-[(2R,4R)-5-[[(2R,3R,4R,5R)-4,5-dihydroxy-3-methoxy-5-methyloxan-2-yl]-methylamino]-2-hydroxy-4-methylpentan-2-yl]-9-[(2R,4S,5S,6S)-4-(dimethylamino)-5-hydroxypentan-2-yl]oxy-3-ethyl-6-hydroxy-2,6-dimethyl-4-[(2R,4R,5S,6S)-5-hydroxy-4-methoxy-4-methyloxan-2-yl]oxyoxonan-1-one | (2R,3S,4R,5R,8R,10R,11R,13S,14R)-11-[(2S,3R,4S,6R)-4-(dimethylamino)-3-hydroxy-6-methyloxan-2-yl]oxy-2-ethyl-3,4,10-trihydroxy-13-[(2R,4R,5S,6S)-5-hydroxy-4-methoxy-4,6-dimethyloxan-2-yl]oxy-3,5,6,8,10,12,14-heptamethyl-1-oxa-6-azacyclopentadecan-15-one | CC[C@H]1OC(=O)[C@H](C)[C@@H](O[C@H]2C[C@@](C)(OC)[C@@H](O)[C@H](C)O2)C(C)[C@@H](O[C@@H]3O[C@H](C)C[C@@H]([C@H]3O)N(C)C)[C@](C)(O)C[C@@H](C)CN(C)[C@H](C)[C@@H](O)[C@]1(C)O | azithromycin
barium(2+);sulfate | barium(2+);sulfate | [Ba++].[O-][S]([O-])(=O)=O | barium sulfate
(1S,10S,11S,13S,14S,15S,17S)-18-chloro-14,17-dihydroxy-14-(2-hydroxyacetyl)-13,15,18-trimethyltetracyclo[8.7.1.01,6.011,15]octadeca-2,5-dien-4-one | (8S,9R,10S,11S,13S,14S,16S,17R)-9-chloro-11,17-dihydroxy-17-(2-hydroxyacetyl)-10,13,16-trimethyl-6,7,8,11,12,14,15,16-octahydrocyclopenta[a]phenanthren-3-one | C[C@H]1C[C@H]2[C@@H]3CCC4=CC(=O)C=C[C@]4(C)[C@@]3(Cl)[C@@H](O)C[C@]2(C)[C@@]1(O)C(=O)CO | beclometasone
(1R,2S)-1-(6-bromo-2-methoxyquinolin-3-yl)-4-(dimethylamino)-2-naphthalen-1-yl-1-phenylbutan-2-ol | (1R,2S)-1-(6-bromo-2-methoxyquinolin-3-yl)-4-(dimethylamino)-2-naphthalen-1-yl-1-phenylbutan-2-ol | COC1=NC2=C(C=C(Br)C=C2)C=C1[C@@H](C1=CC=CC=C1)[C@@](O)(CCN(C)C)C1=CC=CC2=C1C=CC=C2 | bedaquiline
11-[bis(2-chloroethyl)amino]-4-methyl-2,4-diazabicyclo[7.3.1]trideca-1(12),2,9-triene-3-carboxylicacid | 4-[5-[bis(2-chloroethyl)amino]-1-methylbenzimidazol-2-yl]butanoicacid | Cn1c(CCCC(O)=O)nc2cc(ccc12)N(CCCl)CCCl | bendamustine

</body>

</html>

From the table above, I noticed that a few predictions differed, so I looked online for a way to validate these compounds and found that PubChem lists their IUPAC names. I listed the drugs whose predictions differ, even slightly, between Ersilia and STOUT.
The links below give the IUPAC name of each of those drugs on PubChem:
[Abacavir](https://pubchem.ncbi.nlm.nih.gov/compound/441300#section=Names-and-Identifiers)
[Abiraterone](https://pubchem.ncbi.nlm.nih.gov/compound/132971#section=Names-and-Identifiers)
[Acetazolamide](https://pubchem.ncbi.nlm.nih.gov/compound/1986#section=Names-and-Identifiers)
[Aclidinium](https://pubchem.ncbi.nlm.nih.gov/compound/11519741#section=Names-and-Identifiers)
[Afatinib](https://pubchem.ncbi.nlm.nih.gov/#query=afatinib)
[Amikacin](https://pubchem.ncbi.nlm.nih.gov/#query=amikacin)
[Amlodipine](https://pubchem.ncbi.nlm.nih.gov/#query=amlodipine)
[Anidulafungin](https://pubchem.ncbi.nlm.nih.gov/#query=anidulafungin)
[Apixaban](https://pubchem.ncbi.nlm.nih.gov/#query=apixaban)
[Ascorbic Acid](https://pubchem.ncbi.nlm.nih.gov/#query=ascorbic%20acid)
[Bendamustine](https://pubchem.ncbi.nlm.nih.gov/#query=bendamustine)

### **Observation** 
I noticed that the STOUT predictions are 100% accurate when compared with the PubChem IUPAC names, while the Ersilia predictions are 80% accurate.
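A comparison like this can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the table: the helper names `normalize` and `agreement_rate` are hypothetical, and the lists are toy placeholders rather than rows from the table. Names are lower-cased and stripped of whitespace before comparison, because several names in the table differ only in spacing (for example `butanoicacid`).

```python
def normalize(name):
    """Lower-case a name and drop all whitespace so spacing differences do not count."""
    return "".join(name.lower().split())


def agreement_rate(predicted, reference):
    """Return the fraction of predicted names that equal the reference name."""
    if len(predicted) != len(reference):
        raise ValueError("both lists must have the same length")
    if not predicted:
        return 0.0
    matches = sum(normalize(p) == normalize(r) for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Toy placeholder strings (hypothetical), not rows from the table above
reference = ["name-a", "name-b", "name-c", "name-d", "name-e"]
predicted = ["Name-A", "name-b", "different", "name-d", "other"]

print(f"Agreement with reference: {agreement_rate(predicted, reference):.0%}")
```

Exact matching after normalization is a strict criterion; two valid IUPAC names for the same compound would still count as a mismatch.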

### **Task 5 - Install and run Docker!**  **Day 20 (22nd October 2023)**
This was my first hands-on experience with Docker, so I took my time to read the [Docker documentation](https://docs.docker.com/get-started/) and understand the installation process, the functionality, and the command-line interface.
These are the steps and commands I used to install Docker on Ubuntu:
1. Updated the local package index using `sudo apt update`
2. Installed required dependencies
`sudo apt install -y apt-transport-https ca-certificates curl software-properties-common`
After running this, I got an error.
The error is: `E: Sub-process /usr/bin/dpkg returned an error code (1)`. 
I checked online and found that the error could be caused by broken dependencies, so I ran the following commands:

sudo dpkg --configure -a
sudo apt --fix-broken install

This fixed the error.
3. Added Docker's repository and Docker's official GPG key to verify the package system

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

4. Next, I installed Docker Engine using `sudo apt install -y docker-ce docker-ce-cli containerd.io`
5. Then, I started and enabled the Docker service

sudo systemctl start docker
sudo systemctl enable docker

6. To verify that Docker was installed successfully, I checked the installed version using `docker --version`; this [docker_output](https://github.com/ersilia-os/ersilia/files/13196593/docker_output.txt) shows that the installation succeeded.
7. I ran `docker ps` to test Docker, and my output was:

(base) ajoke@DESKTOP-KTJU3QV:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORT NAMES

This shows that I had no running containers at that point.
8. To confirm that Docker works, I ran `docker run hello-world` to pull and run a container. This is the resulting output:

(base) ajoke@DESKTOP-KTJU3QV:~$ docker run hello-world

Hello from Docker! This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:

The Docker client contacted the Docker daemon.
The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
The Docker daemon created a new container from that image which runs the executable that produces the output you are currently reading.
The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID: https://hub.docker.com/

For more examples and ideas, visit: https://docs.docker.com/get-started/

Since this was my first time using Docker, I decided to explore further by pulling Ersilia's STOUT model image with Docker. I did the following:
- I pulled model `eos4se9` using `docker pull ersiliaos/eos4se9` and got the following output:

(base) ajoke@DESKTOP-KTJU3QV:~$ docker pull ersiliaos/eos4se9
Using default tag: latest
latest: Pulling from ersiliaos/eos4se9
8b91b88d5577: Pull complete
824416e23423: Pull complete
bbe2c2981082: Pull complete
7b6b68d15a5c: Pull complete
71f8f4db541d: Pull complete
4f4fb700ef54: Pull complete
278266b40c52: Pull complete
4298588a86ad: Pull complete
dddca77c0f59: Pull complete
a113a2030c72: Pull complete
0c8571d61669: Pull complete
Digest: sha256:3c0b4dab7a313bfb33c74b45ca378f7d69b0b9dbaaf843357780180910af31ab
Status: Downloaded newer image for ersiliaos/eos4se9:latest
docker.io/ersiliaos/eos4se9:latest

- Then I ran the model `eos4se9` using `docker run ersiliaos/eos4se9`
- I ran `docker ps` and got this output:

(base) ajoke@DESKTOP-KTJU3QV:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORT NAMES
d6e3b2b2b5fd ersiliaos/eos4se9 "sh /root/docker-ent…" 50 seconds ago Up 15 seconds 83/tcp pedantic_antonelli
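The same check can be scripted. The sketch below is one possible way to list running containers from Python, assuming the `docker` CLI is on the `PATH`. The function names `parse_ps_lines` and `running_containers` are hypothetical helpers introduced here. The sketch uses the `--format` option of `docker ps` with the `.Image` and `.Names` template fields, and it returns an empty list when Docker is missing or the daemon is not reachable.

```python
import shutil
import subprocess


def parse_ps_lines(text):
    """Turn 'image<TAB>name' lines into a list of (image, name) tuples."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            image, name = line.split("\t", 1)
            pairs.append((image, name))
    return pairs


def running_containers():
    """Return (image, name) for each running container, or [] if Docker is unavailable."""
    if shutil.which("docker") is None:
        return []
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Image}}\t{{.Names}}"],
        capture_output=True,
        text=True,
        check=False,  # a stopped daemon gives a non-zero exit code and empty stdout
    )
    return parse_ps_lines(result.stdout)


if __name__ == "__main__":
    for image, name in running_containers():
        print(f"{name}: {image}")
```

Keeping the parsing separate from the subprocess call makes it possible to test the parser without a running Docker daemon.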

HellenNamulinda commented 9 months ago

Hello @Ajoke23, From your updated comment here, it appears you were able to fetch the model and also completed all the week 1 tasks. Well done :clap:!

Please proceed to week 2 tasks and update each in a comment for faster follow-up. In case you need any help, kindly let us know.

Ajoke23 commented 9 months ago

> Hello @Ajoke23, From your updated comment here, it appears you were able to fetch the model and also completed all the week 1 tasks. Well done :clap:!
>
> Please proceed to week 2 tasks and update each in a comment for faster follow-up. In case you need any help, kindly let us know.

Yes, I was able to fetch the model successfully. Thanks a lot @HellenNamulinda

HellenNamulinda commented 8 months ago

Hello @Ajoke23, You are yet to complete week 2 tasks. Is there any way we can support you?

Ajoke23 commented 8 months ago

> Hello @Ajoke23, You are yet to complete week 2 tasks. Is there any way we can support you?

Hi @HellenNamulinda, I had a lot of challenges with the week 2 tasks for my model of interest, STOUT, but I have been able to figure them out and will post an update soon. Thank you.

DhanshreeA commented 8 months ago

Hi @Ajoke23 thank you for the updates. Let us know how it goes! :)

Ajoke23 commented 8 months ago

WEEK 3 - MODEL SUGGESTIONS

TASK 1: FIRST MODEL SUGGESTION

Model Title: A robust deep learning workflow to predict CD8 + T-cell epitopes

- Date of Publication: 13th September, 2023
- Publication: Genome Medicine
- License: Creative Commons
- Dataset: Dataset Used
- Source Code: https://github.com/ChloeHJ/TRAP
- Code: Python and R
- Slug: TRAP

DESCRIPTION OF THIS MODEL

The TRAP model uses deep learning to predict immunogenicity and a decision tree classifier to estimate the degree of correctness of each prediction. It uses features such as the amino acids at contact positions, hydrophobicity, large and aromatic side chains, and peptide-MHC binding affinity, which correlate with T-cell recognition, to give robust predictions of CD8+ T-cell epitopes from MHC-I ligands.
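To make the hydrophobicity feature concrete, here is a minimal sketch that averages per-residue hydrophobicity over a peptide. It assumes the Kyte-Doolittle scale; the TRAP repository may use a different scale or encoding, and `mean_hydrophobicity` is a hypothetical helper that is not taken from the TRAP code.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
# Assumption: this scale is chosen for illustration only.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(peptide):
    """Return the average Kyte-Doolittle value over all residues of a peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    values = [KYTE_DOOLITTLE[residue] for residue in peptide.upper()]
    return sum(values) / len(values)


# SIINFEKL is used here only as an example peptide sequence
print(round(mean_hydrophobicity("SIINFEKL"), 3))
```

A positive mean indicates a mostly hydrophobic peptide and a negative mean a mostly hydrophilic one.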

RELEVANCE OF THIS MODEL TO ERSILIA

- Predicting CD8+ T-cell epitopes is of utmost importance when developing tools and vaccines for diseases that are prevalent in low- and middle-income countries, such as cancer and neglected tropical diseases, which aligns with Ersilia's mission.
- Understanding CD8+ T-cell epitopes is useful in diagnosing viral infections and diseases.
- The current experimental procedures for identifying CD8+ T-cell epitopes are labor-intensive and expensive. TRAP, a computational prediction model, provides an alternative way to screen, predict, and characterize T-cell epitopes and, most importantly, it is cost-effective.
- Using the model can support the development of immunotherapies and adoptive cell therapy, a cancer treatment that destroys cancerous cells.

CODE IMPLEMENTATION

The repository has a well-detailed installation process. The following steps are needed to install the TRAP model:

I ran `git clone https://github.com/ChloeHJ/TRAP.git` on Ubuntu and got the output below, which shows that the repository was cloned successfully:

(base) ajoke@DESKTOP-KTJU3QV:~$ git clone https://github.com/ChloeHJ/TRAP.git
Cloning into 'TRAP'...
remote: Enumerating objects: 60, done.
remote: Counting objects: 100% (60/60), done.
remote: Compressing objects: 100% (53/53), done.
remote: Total 60 (delta 22), reused 10 (delta 4), pack-reused 0
Receiving objects: 100% (60/60), 7.94 MiB | 643.00 KiB/s, done.
Resolving deltas: 100% (22/22), done

Create and activate a conda environment:

conda create -n trap python=3.9
conda activate trap

Install the required packages: `pip install -r requirements.txt`

TASK 2: SECOND MODEL SUGGESTION

Model Title: Enhancing drug property prediction with dual-channel transfer learning based on molecular fragment

- Publication: BMC Bioinformatics
- Year of Publication: 2023
- Authors: Yue Wu, Xinran Ni, Zhihao Wang & Weike Feng
- Slug: FREL
- Source Code: https://github.com/Ruowu9944/FREL
- Dataset: GraphMVP, MoleculeNet
- License: None
- Code: Python

DESCRIPTION OF THIS MODEL

The model is a neural network pretraining framework, FRagment-based dual-channEL pretraining (FREL), which uses generative learning and contrastive learning to achieve intra- and inter-molecular agreement. Molecular fragments provide a deeper understanding of the underlying molecular mechanisms, which can help researchers design drugs tailored to specific diseases and patient populations. The paper reports that the learned molecular representations better capture drug property variation and fragment semantics, giving insight into the relationship between molecular fragments and drug discovery.

RELEVANCE OF THIS MODEL TO ERSILIA

- Accurate prediction of molecular properties is useful in drug repositioning, i.e. identifying new uses for existing drugs that are highly effective against infectious diseases, which saves cost, time, and resources.
- Infectious diseases that have been neglected due to resource constraints can now be studied with this model to develop treatments for infectious diseases, viral infections, etc. This aligns with Ersilia's mission of making research accessible to all.
- The model will empower researchers to take part in drug discovery, since it is time- and cost-effective.

CODE IMPLEMENTATION

The following dependency versions are required:

numpy             1.21.2
scikit-learn      1.0.2
pandas            1.3.4
python            3.7.11
torch             1.10.2+cu113
torch-geometric   2.0.3
transformers      4.17.0
rdkit             2020.09.1.0
ase               3.22.1
descriptastorus   2.3.0.5
ogb               1.3.3
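Before running the scripts, it can help to confirm that the environment matches these pins. The sketch below uses the standard-library `importlib.metadata` to compare installed versions with a few of the pinned ones. `check_versions` is a hypothetical helper, and the keys are assumed to be the distribution names as installed by pip, which may differ in some environments (for example for `rdkit`).

```python
from importlib import metadata

# A subset of the pinned versions listed above
PINNED = {
    "numpy": "1.21.2",
    "scikit-learn": "1.0.2",
    "pandas": "1.3.4",
    "transformers": "4.17.0",
    "ogb": "1.3.3",
}


def check_versions(pinned):
    """Return {name: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions(PINNED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:15s} wanted={wanted:10s} found={found} {status}")
```

Builds with a local version suffix, such as `torch 1.10.2+cu113`, would need to be compared against the full string including the suffix.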

Download the dataset from here, then prepare it:

cd datasets
python molecule_preparation.py

Pre-train the classification model using the command:
```
cd src
python pretrain_cls.py --dropout_ratio=0
```
Pre-train the regression model: `python pretrain_reg.py --dropout_ratio=0`

Fine-tune the classification model: `python finetune_cls.py --dropout_ratio=0.5 --dataset=bace`

Fine-tune the regression model: `python finetune_reg.py --dropout_ratio=0.5 --dataset=esol`

TASK 3: THIRD MODEL SUGGESTION

Model Title: Improving the quality of chemical language model outcomes with atom-in-SMILES tokenization

- Date of Publication: 29th May, 2023
- Publication: Journal of Cheminformatics
- Authors: Umit V. Ucak, Islambek Ashyrmamatov & Juyoung Lee
- Dataset: data
- Source Code: https://github.com/snu-lcbc/atom-in-SMILES
- Slug: AIS
- Code: Python
- License: CC BY-SA 4.0

DESCRIPTION

Atom-in-SMILES (AIS) is a tokenization scheme; tokenization is a preprocessing step in NLP (Natural Language Processing). AIS was introduced because traditional SMILES tokens fall short in accuracy and do not reflect the true nature of molecules. This tokenization provides chemically accurate, tailor-made tokens for molecular property prediction, chemical translation, and molecular generative models. It addresses the following tasks: single-step retrosynthesis, molecular property prediction, normalized repetition rate, fingerprint nature of AIS, single-token repetition (rep-l), and input-output equivalent mapping.

RELEVANCE TO ERSILIA

The accuracy of molecular property prediction depends on the quality of chemical language models. Molecular structures are useful to researchers when developing new drugs for the treatment of infectious diseases. The role of chemical language models in drug discovery makes this model relevant to Ersilia.

CODE IMPLEMENTATION

The code is well documented, and I was able to set it up in Google Colab as follows:

I ran `pip install git+https://github.com/snu-lcbc/atom-in-SMILES`, and this repository.txt shows that the package was installed successfully from the repository.

I installed the necessary dependencies:

# installing the needed dependencies

!pwd
!pip3 install selfies
!pip3 install --upgrade deepsmiles 
!pip3 install SmilesPE > SmilesPE.txt
!pip3 install seaborn==0.12.2 > seaborn.txt
!pip3 install rdkit
!pip3 install atomInSmiles > ais.txt

The outputs are: SmilesPE, deepsmiles, seaborn, atomsInSmiles

IMPLEMENTATION

The paper covers several experiments: single-step retrosynthesis, molecular property prediction, normalized repetition rate, fingerprint nature of AIS, single-token repetition (rep-l), and input-output equivalent mapping. I worked on the normalized repetition rate, which is evaluated on natural products, drugs, metal complexes, lipids, steroids, and isomers. To do this, I used the Python code shown below:

# importing the necessary libraries
import codecs
import re
import tarfile
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from rdkit import Chem
import atomInSmiles
import selfies as sf
import deepsmiles
from SmilesPE.tokenizer import SPE_Tokenizer
from SmilesPE.tokenizer import atomwise_tokenizer
sns.set_theme()

def smiles_tokenizer(smi):
    # Tokenize a SMILES molecule or reaction into space-separated tokens
    pattern = r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
    regex = re.compile(pattern)
    tokens = [token for token in regex.findall(smi)]
    # Every character of the input must be covered by a token
    assert smi == ''.join(tokens), f"{smi=}\t {''.join(tokens)=}"
    return ' '.join(tokens)

def get_rep(sent, l = 300):
    # Count the tokens that already occurred within the previous l tokens
    cnt = 0
    for i, w in enumerate(sent):
        if w in sent[max(i - l, 0):i]:
            cnt += 1
    return cnt

#repld = get_rep(test)
#repld

print("Extracting files...")
with tarfile.open('/datum.tar.gz') as tarf:
    tarf.extractall('data')

data_files = {
    'data/steroids_final.data': 'Steroids',
    'data/metals_final.data': 'Metal complexes',
    'data/fda_final.data': 'FDA approved drugs',
    'data/lipids_final.data': 'Lipids',
    'data/naturals_final.data': 'Natural products',
    'data/isomer.data': 'Isomers of octane',
}

def create_catplot(data_files):
    for csv_file, subtitle in data_files.items():
        # Load data
        print(csv_file)
        df = pd.read_csv(csv_file, sep='\t', header=None)
        df.columns = ['Token types', 'Repetition', 'Normalized repetition', 'Length', 'Unique tokens']

        # Create catplot
        catplot = sns.catplot(
            data=df, x="Token types", y="Normalized repetition", hue="Unique tokens",
            native_scale=True, zorder=1
        )
        catplot.set(ylim=(-0.03, 1.0))
        catplot.set_xticklabels(rotation=90)
        catplot.set_xlabels("Token types", fontsize=14)
        catplot.set_ylabels("Normalized repetition", fontsize=14)
        # Set font size for hue legend
        #catplot.ax.legend(title="Unique tokens", fontsize=14)
        catplot.ax.tick_params(axis='x', labelsize=14)
        catplot.ax.tick_params(axis='y', labelsize=14)

        # Map the Token types to integers
        mapping = {'DeepSMILES': 0, 'SMILES': 1, 'SELFIES': 2, 'AIS': 3, 'SmilesPE': 4}
        df['Token types'] = df['Token types'].map(mapping)
        # Compute mean and standard deviation for each Token type
        mean_vals = df.groupby(['Token types'])['Normalized repetition'].mean()
        std_vals = df.groupby(['Token types'])['Normalized repetition'].std()

        # Plot the mean values and error bars
        for i, (mean_val, std_val) in enumerate(zip(mean_vals, std_vals)):
            x_pos = i  # the x position of the horizontal line
            y_pos = mean_val  # the y position of the horizontal line
            #color = sns.color_palette()[i]  # the color of the horizontal line
            plt.plot([x_pos + 0.2, x_pos + 0.4], [y_pos, y_pos], color='black', linestyle='-', linewidth=1)
            plt.plot([x_pos + 0.3, x_pos + 0.3], [y_pos - std_val, y_pos + std_val], linestyle=':',color='black', linewidth=1)
        # Add title and adjust margins
        catplot.fig.suptitle(subtitle, fontsize=14, fontweight='bold')
        plt.subplots_adjust(top=0.93, bottom=0.3)
        plt.gcf().set_size_inches(6, 6)
        # Save the plot
        plt.savefig(csv_file[:-5] + 'Ho.png')
        # Close the plot to free up memory
        # plt.close()
# Example usage
create_catplot(data_files)

The output:

[Plots: fda_finalHo, naturals_finalHo, metals_finalHo, steroids_finalHo]

The distributions show the characteristics of each tokenization scheme on representative datasets, designed to test different facets of molecular structure: coordination compounds and ligands (metal complexes), ring structures and functional groups (steroids), long-chain formations (phospholipids, ionizable lipids), and complex and diverse structures (natural products).
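For intuition about what the plotted quantity measures, here is a small self-contained sketch on a single molecule. It reuses the windowed counting idea of `get_rep` with a simplified atom-wise tokenizer. Two things are assumptions: the regular expression is a reduced pattern written for this example, and "normalized repetition" is taken to be the repetition count divided by the number of tokens, which is inferred from the column names and not confirmed by the source.

```python
import re

# Simplified atom-wise SMILES pattern (assumption: covers common organic SMILES only)
TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|[()=#+\-\\/:~@?>*$.]|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into atom-level tokens."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def get_rep(tokens, window=300):
    """Count tokens that already appeared within the previous `window` tokens."""
    count = 0
    for i, token in enumerate(tokens):
        if token in tokens[max(i - window, 0):i]:
            count += 1
    return count


def normalized_repetition(tokens, window=300):
    """Repetition count divided by sequence length (assumed normalization)."""
    return get_rep(tokens, window) / len(tokens) if tokens else 0.0


tokens = tokenize("CC(=O)O")  # acetic acid
print(tokens, get_rep(tokens), round(normalized_repetition(tokens), 3))
```

In this example the second `C` and the second `O` are repeats, so 2 of the 7 tokens count as repetitions. A tokenization scheme with more distinctive tokens would give a lower value for the same molecule.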

SUMMARY

To avoid duplicating models, I checked the list of pending models yet to be incorporated and searched it for the 3 models I suggested; none of them was in the list. Hence, all 3 suggested models are new.

Ajoke23 commented 8 months ago

Week 4 - Submit the final application in the Outreachy website

I started by working on my project timeline; I sent the template to @DhanshreeA for review.
I made sure I had recorded all the links to my contributions.
I have successfully recorded my final contribution on the Outreachy website. It contains the project timeline and links to all contributions made, i.e. tasks completed and bugs found.

GemmaTuron commented 8 months ago

Hello,

Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!

ersilia-os / ersilia

✍️ Contribution period: Ajoke Yusuf #842
