Hello everyone. Excited to start contributing! I am starting with the installation of the Ersilia Model Hub. I will also work on a motivation letter alongside it and will update the progress of both shortly. I am working on a Windows machine and following the instructions mentioned here!
Task 1 - I joined Ersilia's Slack channel from their Outreachy landing page and was welcomed by a community of warm peers and team leads! It was reassuring and exciting to be part of something like this.
Task 2 - Opened this issue with success! :)
Task 3 - Since I am using a Windows platform, I installed WSL and an Ubuntu terminal environment as mentioned here. I faced a small issue where my Ubuntu terminal did not recognise WSL, so I had to enable it manually from Windows Features. It worked fine afterwards, and I continued through the mentioned steps.
I installed all the prerequisites:
- Conda 23.9.0
- Python 3.10.12
- 3.7
If it works, I will update it here, as instructed on Slack by @DhanshreeA. After the prerequisites, I installed the Ersilia tool!
Here I ran into an issue: even after I was done with the installation and had activated the conda environment, I could not run `ersilia --help` or `ersilia catalog`.
I determined this was because my WSL version was outdated, so I updated it to WSL 2 and made sure Ubuntu was using WSL 2. I also configured Docker Desktop to use WSL 2 after the update.
This fixed my problem and I was finally able to install Ersilia!!
The installation guide as well as my peers like @leilayesufu who shared a detailed documentation of their journey were of immense help wherever I got stuck and I am grateful to them.
Finally, onto testing!
The first few testing steps were calling a catalog function and running a simple model.
Alas, I am facing issues calling `ersilia catalog`, which fails with an Errno 101 Network Unreachable error. I have attached a log file below.
myfile.log
`ersilia --help` works fine.
So I moved on to `ersilia -v fetch eos3b5e`, which also fails, with a different error (log file below).
fetchLog.log
I have found a solution to this issue. Certain service providers in India block raw.githubusercontent.com, and that was causing the Errno 101. People may try switching to a different service provider or using a VPN, but a more feasible solution is changing your DNS to Cloudflare DNS:
- IPv4: 1.1.1.1 and 1.0.0.1
- IPv6: 2606:4700:4700::1111 and 2606:4700:4700::1001

Changing to Google DNS also works.
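For anyone wanting to confirm the diagnosis before touching DNS settings, a quick Python reachability check against the blocked host can help (a small sketch of my own, using only the standard library):

```python
# Quick reachability check for the host that was causing Errno 101.
import socket

try:
    # Resolve and connect to raw.githubusercontent.com on HTTPS port 443
    with socket.create_connection(("raw.githubusercontent.com", 443), timeout=5):
        print("raw.githubusercontent.com is reachable")
except OSError as e:
    # Errno 101 (Network unreachable) and timeouts land here
    print(f"Not reachable: {e}")
```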
This has worked well for me, and I have successfully tested and used a simple model deployment of Ersilia, getting the desired result specified in the instructions.
As instructed on Slack, I used Python version 3.10.12 and did not install isaura (thanks to @carcablop), hence I did not face this issue.
This was a great learning experience for me. I faced errors where I thought everything would go smoothly, while the places where I expected trouble flowed freely. A big bag of gratitude to the supportive peers who took the time to document their experiences comprehensively, and to the community leaders for taking the time to go through everyone's issues and give prompt solutions here and on Slack.
I'm excited for what is to come next! I will add shortly my motivations for applying to Outreachy and what I aim to achieve.
My name is Joyesh Banerjee. I am an engineering graduate from India, with a degree in electronics and communication engineering. I started out with C and Java but shifted to Python after a while because I was developing an interest in data analysis, and later data science in general. I did a few college projects in which we trained a prediction model on open-source datasets, and I found the work lively and rewarding. I come from a lower-middle-class family and we have always lived by the skin of our teeth, but thanks to God, we have come far. Every parent works themselves to the bone to give their children a better platform to grab opportunities than they had, and I believe this is such a platform that they envisioned for us.
I came across Outreachy as a recommendation from a friend who had applied previously, and I was warmed by how much soul they had as an organization. They were incredibly inclusive and gave me a chance to really tell my story in my application, which I appreciated, as did many of my peers, I am sure. So I was very happy to be given a chance as a contributor here, because when I was going through the projects, I felt a similar wave of warmth from Ersilia. This was corroborated by the cordial and supportive team I found myself in when I joined their channels. They were prompt and provided quick resolutions for issues. The best part was that even when many of us were facing issues with some initial tasks, and the solution had not yet been found, the team stayed interactive and didn't keep us in the dark. I already have some experience with training models, so I firmly believe my time in the internship will be the perfect incubator for my skills to grow rapidly and show themselves in the best of ways: helping people in need. For that I am primed and ready to learn and apply myself to the fullest. After the internship period has ended, I intend to keep applying my competence here by working with the community to plan meaningful contributions. I also want to challenge myself by learning new tech stacks like cloud (AWS) to make deployment of these models more efficient for users and further support Ersilia's growth. I want to eventually learn new skills to find ways to enhance current processes.
I have also seen my fair share of medical issues and what hurts most is medical incompetence. I lost my grandmother to undiscovered side effects during her treatment which blindsided the family and the doctors on her case. When I went through Ersilia's objectives and intentions with this technology, I couldn't help but think what could have happened if someone had thought of this a decade earlier.
As the saying goes - "The best time to plant a tree was 10 years ago. The second best time is now."
This community is working on a novel goal that will help countless people and I wish to become part of that effort to my utmost. It will swell my heart with joy if I can help further this idea to reality - so that a decade from now maybe one less child will wish someone had thought of something today.
After providing a detailed motivational letter, I have submitted a contribution report through Outreachy and linked this issue as instructed! The contribution has been recorded successfully.
Hello @joiboi08 thank you for the detailed updates. If you'd like, you can get started with the tasks from week 2 now. :)
Thank you! I'll update my progress in week 2 shortly.
After completing the Week 1 tasks, attending a wonderful and informative session, and taking some personal time, I have begun work on Week 2!
This week's tasks bring us the closest to real internship work so far. My understanding of the tasks:
1. Install one of the suggested models and run it locally, referencing the author's repo and source code as needed. Mention why this model was chosen.
2. Once it is installed and successfully tested, use the Essential Medicines List CSV as an input for the chosen model. Use the third-party implementation of the model to extract the output.
3. Compare this output to the Ersilia Model Hub implementation of the same model. Note any differences or points of interest.

Since I have no experience with Docker, I will test myself and use a Docker container to implement the Ersilia model.
This is my interpretation of the Week - 2 tasks. If I misinterpreted something I am happy to be corrected.
For this task, I was torn between the PPBopt model and the STOUT model. My understanding of the former is that it is a prediction engine that predicts how well a compound binds to proteins in the blood for transportation to target sites, finding important use cases in optimizing drug development costs and time. This model was interesting and is also the closest to my experience, as I have worked on prediction models before.
However, I wanted to try something different, and the STOUT model piqued my interest. My understanding is that it converts a compound's SMILES string (essentially the ASCII representation of its structure, so it is machine readable) to its IUPAC-defined name and vice versa. This model was also much better documented, and I felt I saw a more concrete roadmap here.
Hence, I opted to work with the STOUT model.
To start with, I am following these steps for the STOUT model installation.
First, I open the local Linux environment that I set up in the Week 1 tasks. I run some `--version` checks to ensure I am using appropriate versions:
- Python 3.11.4
- Conda 23.9.0
- WSL 2
All good! Onto the installation!
I first create a conda environment for the model:
`conda create --name STOUT`
Activating STOUT:
`conda activate STOUT`
Installing dependencies:
`conda install -c decimer stout-pypi`
Two attempts were made to fetch the repo data, but both failed.
Log file for the failure - fail.log
It could not find the `pystow` package. I tried `pip install pystow`, but it did not work; it gave the same error again.
So I tried the alternate method mentioned in the repo:
`pip install STOUT-pypi`
It successfully installed all the required packages (worth ~624 MB!).
Since the model was now installed, I was ready to test!
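Before the bigger test, a minimal smoke test of the installed library, based on the usage example in the STOUT repo (the caffeine SMILES is just an arbitrary choice of mine):

```python
# Minimal STOUT smoke test: translate one SMILES to an IUPAC name and back.
from STOUT import translate_forward, translate_reverse

smiles = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"   # caffeine
iupac = translate_forward(smiles)          # SMILES -> IUPAC
print(f"IUPAC name: {iupac}")

back = translate_reverse(iupac)            # IUPAC -> SMILES round trip
print(f"Round-trip SMILES: {back}")
```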
While testing, I am running into some issues.
I needed a dataset for testing; I found a demo dataset in the STOUT repo and tried to use it. However, it keeps giving me an ImportError - log attached below. logs.log
EDIT - This is a post-Task-2-completion edit. Since I got the model working and the test file ran fine, I wanted to test it with a more comprehensive file like the demo file I mentioned here. However, when I ran it, just like with VSCode, the Ubuntu terminal also went into an unresponsive, suspended state and my CPU usage hit a constant 100% again. I'll try waiting an hour like this and will update progress here.
I was primarily running this on Ubuntu, but I have since switched to VSCode (and its CLI) for better maneuverability. I got the same ImportError referenced above. I found a solution by changing the relative import in the stout.py file to an absolute import:
`from .repack import helper`
to
`from repack import helper`
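For context on why this fix works (my own note, not from the STOUT docs): a relative import needs a parent package, which does not exist when the file is run directly as a script:

```python
# When stout.py is run directly (python stout.py), __package__ is empty,
# so "from .repack import helper" raises
# "ImportError: attempted relative import with no known parent package".
# The absolute form "from repack import helper" instead searches sys.path,
# which for a direct run starts with the script's own directory.
import sys
print(sys.path[0])   # the script's directory: why the absolute import resolves
```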
This allowed me to move forward. After I ran the code again, it finally downloaded the model and gave me a success message (in the VSCode CLI) that the model was loaded. But when running the code, it flagged the IUPAC_names_test.txt file with a FileNotFoundError.
So I switched the working directory with `cd STOUT`, which solved it.
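An alternative to changing the working directory is to resolve the data file relative to the script itself, so it runs from any directory (a small sketch of my own around the IUPAC_names_test.txt file mentioned above):

```python
# Resolve the test file relative to this script's location instead of
# relying on the current working directory.
from pathlib import Path

script_dir = Path(__file__).resolve().parent
test_file = script_dir / "IUPAC_names_test.txt"

with open(test_file) as f:
    names = [line.strip() for line in f if line.strip()]
print(f"Loaded {len(names)} IUPAC names")
```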
However, after this I ran the code hoping no issues would persist. But it is in a suspended state and has not given an output in the CLI. I checked Task Manager for clues, and it showed a constant 100% CPU usage the whole time I was watching.
I am going to try again.
- I retried `conda install -c decimer stout-pypi`, but it kept giving missing-package errors, so I skipped it and used `pip install STOUT-pypi` in `miniconda3\envs\STOUT`.
- I saw `Requirement already satisfied` a lot, because I kept retrying and rerunning the code, so many packages were already installed.
- I ran the test script with `python3 test.py` and got:
  `OSError: [Errno 0] JVM DLL not found: Define/path/or/set/JAVA_HOME/variable/properly`
- I fixed this by installing a JRE with `sudo apt install default-jre` and setting the Java path using `export JAVA_HOME=/usr/bin`, as this is where the previous step put the Java root.
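To avoid rediscovering this the hard way, a small guard at the top of the test script could fail fast with a clearer message (a sketch of my own, not part of STOUT; it only checks the variable set above):

```python
# Fail fast with a clear message if JAVA_HOME is missing, instead of the
# opaque "JVM DLL not found" OSError from the Java bridge.
import os
import sys

java_home = os.environ.get("JAVA_HOME")
if not java_home or not os.path.exists(java_home):
    sys.exit("JAVA_HOME is not set correctly; try: export JAVA_HOME=/usr/bin")
print(f"Using JAVA_HOME={java_home}")
```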
This period is shaping up to be a concrete learning experience for me. It gets a little mental sometimes, but it is always rewarding when I manage to pull through!
Now that we have set up our model and tested it once, we use it in a pseudo-real-world scenario.
I create a file `eml_result.py` in VSCode.
First, we go through the EML dataset provided here to determine what kind of conversion is being made.
I can see that three columns are given:
- `drugs` (the common name)
- `smiles`
- `can_smiles`
The dataset itself is very large, and it would have taken my machine a long time to process the entire set, so I decided to sample 20 data points as my input.
For that I needed the `csv` Python module.
I made my necessary imports:
```python
import csv
from STOUT import translate_forward
```
Then I converted the CSV into a `list` for easier handling, using this code:
```python
# intention is to convert the EML csv into a list version of EML
with open("eml_canonical.csv", newline='') as eml_csv:
    reader = csv.reader(eml_csv)   # returns each row of EML as a list
    eml_list = list(reader)        # list of each eml row as a list
```
This gave me `eml_list`, a list of each row of the EML CSV file. Next, I extracted the canonical SMILES into `can_smiles_list`:
```python
can_smiles_list = []                  # empty list that will hold canonical smiles
for name in eml_list[1:21]:           # first 20 SMILES rows, excluding the header
    can_smiles_list.append(name[2])   # index 2 is the can_smiles column
```
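As a side note, `csv.DictReader` could pull the column by its header name instead of the positional index 2, which would protect the script if the column order in eml_canonical.csv ever changed (an alternative sketch, not what I ran):

```python
# Same extraction, but keyed on the column header instead of index 2.
import csv
from itertools import islice

with open("eml_canonical.csv", newline='') as eml_csv:
    reader = csv.DictReader(eml_csv)                      # header row -> dict keys
    can_smiles_list = [row["can_smiles"] for row in islice(reader, 20)]
```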
Then I created a list `iupac_` to hold the translated names of all 20 canonical SMILES:
```python
iupac_ = []                            # empty list that will hold translated iupac names
for name in can_smiles_list:
    result = translate_forward(name)   # SMILES -> IUPAC translation
    iupac_.append(result)
```
🔴 This is where I am facing an issue. I am running this in VSCode and it does NOT recognise `STOUT` as a module on import. I have made sure my working directory is in the conda environment and I run my code from there, but it is not recognising it. Any help is appreciated!
After a discussion with my peer @PromiseFru, I was able to conclude that the problem was that I had to separately activate the STOUT environment in the VSCode CLI again. Since my working directory was in the conda env made during the installation of the model, I thought this wouldn't be an issue. To fix it, I could do two things: set up conda integration in the VSCode terminal, or manually activate the STOUT environment in the VSCode CLI each session.
I chose the latter as it saved time, but I will install conda on VSCode for future work.
Finally, I added a `for` loop to print the translated list in a readable manner:
```python
for i in iupac_:
    print(i)
```
Then I ran the script: `python3 eml_result.py`
Later, I removed the `print()` statement and instead wrote the translated IUPAC names to a separate `.csv` file, `predicted_iupac.csv`:
with open("predicted_iupac.csv", "w") as trans_iupac :
writer = csv.writer(trans_iupac)
for i in iupac_ :
writer.writerow(i)
Working data and Result data CSV files :
```python
# Since the EML file has canonical SMILES names,
# we import only translate_forward to translate from SMILES to IUPAC
import csv
from STOUT import translate_forward

#! CONVERTING EML CSV TO EML LIST OF LISTS
# intention is to convert the EML csv into a list version of EML
with open("eml_canonical.csv", newline='') as eml_csv:
    reader = csv.reader(eml_csv)   # returns each row of EML as a list
    eml_list = list(reader)        # list of each eml row as a list

#! EXTRACTING to-be-translated CANONICAL FORMS FROM SOURCE EML LIST
can_smiles_list = []                  # empty list that will hold canonical smiles
for name in eml_list[1:21]:           # first 20 SMILES rows, excluding the header
    can_smiles_list.append(name[2])   # index 2 is the can_smiles column

#! TRANSLATING CANONICAL SMILES TO IUPAC NAMES
iupac_ = []                            # empty list that will hold translated iupac names
for name in can_smiles_list:
    result = translate_forward(name)
    iupac_.append(result)

#! WRITING TRANSLATIONS TO FILE
# writes the list of translated iupac names to the file 'predicted_iupac.csv'
with open("predicted_iupac.csv", "w", newline='') as trans_iupac:
    writer = csv.writer(trans_iupac)
    for i in iupac_:
        writer.writerow([i])   # wrap in a list so the name stays in one column
```
I want to try completing this using docker!
First I will try and run the model locally using the instructions provided here.
- Changed my working dir to `miniconda3/envs/ersilia`
- Activated the Ersilia conda env (this step is easy to miss!): `conda activate ersilia`
- Checked that Ersilia was working by running `ersilia --help` and `ersilia catalog`: both ran well 💯
Now I was ready to fetch the model. The SMILES-to-IUPAC model is named `eos4se9`, with slug `smiles2iupac`.
- `ersilia fetch eos4se9` initially did not run. So I tried `docker`, which returned a command-not-found error. I launched Docker Desktop and ran `docker` again: success!
- `ersilia fetch eos4se9`: success!
- Served the model: `ersilia serve eos4se9`
Now the model is ready to use!
I will feed the model a test dataset (.csv) of two SMILES: task3.csv
The way to run the model, as mentioned here, is:
`ersilia api run -i <<input_file.csv>> -o <<desired_output_file_name.csv>>`
So I ran: `ersilia -v api run -i task3.csv -o result3.csv`
This gave the `TypeError: object of type 'NoneType' has no len()` error that was also faced in the initial Week 1 tasks. The best working solutions involved `read_input_columns`, which I have not tried. Please advise @DhanshreeA @carcablop @HellenNamulinda
Have you tried giving it a single input as opposed to the entire EML file to test it?
Hi @leilayesufu I haven't processed the entire file yet. I was feeding it a modified dataset task3.csv of 2 inputs as a test before giving it the entire EML set.
Okay, try testing it with a single input directly though, not through the file: `ersilia -v api run -i "Nc1nc(NC2CC2)c2ncn([C@H]3C=CC@@HC3)c2n1"`
I've run into a worse problem. I am unable to fetch or serve models; I keep getting the `connection reset by peer` error without fail. I have reinstalled the environment multiple times without this changing.
ConnectionResetLog.log
I'm going to try to do it, and I'll get back to you.
Thank you so much. I look forward to hearing your experience. I am using Ubuntu 22.04
Hi, so I fetched the model and served it, as seen. Then I ran `ersilia card eos4se9`; the output showed as seen here:
`"Code": "$ ersilia serve smiles2iupac\n$ ersilia api -i 'CCCOCCC'\n$ ersilia close",`
So to run predictions, I just did `ersilia api -i "CCCC"` and I got the output below, here.
This was just simple testing, although @PromiseFru ran it with some inputs from the EML file and it gave him a null output.
Hi @leilayesufu
Thank you for trying this out yourself.
I reinstalled my environment and tried your steps to the letter, but I kept getting one of three errors when I tried to use `ersilia api -i "CCCC"`:
- Errno 104 Connection reset by peer: wood.log
- Errno 111 Max retries / Connection refused (this happened only once)
Hi, I'm thinking it could be your network then.
Hi @leilayesufu, I have a strong connection, and I have made sure I am not running into this error again, as I can view raw.githubusercontent.com files. I was previously able to fetch and run models, but have only recently been unable to do so.
Hi, I would suggest removing the entire environment and starting afresh, or you could wait for a mentor's opinion. @DhanshreeA
Hi @joiboi08 as discussed over Slack, let me look into this more. I will get back to you by tomorrow.
Thank you @DhanshreeA and @leilayesufu. I am looking forward to the updates. Meanwhile, is it okay if I move on to the Week 3 tasks for now?
Hi, since the problem is a geographical one, I'll suggest using a VPN and changing your location to complete your Week 2 tasks. Of course, you'll need the go-ahead from @DhanshreeA
Hi @leilayesufu, thank you for your suggestion. I ran a VPN and did a fresh install of ersilia and the conda environment as well as the git packages. It is now successfully able to fetch and serve models so I am a little relieved. I believe my peer @Ajoke23 also mentioned this on Slack, thank you as well. @DhanshreeA VPN is working as an interim solution for the regional service outages.
Currently, I am again facing this problem: `TypeError: object of type 'NoneType' has no len()`
I am trying some solutions and will update here.
📢 📢 SOLUTION - On the advice of my peer @AlphonseBrandon, I added headers to my input file and it worked. Thank you so much.
- Input file [eml.csv](https://github.com/ersilia-os/ersilia/files/12895576/eml.csv): this file has the first 20 rows of canonical SMILES names (excluding the header row), to match the 20 rows of input used in the third-party STOUT implementation.
- Now I feed this file into my fetched model `eos4se9` using the command `$ ersilia -v api run -i 'eml.csv' -o 'result.csv'`
- I get the result output file [result.csv](https://github.com/ersilia-os/ersilia/files/12895771/result.csv). BUT the first 9 rows do not have a translation, and when I ran it again, this time NO rows were translated. During both runs, two things happened consistently:
  1. Batch prediction failed and it switched to individual prediction.
  2. I got a 504 error from every single row that failed to translate.
- I always get these DEBUG logs:
```
11:24:14 | DEBUG | Starting Docker Daemon service
11:24:14 | DEBUG | Creating temporary folder /tmp/ersilia-nb24uaof and mounting as volume in container
11:24:14 | DEBUG | Image ersiliaos/eos4se9:latest is available locally
11:24:14 | DEBUG | Using port 46089
```
- I didn't really have experience with Docker, but from what I could tell, the inputs were given to a Docker container running the `eos4se9` model, which in turn returned the predictions. Googling `error 504` informed me it is a timeout error. So basically, I was not getting translated outputs because the container kept timing out.
- I searched around and found two solutions:
  1. Instead of communicating with the container remotely, run the predictions directly from within the container.
  2. Increase the nginx request timeout.
- I chose option 1, as it will allow me to gain more experience with Docker.
- So, as I fetch and serve the models on Ubuntu, I see a corresponding container being created in Docker Desktop. From there I can get my container ID to call it in Ubuntu. From Docker Desktop, my current running container has the ID `eos4se9_7a24`.
Since processing the entire EML dataset will take an impractical amount of time (mainly due to hardware limitations), I have taken the first 20 rows of the dataset as input for both the STOUT 3rd party model and the Ersilia Hub Model.
I already have a modified dataset, [er_task3.csv](https://github.com/ersilia-os/ersilia/files/12907831/er_task3.csv), so the step to take here is to copy this file into the working dir of my container.
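For reference, a subset file like this can be produced with a few lines of Python (a sketch of my own; I am assuming a simple one-column layout with a `smiles` header, since adding headers was what fixed the earlier NoneType error):

```python
# Build a 20-row input file for the Ersilia model from the EML dataset:
# a header line plus the first 20 canonical SMILES.
import csv
from itertools import islice

with open("eml_canonical.csv", newline='') as src, \
     open("er_task3.csv", "w", newline='') as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["smiles"])            # header row (name is an assumption)
    for row in islice(reader, 20):         # first 20 data rows
        writer.writerow([row["can_smiles"]])
```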
I was able to complete this step using the `docker cp` command:
`$ docker cp er_task3.csv eos4se9_7a24:/root`
Thank you @leilayesufu @PromiseFru for helping me figure out the container dir!
To access this container through Ubuntu, I use the command:
`$ docker exec -it eos4se9_7a24 sh`
Check to see if the dataset is present using `# ls`:
![image](https://user-images.githubusercontent.com/94055810/275242982-3aad8167-456f-4d01-b69e-a434d08b3e22.png)
It is! Great!
We input the dataset, run the model, and output the result into a file:
`# ersilia -v api run -i er_task3.csv -o er_result.csv`
The generated result is still in the container; to access it, we need to copy it to our local system:
`$ docker cp eos4se9_7a24:/root \\wsl.localhost\Ubuntu\home\joyesh\miniconda3\envs\ersilia`
Here, the container files are copied over to the mentioned destination, and we can easily find our result file.
Successfully predicted all SMILES names to IUPAC!
Comparison between the STOUT implementation and the Ersilia implementation
After getting both results, I wanted to combine both result files into a single csv or excel file. For that, I wrote some Python code to:
- turn both .csv files into respective lists
- combine those lists into one list of format `[<STOUT iupac name>, <Ersilia iupac name>]`
- turn that list into a single .csv file

I used the csv module again. First, turning both files into lists:
```python
import csv

# list of lists: each Ersilia translation row
with open("er_result.csv", newline='') as ers:
    ers_base_list = list(csv.reader(ers))

ers_list = []
for name in ers_base_list[1:]:        # skip the header row
    ers_list.append(name[2])          # keep only the iupacs_names column

# list of lists: each STOUT translation row
with open("111predicted_iupac111.csv", newline='') as stout:
    stout_base_list = list(csv.reader(stout))

stout_list = []
for name in stout_base_list:          # no header in this file
    stout_list.append(name[0])        # list of stout translations
```
[<STOUT iupac name>, <Ersilia iupac name>]
result_list = []
for i in range(0,20) : # because 20 SMILES names were translated
result_list.append(stout_list[i]) # adding STOUT IUPAC
result_list.append(ers_list[i]) # adding Ersilia IUPAC
with open("comparison_result.csv", "w") as comp :
writer1 = csv.writer(comp)
for i in result_list :
writer1.writerow([i])
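A small design note: writing the two names as alternating rows works, but pairing them as columns would make the comparison easier to scan in a spreadsheet. A variant of the same loop (my own sketch, continuing from the lists built above):

```python
# Write STOUT and Ersilia translations side by side, one SMILES per row.
import csv

with open("comparison_result.csv", "w", newline='') as comp:
    writer = csv.writer(comp)
    writer.writerow(["stout_iupac", "ersilia_iupac"])   # header row
    for stout_name, ers_name in zip(stout_list, ers_list):
        writer.writerow([stout_name, ers_name])
```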
My Interpretations
- The STOUT third-party model was, for me, more modular in terms of determining an output format. The output file generated ([as seen here](https://github.com/ersilia-os/ersilia/files/12896825/111predicted_iupac111.csv)) does NOT have excess columns, only the translated IUPAC names. This made it easier to work with, as it required less cleaning/prepping.
- The Ersilia model, however, gives a verbose multi-column output without an opportunity to change it, since the output file is generated directly by the model, whereas with the STOUT model we imported the STOUT module and used the `translate_forward` function in code we wrote from scratch.
- On the flip side, this makes the Ersilia implementation more time-efficient, as no Python is needed to generate an output file. This carries greater weight, as deployment time and efficiency matter more here.
- As for the result contents themselves, there is mostly no difference save for a few situations. The models perform similarly or identically for smaller, less complex inputs like `CCCC` or `CC(=O)O`.
- But minor differences start to show for larger, more complex inputs, as seen in this pair of translations of the same input (one name from each implementation; see the similarity sketch below):
  - (3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol
  - (1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol
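To put a rough number on how close two long IUPAC names are, the standard library's difflib gives a quick similarity ratio (a sketch of my own over the two names above):

```python
# Quantify how similar two IUPAC name strings are (1.0 = identical).
from difflib import SequenceMatcher

name_a = "(3S,8R,9S,10R,13S,14S)-10,13-dimethyl-17-pyridin-3-yl-2,3,4,7,8,9,11,12,14,15-decahydro-1H-cyclopenta[a]phenanthren-3-ol"
name_b = "(1S,2S,5S,10R,11R,14S)-5,11-dimethyl-5-pyridin-3-yltetracyclo[9.4.0.02,6.010,14]pentadeca-7,16-dien-14-ol"

ratio = SequenceMatcher(None, name_a, name_b).ratio()
print(f"Similarity: {ratio:.2f}")
```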
This week was the greatest challenge yet, as I made myself familiar with new technology and got stuck A LOT!! However, it was joyous to see myself progress. Excited for the next tasks!
Marking Week 2 complete!
This marks the start of some real field work!
PIGNet2 - A Versatile Deep Learning-based Protein-Ligand Interaction Prediction Model for Binding Affinity Scoring and Virtual Screening
Publication - Papers With Code
Source Code - Github
Authors : Seokhyun Moon, Sang-Yeon Hwang, Jaechang Lim, Woo Youn Kim
Date Published - July 3, 2023
Their objective is to predict PLI (Protein-Ligand Interaction) in the form of screening (identifying compounds that possibly have binding affinity OR do not have it) and scoring (predicting the binding affinity of the protein-ligand complex in a way that is comparable to experimental values) as well as improving the binding process.
Why? It is not the first model to try to predict PLI. However, the important differentiator is that this model achieves high-accuracy results in two different tasks simultaneously using the same dataset. Most other models are trained in a task-specific way, i.e., they perform well in one task but cannot do well in a different one. This is due to the lack of experimental structure-affinity data, which limits the generalization ability of existing models. This makes PIGNet2 a novel ML model that is dexterous and gives high-accuracy results for different tasks in a relatively efficient manner, vis-à-vis having separate models for each of those tasks that still provide lower accuracy.
Their solution to creating a generalized model despite the lack of available structure-affinity experimental data was to use an inductive bias and augment the existing data, creating similar, near-native structures that are energetically and geometrically similar to the crystal structures. The model was then trained to predict the binding affinity of these structures to be the same as the experimental value. This gave PIGNet2 significantly enhanced scoring and screening performance. A toy sketch of this idea follows below.
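This is purely my own illustrative reading of the augmentation idea, not the authors' code (the pose features, perturbation, and affinity value are all made up): near-native variants generated from one crystal structure all inherit that structure's experimental label.

```python
# Toy illustration of the PIGNet2 data-augmentation idea (my reading of the
# paper, not the authors' code): near-native poses generated from a crystal
# structure are trained toward the SAME experimental affinity label.
import random

def perturb(pose, scale=0.1):
    # Hypothetical augmentation: jitter pose features slightly, standing in
    # for the energetically/geometrically similar near-native structures.
    return [x + random.uniform(-scale, scale) for x in pose]

crystal_pose = [0.5, 1.2, -0.3]     # stand-in for real structural features
experimental_affinity = -7.4         # stand-in experimental label

training_set = [(crystal_pose, experimental_affinity)]
for _ in range(10):                  # augment with near-native poses
    training_set.append((perturb(crystal_pose), experimental_affinity))

print(f"{len(training_set)} training pairs share one experimental label")
```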
Hi @joiboi08 many congratulations on making it this far. Good job on learning more about working with docker, thank you @leilayesufu and @PromiseFru for all the help here.
As for the network connection issue we faced earlier, it seems to have resolved itself after a couple of days, and I can work with Ersilia normally again without a VPN. (As guessed, it was probably a regional outage.)
I'm having fun learning new things, @DhanshreeA! And thank you for the update; it is a relief that it is not something permanent ☺️
To run this model, I followed the instructions mentioned in their repository.
pip3 install torch torchvision torchaudio
This command was generated by the PyTorch website according to the options you need.
gh repo clone ACE-KAIST/PIGNet2
conda create -n pignet2 python=3.9
conda activate pignet2
pip install -r requirements.txt
cd PIGNet2/dataset
bash download.sh
bash untar.sh
I have been facing some hardware problems with its implementation, in that it eats up all available space on my C drive and sometimes uses up all the RAM, causing other applications to fail. I am looking into an alternative implementation that could work better than running it locally.
ChemProp - A Message Passing Neural Network for Molecular Property Prediction and its Application in A Deep Learning Approach to Antibiotic Discovery
Publication - CELL
Source Code - Github
Authors : Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, Andres Cubillos-Ruiz, Nina M. Donghia, Craig R. MacNair, Shawn French, Lindsey A. Carfrae, Zohar Bloom-Ackermann, Victoria M. Tran, Anush Chiappino-Pepe, Ahmed H. Badran, Ian W. Andrews, Emma J. Chory, George M. Church, Eric D. Brown, Tommi S. Jaakkola, Regina Barzilay, James J. Collins
Date Published - February 20, 2020
Their objective is to use a molecular property prediction model to screen for possible new antibiotic compounds by predicting the likelihood that a molecule would inhibit the growth of E. coli.
Why? This is an important objective because antibiotic effectiveness is falling globally, and this is a major health concern. According to the WHO, a growing number of infections, such as pneumonia, tuberculosis, gonorrhoea, and salmonellosis, are becoming harder to treat as the antibiotics used to treat them become less effective. While antibiotic resistance occurs naturally, the misuse of antibiotics in humans and animals is accelerating the process. This misuse refers to people often stopping the drug when they start to feel better, as opposed to completing the prescribed course. This wipes out just enough of the bacteria to heal the person, but leaves enough behind that the next generation will be better adapted to resisting the antibiotic.
AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens
Publication - PubMed
Source Code - Github
Authors : Chenkai Li, Darcy Sutherland, S Austin Hammond, Chen Yang, Figali Taho, Lauren Bergman, Simon Houston, RenΓ© L Warren, Titus Wong, Linda M N Hoang, Caroline E Cameron, Caren C Helbing, Inanc Birol
Date Published - 25 January, 2022
Their objective is to use a deep learning model (AMPlify) to predict effective peptides against a panel of WHO priority pathogens.
Why? The concern it addresses is the same, i.e., the growing resistance to antibiotics and their globally degrading effectiveness. However, the proposed solution here is different. Unlike the second model, where the goal was to find compounds similar to antibiotics, this model sets out to find ALTERNATIVES to antibiotics in novel antimicrobial peptides (AMPs), which are general-purpose drugs acting against bacteria, viruses, fungi, and parasites. They tested the predicted peptides against a list of WHO priority pathogens, and 4 of the novel AMPs proved effective against multiple species of bacteria, including a multi-drug-resistant isolate of E. coli.
Just updating that I have submitted the final application through Outreachy along with a timeline on 27 October, 2023!
Hello,
Thanks for your work during the Outreachy contribution period, we hope you enjoyed it! We will now close this issue while we work on the selection of interns. Thanks again!
Week 1 - Get to know the community
Week 2 - Install and run an ML model
Week 3 - Propose new models
Week 4 - Prepare your final application