After going through the presentations made by @GemmaTuron at TReND Africa, I became very interested in this project. Ersilia is an open-source initiative that aims to provide AI models that enable scientists to make drug discovery much quicker. Thousands of children and adults die in Africa every year from illnesses for which there are few or insufficient drugs.
With what Ersilia is doing, scientists can use predictive models to develop drugs that will save millions of lives much faster. As a chemical engineer, my basic goal is always to create a better and faster route for every process. I am glad to be part of a project that aims to make the drug discovery process faster and more effective. It will be an honor to be part of a revolutionary team aiming to save millions of lives worldwide.
Here is a link to the README.md file I created. I am open to constructive criticism and recommendations. @GemmaTuron @miquelduranfrigola https://github.com/ersilia-os/ersilia/blob/0945fbd6db6c022bd452817e57248d6b3bfb1e93/documentation/README_Kcfreshly.md
Hi @Kcfreshly
Thanks for your work! It looks really good; I'm just leaving a few comments here as guidance!
Hope this helps! It would be fantastic if you could add the feedback and let me know.
Hello @GemmaTuron I have made the recommended changes and also created a pull request.
Also, my major in school was chemical engineering, so I have a background in chemistry and a bit of biology.
Hi @Kcfreshly thanks for adding the feedback! If it is of interest due to your background, check the Scientific content section!
Docstring for the ErsiliaModel class
# Only the docstrings are shown; method bodies are omitted.
# ErsiliaBase and DEFAULT_BATCH_SIZE are defined in the ersilia package.
class ErsiliaModel(ErsiliaBase):
    def __init__(
        self,
        model,
        save_to_lake=True,
        service_class=None,
        config_json=None,
        credentials_json=None,
        verbose=None,
        fetch_if_not_available=True,
    ):
        """
        Constructs all the necessary attributes for the ErsiliaModel object.

        Parameters
        ----------
        model : str
            Identifier of the model to load, e.g. an Ersilia model ID such as eos1amr.
        save_to_lake : bool, optional
            Whether to save the results into the Isaura data lake. Default is True.
        service_class : optional
            Service class used to serve the model. Default is None.
        config_json : str, optional
            Path to a configuration file. Default is None.
        credentials_json : str, optional
            Path to a credentials file. Default is None.
        verbose : bool, optional
            Whether to print verbose log messages. Default is None.
        fetch_if_not_available : bool, optional
            Whether to fetch the model if it is not available locally. Default is True.
        """

    def is_valid(self):
        """
        Check whether the model is valid.

        Returns
        -------
        bool
            True if the model is valid, False otherwise.
        """

    def _set_api(self, api_name):
        """
        Set the given API on the model.

        Parameters
        ----------
        api_name : str
            Name of the API.
        """

        def _method(input=None, output=None, batch_size=DEFAULT_BATCH_SIZE):
            """
            Run the API ``api_name`` with the given input, output and batch size.

            Parameters
            ----------
            input : str, optional
                Input passed to the model.
            output : str, optional
                Output file or format in which the results are returned.
            batch_size : int, optional
                Number of inputs processed per batch.
            """

    def _set_apis(self):
        """Set all the APIs available for the model."""

    def _get_api_instance(self, api_name):
        """Get an instance of the API with the given name."""

    def _api_runner_iter(self, api, input, output, batch_size):
        """Run the API and return the results as an iterator."""

    def _api_runner_return(self, api, input, output, batch_size):
        """Run the API using ``_api_runner_iter`` and return the results."""

    @staticmethod
    def __output_is_file(output):
        """Return True if the output is a file ending in .csv or .h5."""

    @staticmethod
    def __output_is_format(output):
        """Return True if the output is one of the formats json, numpy, pandas or dict."""

    def _get_api_runner(self, output):
        """Select the API runner depending on the output, e.g. whether it satisfies ``__output_is_file``."""

    def _evaluate_do_cache_splits(self, input, output):
        """
        Check whether there is a need to split the input and output for caching.

        Parameters
        ----------
        input : str
            Input passed to the model.
        output : str
            Output file or format of the results.

        Returns
        -------
        bool
            True if cache splits are needed, False otherwise.
        """

    def _do_cache_splits(self, input, output):
        """
        Split the input where there is a separator.

        Parameters
        ----------
        input : str
            Input passed to the model.
        output : str
            Output file or format of the results.
        """

    def api_task(self, api_name, input, output, batch_size):
        """Run the given API task of the model."""

    def serve(self):
        """Serve the model."""

    def close(self):
        """Close the model."""
Screenshot of Ersilia model successfully running on Ubuntu
Saving lives and eradicating communicable diseases is our purpose. We have reached another milestone in defeating (Malaria) with our latest model release (Chemprop-antibiotic model)
Thread
Ersilia believes that AI/ML can shorten the lengthy time for drug discovery, and we are glad that this model will bring us closer to that goal. We encourage you to become a contributor to this mission today! Check out our GitHub; we believe no contribution is too small.
[ ] Why we decided to incorporate the model and who it will benefit
[ ] What is the model, and who are the contributors?
[ ] Reiterate how the model can influence drug discovery for such diseases
[ ] Reassure our followers that we have more fantastic and life-changing models in the pipeline
Hi @Kcfreshly
Great job, you have done a lot of work. For the blog posts, I would only suggest reading a bit about the drug discovery pipeline and its stages (i.e., the lead optimization phase is focused on optimizing molecules, not genes). I like the Twitter template; I would perhaps use a more direct style, encouraging people to go to our hub to check out the tools.
I suggest you wrap up any tasks you are working on, address the comments above, and then you are ready to submit a final application. Thanks for the work!
@GemmaTuron Thanks for the feedback, this is the email newsletter template I created
https://drive.google.com/file/d/1H_zJKlaCSdl4Olo7HMknp6LZYX53aTA1/view?usp=sharing
Model ID: eos1amr
Model: Blood Brain Barrier
Description: This model predicts blood-brain barrier penetration (BBBP). Most therapeutic molecules and antibodies that could help treat diseases cannot cross the blood-brain barrier (BBB) in amounts adequate to be clinically effective. The model uses GROVER, a graph neural network pretrained with self-supervision on large-scale molecular data and fine-tuned to predict BBBP.
Input: Compounds
Output: BBBP
Algorithm: The model uses GROVER, which improves on earlier graph neural networks (GNNs). Those models were limited by insufficient labeled data, which lowered their accuracy in predicting BBBP.
Dataset: The BBBP dataset is curated by MoleculeNet and contains permeability information for about 2,000 molecules from the scientific literature.
GitHub repository: https://github.com/ersilia-os/eos1amr
Publication: https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html
I found these models interesting and would be glad if they could be incorporated into the Ersilia Model Hub:
PDBbind: PDBbind is a comprehensive database of experimentally measured binding affinities for biomolecular complexes. Unlike other ligand-based biological activity datasets, in which only the structures of ligands are provided, PDBbind provides detailed 3D Cartesian coordinates of both ligands and their target proteins derived from experimental (e.g., X-ray crystallography) measurements. The availability of coordinates of the protein–ligand complexes permits structure-based featurization that is aware of the protein–ligand binding geometry.
Lipophilicity: Lipophilicity is an important feature of drug molecules that affects both membrane permeability and solubility. This dataset provides experimental results of the octanol/water distribution coefficient (log D at pH 7.4) for 4,200 compounds.
SIDER: The Side Effect Resource (SIDER) is a database of marketed drugs and adverse drug reactions (ADR). The version of the SIDER dataset in DeepChem groups drug side effects into 27 system organ classes, following MedDRA classifications, measured for 1,427 approved drugs.
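To make "structure-based featurization that is aware of the protein–ligand binding geometry" concrete, here is a minimal sketch that turns PDBbind-style 3D coordinates into element-pair contact counts: for each (protein element, ligand element) pair it counts atom pairs closer than a distance cutoff, which is the idea behind descriptors such as RF-Score. The element lists and the 12 Å default follow RF-Score as we recall it and should be treated as assumptions; `contact_counts` is a hypothetical helper, and parsing the actual PDBbind files is left out (coordinates and element symbols are passed in directly).

```python
import numpy as np

# Assumed element sets (4 protein x 9 ligand elements, as in RF-Score to our recollection).
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def contact_counts(protein_xyz, protein_elems, ligand_xyz, ligand_elems, cutoff=12.0):
    """Count protein-ligand atom pairs closer than `cutoff` (Angstrom), per element pair."""
    protein_xyz = np.asarray(protein_xyz, dtype=float).reshape(-1, 3)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float).reshape(-1, 3)
    # Pairwise Euclidean distances, shape (n_protein_atoms, n_ligand_atoms)
    dists = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    close = dists < cutoff
    features = {}
    for p_elem in PROTEIN_ELEMENTS:
        p_mask = np.array([e == p_elem for e in protein_elems], dtype=bool)
        for l_elem in LIGAND_ELEMENTS:
            l_mask = np.array([e == l_elem for e in ligand_elems], dtype=bool)
            # Number of close pairs between atoms of these two element types
            features[f"{p_elem}-{l_elem}"] = int(close[p_mask][:, l_mask].sum())
    return features
```

For a toy complex with a protein carbon at the origin, a protein nitrogen 20 Å away, and a ligand oxygen and carbon each 1 Å from that carbon, the `C-O` and `C-C` counts are 1 and every `N-*` count is 0. The resulting 36-value vector can then be fed to any regression model to predict binding affinity.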
Here is my suggested Twitter template:
Saving lives and eradicating communicable diseases is our purpose. We have reached another milestone in defeating (Malaria) with our latest model release (Chemprop-antibiotic model). Ersilia believes that AI/ML can shorten the lengthy time for drug discovery, and we are glad that this model will bring us closer to that goal.
Thread
- [ ] Why we decided to incorporate the model and who it will benefit
- [ ] What is the model, and who are the contributors?
- [ ] Reiterate how the model can influence drug discovery for such diseases
- [ ] Reassure our followers that we have more fantastic and life-changing models in the pipeline
Hello @Kcfreshly, nice job on your Twitter template.
Thank you @ElizabethMawutin
Here is the link to the card I created for our mission and vision statement
https://drive.google.com/file/d/1nAXHiAPV5V9Y7S09EkC4PulyHSy7WWxv/view?usp=sharing
Hi @Kcfreshly. You have added a lot of work! Well done!!! Nice job searching, but I think these are not actually models but datasets. From my limited understanding, AI algorithms such as Random Forests use these datasets to create (train) a model. A good example of a model is DeepTox, which is based on deep learning and uses the Tox21 dataset.
Hi @Kcfreshly, thanks for the newsletter template, it looks good! Some feedback on the technical card: we should try to make technical cards shorter, summarizing the description and algorithm, or people will not read everything. As for the new models, what you suggest are indeed databases from which we could build models, not the final models themselves.
Focus on the final application now!
Thank you @loweyvana and our mentor @GemmaTuron for the feedback. I will make the necessary changes.
Three models that could be useful to the Ersilia Model Hub
Quantitative structure-property relationship (QSPR) model: predicts the aqueous solubility of drugs and is validated with cross-validation methods. The aqueous solubility of a drug or drug candidate is essential data in drug discovery.
Laplacian Regularized Least Squares algorithm (LRLSMDA): proposed for identifying microbe-drug associations. Predicting hidden microbe-drug associations can help in understanding the mechanisms behind them in clinical treatment, drug discovery, drug combinations and drug repositioning.
DrugPred_RNA: a model for structure-based druggability predictions for RNA binding sites.
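To illustrate what a Laplacian regularized least squares approach to association prediction involves, here is a minimal sketch of generic LapRLS scoring on a binary microbe-drug association matrix `Y` (rows are microbes, columns are drugs). It solves $F = K (K + \lambda L K)^{-1} Y$, the minimizer of $\|Y - K\alpha\|^2 + \lambda\, \alpha^\top K L K \alpha$ with $F = K\alpha$, where $K$ is a similarity kernel and $L = I - D^{-1/2} K D^{-1/2}$ its normalized graph Laplacian. This is not the LRLSMDA authors' code: the Gaussian interaction profile kernel, the simple averaging of microbe-space and drug-space scores, the small diagonal jitter and all function names are assumptions for illustration.

```python
import numpy as np


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel between the rows of `profiles`."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Bandwidth normalised by the mean squared profile norm
    gamma = gamma_prime / max(sq_norms.mean(), 1e-12)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def normalized_laplacian(K):
    """L = I - D^{-1/2} K D^{-1/2}, where D holds the row sums of K."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    return np.eye(K.shape[0]) - d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]


def laprls_scores(K, Y, lam=1.0, jitter=1e-8):
    """Laplacian regularized least squares scores: F = K (K + lam * L K)^{-1} Y."""
    K = K + jitter * np.eye(K.shape[0])  # small ridge keeps the linear system solvable
    L = normalized_laplacian(K)
    return K @ np.linalg.solve(K + lam * L @ K, Y)


def predict_associations(Y, lam=1.0):
    """Score every microbe-drug pair from a binary association matrix Y."""
    Y = np.asarray(Y, dtype=float)
    F_microbe = laprls_scores(gip_kernel(Y), Y, lam)     # microbe space
    F_drug = laprls_scores(gip_kernel(Y.T), Y.T, lam)    # drug space
    return (F_microbe + F_drug.T) / 2.0                  # simple average (assumption)
```

Pairs that are 0 in `Y` but receive high scores are candidate hidden associations. With `lam=0` the Laplacian term vanishes and the scores reproduce `Y`, so `lam` controls how much the predictions are smoothed across similar microbes or drugs.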
Thank you @GemmaTuron, I have made the changes according to your instructions, and I am now heading to the final application. It has been an honor to contribute to this novel platform.
@GemmaTuron I made a tutorial video; I hope it serves its purpose.
https://drive.google.com/file/d/1ufwLG7z_YFjHIkMpAtLM-9FJsEBIcGYb/view?usp=sharing
@Kcfreshly this is amazing. Great work!!!!
Thank you @Pmaidoo
Super video @Kcfreshly, very detailed! many thanks!
Thank you so much @GemmaTuron and @miquelduranfrigola for the guidance.
Applicant: <@Kcfreshly>
Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the documentation and outreach material of the Ersilia Model Hub”.
Please tick the tasks as you complete them. To make a final application it is not required to have completed all tasks. Only the Initial Steps and Community sections are REQUIRED. The tasks are not ordered from more to less important, they are simply related to different skills. Start where you feel most comfortable. This project can be adapted to the applicant's interests, please focus on the type of tasks that you prefer / have better skills / would like to work on as an intern.
Initial steps:
[x] Comment under this issue explaining why you are interested in this project
GitHub documentation:
[x] Incorporate feedback from the mentor
Writing dissemination material
[x] Create a template for a short newsletter (1 paragraph) to send every month to our community (funders, users, contributors). It should mention metrics (models in the hub, number of users, funding…), a thank-you message, etc.
Technical skills (required for the tutorial only)
[x] Write a docstring for the ErsiliaModel class. Use the Google Python Style guide. Paste the docstring as a comment below (do not use a PR).
Graphic material
[x] Incorporate feedback from the mentor
Scientific content
[x] Search the scientific literature and suggest 3 new models (comment in this issue) that would be relevant to incorporate in the Hub.
Community
[x] If you have feedback from your peers, answer it in this issue.
Other
If you are interested in working on related topics or have new suggestions, please do the following:
[ ] Link in the comments any other PR you have contributed to.
Final application