Kcfreshly commented 2 years ago

Applicant: <@Kcfreshly>

Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the documentation and outreach material of the Ersilia Model Hub”.

Please tick the tasks as you complete them. To make a final application it is not required to have completed all tasks. Only the Initial Steps and Community sections are REQUIRED. The tasks are not ordered from more to less important; they are simply related to different skills. Start where you feel most comfortable. This project can be adapted to the applicant's interests, so please focus on the type of tasks that you prefer, have better skills in, or would like to work on as an intern.

Initial steps:

[x] Record your application for the project on the Outreachy website, referencing this issue. Please make sure to select the right project on the website.
[x] Join the Slack channel to follow public communications
[x] Comment under this issue explaining why you are interested in this project

GitHub documentation:
[x] Create a README file with the name under the /documentation folder
[x] Link the #PR in a comment under this issue
[x] Incorporate feedback from the mentor

Writing dissemination material
[x] Read the Strategic plan 2021-2023 for Ersilia and create a 1-page blogpost with the main points
[x] Comment under this issue with a link to the blogpost (a Google Doc, for example)
[x] Incorporate feedback from the mentor
[x] Choose your own topic related to Ersilia (AI/ML for biomedical research, neglected diseases, drug discovery…) and write a 1-page blogpost to communicate to a non-expert audience
[x] Comment under this issue with a link to the blogpost (a Google Doc, for example)
[x] Incorporate feedback from the mentor
[x] Create a template for a Twitter post to release every time a new model is incorporated in the Hub (Twitter: 280 characters, you can suggest a main post + thread with extra information) and add it as a comment under this issue
[x] Create a template for a short newsletter (1 paragraph) to send every month to our community (funders, users, contributors). It should mention metrics (models in the hub, number of users, funding…), thank-yous, etc.

Technical skills (required for the tutorial only)
[x] Install the Ersilia Model Hub
[x] Test one model
[x] Add a screenshot under this issue showing the model running on your computer
[x] Write a docstring for the ErsiliaModel class. Use the Google Python Style guide. Paste the docstring as a comment below (do not use a PR).

Graphic material
[x] Read the Ersilia Brand Guidelines
[x] Read “Why Ersilia?”
[x] Create one image / slide to explain Ersilia’s mission and vision
[x] Link to the image/slide as a comment under this issue
[x] Incorporate feedback from the mentor
[x] Create two slides / short video showing how to use the Ersilia Model Hub and add them under the /tutorial folder
[ ] Link the #PR in this issue
[x] Incorporate feedback from the mentor

Scientific content
[x] Check the models available in the Hub
[x] Select one model from the list and write a technical card for it (what the model is for, what input it takes, which data was used to create it, what kind of ML algorithm it uses…)
[x] Add your card as a comment to this issue
[x] Search the scientific literature and suggest 3 new models (comment in this issue) that would be relevant to incorporate in the Hub.

Community
[x] Look up two other projects and comment on their issues with feedback on one of their tasks
[x] If you have feedback from your peers, answer it in this issue.

Other

If you have interest in working on related topics, or have new suggestions, please do the following
[ ] Add a comment in this issue with your new idea, tagging the mentor
[ ] Get feedback from the mentor and act accordingly
[ ] Link in the comments any other PR you have contributed to.

Final application
[x] I have answered all comments from mentors and contributors
[x] All PR or issues assigned to me are complete
[x] I have submitted my final application to the project

Kcfreshly commented 2 years ago

After going through the presentations made by @GemmaTuron at TReND Africa, I became very interested in this project. Ersilia is an open source initiative that aims to provide AI models that enable scientists to make drug discovery much quicker. Thousands of children and adults die in Africa every year from illnesses that have few or insufficient drugs.

With what Ersilia is doing, scientists can use models to make predictions much more quickly when developing drugs that will save millions of lives. As a chemical engineer, my basic goal is always to create a better and faster route for every process. I am glad to be in a project that also aims to make the drug discovery process faster and more effective. It will be an honor to be part of a revolutionary team aiming to save millions of lives worldwide.

Kcfreshly commented 2 years ago

Here is a link to the README.md file I created. I am open to constructive criticism and recommendations. @GemmaTuron @miquelduranfrigola https://github.com/ersilia-os/ersilia/blob/0945fbd6db6c022bd452817e57248d6b3bfb1e93/documentation/README_Kcfreshly.md

GemmaTuron commented 2 years ago

Hi @Kcfreshly

Thanks for your work! It looks really good. I'm just leaving a few comments here as guidance!

Images: I appreciate that you have gone through our videos and available docs to find images! I feel perhaps two images are too many, as people will get distracted from the important information. Can you select one of them? I think the model steps would be more appropriate here.
Features: the description in the bioactivity signatures section does not focus on the bioactivity signatures. This refers to the backbone technology of the Ersilia Models, which was developed by Miquel Duran and published in Nature Biotechnology. This is quite technical; do you have a background in chemistry/biology that can help you follow it? The technology section could be eliminated and moved to the bioactivity signatures, which refers to the CC
The installation steps are very clear
In the usage section, the code snippets are numbered with a 1. What we usually use is a $ before the actual command, and we do not number steps if they are in code format
The section How to use Ersilia App and the ones below it are not properly formatted.
Check the License

Hope this helps, if you can add the feedback and let me know would be fantastic!

Kcfreshly commented 2 years ago

Hello @GemmaTuron I have made the recommended changes and also created a pull request.

#133

Kcfreshly commented 2 years ago

Also, my major in school was chemical engineering, so I have a background in chemistry and a bit in biology.

GemmaTuron commented 2 years ago

Hi @Kcfreshly thanks for adding the feedback! If it is of interest due to your background, check the Scientific content section!

Kcfreshly commented 2 years ago

Docstring for the ErsiliaModel class

class ErsiliaModel(ErsiliaBase):

    def __init__(
        self,
        model,
        save_to_lake=True,
        service_class=None,
        config_json=None,
        credentials_json=None,
        verbose=None,
        fetch_if_not_available=True,
    ):
        """Constructs all the necessary attributes for the ErsiliaModel object.

        Args:
            model (str): Identifier of the model to use.
            save_to_lake (bool): Whether to save the model into isaura.
            service_class (str, optional): Path to a service class.
            config_json (str, optional): Path to a configuration file.
            credentials_json (str, optional): Path to a credentials file.
            verbose (bool, optional): Whether to enable verbose output.
            fetch_if_not_available (bool): Whether to fetch the model if it
                is not available.
        """

    def is_valid(self):
        """Returns whether the model is valid."""

    def _set_api(self, api_name):
        """Sets the API.

        Args:
            api_name (str): Name of the API.
        """

        def _method(input=None, output=None, batch_size=DEFAULT_BATCH_SIZE):
            """Runs the API named api_name and returns its result.

            Args:
                input (str): Input passed to the API.
                output (str): Output file or output format for the result.
                batch_size (int): Size of each batch of inputs.
            """

    def _set_apis(self):
        """Sets the list of APIs of the model."""

    def _get_api_instance(self, api_name):
        """Gets the instance of the API called api_name."""

    def _api_runner_iter(self, api, input, output, batch_size):
        """Raises an error if any of the parameters has a None value."""

    def _api_runner_return(self, api, input, output, batch_size):
        """Handles exceptions from _api_runner_iter(self, api, input, output, batch_size)."""

    def __output_is_file(output):
        """Checks whether the output is a .csv or .h5 file."""

    def __output_is_format(output):
        """Checks whether the output is one of the formats json, numpy, pandas or dict."""

    def _get_api_runner(self, output):
        """Gets the API runner for an output that satisfies the __output_is_file(output) conditions."""

    def _evaluate_do_cache_splits(self, input, output):
        """Checks whether there is a need for a split.

        Args:
            input: Input passed to the API.
            output: Output file or output format for the result.
        """

    def _do_cache_splits(self, input, output):
        """Checks whether the string is split where there is a separator.

        Args:
            input: Input passed to the API.
            output: Output file or output format for the result.
        """

    def api_task(self, api_name, input, output, batch_size):
        """Runs the task of the model API."""

    def serve(self):
        """Serves the model."""

    def close(self):
        """Closes the model."""

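For reference, the task asks for the Google Python Style Guide layout. Below is a minimal sketch of that layout on a hypothetical helper; `describe_model` and its parameters are invented for illustration and are not part of the Ersilia code base. The sections are a one-line summary followed by `Args:`, `Returns:` and `Raises:`.

```python
def describe_model(model_id, verbose=False):
    """Builds a short summary for a model (hypothetical helper).

    Args:
        model_id (str): Identifier of the model, for example "eos1amr".
        verbose (bool): Whether the summary should be flagged as verbose.

    Returns:
        dict: The model identifier and the verbosity flag.

    Raises:
        ValueError: If model_id is empty.
    """
    if not model_id:
        raise ValueError("model_id must not be empty")
    return {"model_id": model_id, "verbose": verbose}


# The docstring is available at runtime through help() or __doc__.
print(describe_model("eos1amr"))
```
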
Kcfreshly commented 2 years ago

Ersilia Model

Screenshot of Ersilia model successfully running on Ubuntu

Kcfreshly commented 2 years ago

This is the link to my first blog: Blog1

This is the link to my second blog: Blog2

@GemmaTuron and everyone, I will be glad to hear your recommendations.

Kcfreshly commented 2 years ago

Here is my suggested Twitter Template:

Saving lives and eradicating communicable diseases is our purpose. We have reached another milestone in defeating (Malaria) with our latest release, (Chemprop-antibiotic model).

Thread: Ersilia believes that AI/ML can shorten the lengthy time for drug discovery, and we are glad that this model will bring us closer to that goal. We encourage you to become a contributor to this mission today! Check out our GitHub; we believe no contribution is too small.

[ ] Why we decided to incorporate the model and who it will benefit
[ ] What is the model, and who are the contributors?
[ ] Reiterate how the model can influence drug discovery regarding such disease
[ ] Reassure our followers that we have more fantastic and life-changing models in the pipeline
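
A template like this can be filled in and checked against the 280-character limit automatically. The sketch below is only an illustration: the template wording, the placeholder names `model_name` and `disease`, and the function `render_tweet` are assumptions, and Python's `len` may differ from how the platform itself counts characters.

```python
from string import Template

TWEET_LIMIT = 280  # character limit stated in the task description

# Hypothetical template text; the placeholders are assumptions for illustration.
NEW_MODEL_TWEET = Template(
    "New in the Ersilia Model Hub: $model_name, a model for $disease research. "
    "Check it out on our GitHub!"
)


def render_tweet(model_name, disease):
    """Fills the template and fails if the result exceeds the limit."""
    text = NEW_MODEL_TWEET.substitute(model_name=model_name, disease=disease)
    if len(text) > TWEET_LIMIT:
        raise ValueError(f"Tweet is {len(text)} characters, limit is {TWEET_LIMIT}")
    return text


print(render_tweet("Chemprop-antibiotic", "malaria"))
```
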

GemmaTuron commented 2 years ago

Hi @Kcfreshly

Great job, you have done a lot of work. From the blogposts, I would only suggest reading a bit about the drug discovery pipeline and its stages (i.e., the lead optimization phase is focused on optimizing molecules, not genes). I like the template for Twitter; I would perhaps use a more direct style, encouraging people to go to our hub to check the tools.

I suggest you wrap up any tasks you are working on, add the comments above and you are ready to submit a final application, thanks for the job!

Kcfreshly commented 2 years ago

@GemmaTuron Thanks for the feedback, this is the email newsletter template I created

https://drive.google.com/file/d/1H_zJKlaCSdl4Olo7HMknp6LZYX53aTA1/view?usp=sharing

Kcfreshly commented 2 years ago

Technical Card

Model ID: eos1amr

Model: Blood Brain Barrier

Description: This model predicts Blood Brain Barrier Penetration (BBBP). Most therapeutic molecules and antibodies that could help treat disease cannot cross the BBB in adequate amounts to be clinically effective. The model uses GROVER, a Graph Neural Network pretrained to predict Blood Brain Barrier Penetration (BBBP).

Input: Compounds

Output: BBBP

Algorithm: The algorithm used for BBBP is GROVER, an advance over previous Graph Neural Networks (GNNs), which had insufficient data and therefore delivered low accuracy in predicting BBBP.

Dataset:
The BBBP dataset is curated by MoleculeNet. This dataset contains permeability information for 2000 molecules from the scientific literature.

GitHub Repository: https://github.com/ersilia-os/eos1amr

Publication: https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html
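
The fields of a technical card could also be kept in a small data structure so that every card is rendered the same way. This is only a sketch: the class `TechnicalCard` and its `render` method are hypothetical, and the field values are shortened from the card above.

```python
from dataclasses import dataclass


@dataclass
class TechnicalCard:
    """Hypothetical container whose fields mirror the headings of the card."""

    model_id: str
    model: str
    description: str
    input: str
    output: str
    algorithm: str
    dataset: str

    def render(self):
        """Returns the card as one 'Label: value' line per field."""
        rows = [
            ("Model ID", self.model_id),
            ("Model", self.model),
            ("Description", self.description),
            ("Input", self.input),
            ("Output", self.output),
            ("Algorithm", self.algorithm),
            ("Dataset", self.dataset),
        ]
        return "\n".join(f"{label}: {value}" for label, value in rows)


card = TechnicalCard(
    model_id="eos1amr",
    model="Blood Brain Barrier",
    description="Predicts Blood Brain Barrier Penetration (BBBP).",
    input="Compounds",
    output="BBBP",
    algorithm="GROVER, a pretrained Graph Neural Network",
    dataset="BBBP dataset curated by MoleculeNet",
)
print(card.render())
```
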

Kcfreshly commented 2 years ago

I found these models interesting and will be glad if they can be incorporated into the Ersilia Model Hub

PDBbind: PDBbind is a comprehensive database of experimentally measured binding affinities for bio-molecular complexes. Unlike other ligand-based biological activity datasets, in which only the structures of ligands are provided, PDBbind provides detailed 3D Cartesian coordinates of both ligands and their target proteins derived from experimental (e.g., X-ray crystallography) measurements. The availability of coordinates of the protein–ligand complexes permits structure-based featurization that is aware of the protein–ligand binding geometry.

Lipophilicity: Lipophilicity is an important feature of drug molecules that affects both membrane permeability and solubility. This dataset provides experimental results of octanol/water distribution coefficient (log D at pH 7.4) of 4200 compounds.

SIDER: The Side Effect Resource (SIDER) is a database of marketed drugs and adverse drug reactions (ADR). The version of the SIDER dataset in DeepChem has grouped drug side-effects into 27 system organ classes following MedDRA classifications measured for 1427 approved drugs.

Elizabeth-Joseph-Mawutin commented 2 years ago

Here is my suggested Twitter Template:

Saving lives and eradicating communicable diseases is our purpose. We have reached another milestone in defeating (Malaria) with our latest release, (Chemprop-antibiotic model). Ersilia believes that AI/ML can shorten the lengthy time for drug discovery, and we are glad that this model will bring us closer to that goal.

Thread

[ ] Why we decided to incorporate the model and who it will benefit

[ ] What is the model, and who are the contributors?

[ ] Reiterate how the model can influence drug discovery regarding such disease

[ ] Reassure our followers that we have more fantastic and life-changing models in the pipeline

Hello @Kcfreshly Nice job on your Twitter template.

Kcfreshly commented 2 years ago

Thank you @ElizabethMawutin

Kcfreshly commented 2 years ago

Here is the link to the card I created for our mission and vision statement

https://drive.google.com/file/d/1nAXHiAPV5V9Y7S09EkC4PulyHSy7WWxv/view?usp=sharing

loweyvana commented 2 years ago

I found these models interesting and will be glad if they can be incorporated into the Ersilia Model Hub

PDBbind: PDBbind is a comprehensive database of experimentally measured binding affinities for bio-molecular complexes. Unlike other ligand-based biological activity datasets, in which only the structures of ligands are provided, PDBbind provides detailed 3D Cartesian coordinates of both ligands and their target proteins derived from experimental (e.g., X-ray crystallography) measurements. The availability of coordinates of the protein–ligand complexes permits structure-based featurization that is aware of the protein–ligand binding geometry.

Lipophilicity: Lipophilicity is an important feature of drug molecules that affects both membrane permeability and solubility. This dataset provides experimental results of octanol/water distribution coefficient (log D at pH 7.4) of 4200 compounds.

SIDER: The Side Effect Resource (SIDER) is a database of marketed drugs and adverse drug reactions (ADR). The version of the SIDER dataset in DeepChem has grouped drug side-effects into 27 system organ classes following MedDRA classifications measured for 1427 approved drugs.

Hi @Kcfreshly. You have added a lot of work! Well done! Nice job searching, but I think these are not actually models but datasets. From my limited understanding, AI algorithms such as Random Forests use these datasets to create/train a model. A good example of a model is DeepTox, which is based on deep learning and uses the Tox21 dataset.
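
To make the distinction concrete, here is a minimal sketch in which a dataset goes in and a trained model comes out. It uses a simple nearest-centroid classifier in place of a Random Forest to keep the code short, and the toy dataset is invented purely for illustration; it is not taken from any of the datasets named above.

```python
def train_nearest_centroid(dataset):
    """Trains a model from a dataset of (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in dataset:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    # One centroid (mean feature vector) per label
    centroids = {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }

    def predict(features):
        """The trained model: returns the label of the closest centroid."""
        def sq_dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))
        return min(centroids, key=lambda label: sq_dist(centroids[label]))

    return predict


# Toy dataset, invented for illustration only
toy_dataset = [
    ([0.0, 0.1], "inactive"),
    ([0.2, 0.0], "inactive"),
    ([1.0, 0.9], "active"),
    ([0.8, 1.0], "active"),
]

model = train_nearest_centroid(toy_dataset)  # the dataset is the input to training
print(model([0.9, 0.8]))                     # the model is what makes predictions
```
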

GemmaTuron commented 2 years ago

Hi @Kcfreshly Thanks for the newsletter template, looks good! Some feedback on the technical card: we should try to make them shorter, summarizing description and algorithm, or people will not read everything. For the new models, indeed what you suggest are databases from which we could build models, not the final models themselves.

Focus on the final application now!

Kcfreshly commented 2 years ago

Thank you @loweyvana and our mentor @GemmaTuron for the feedback. I will make the right changes.

Kcfreshly commented 2 years ago

Three models that could be useful to the Ersilia Model Hub

A quantitative structure-property relationship (QSPR) model helps predict the aqueous solubility of drugs and is validated by cross-validation methods. The aqueous solubility of a drug or drug candidate is essential data in drug discovery.

The Laplacian Regularized Least Squares algorithm (LRLSMDA) is proposed for identifying microbe-drug associations. Predicting hidden microbe-drug associations can help in understanding the microbe-drug association mechanisms in clinical treatment, drug discovery, combinations and repositioning.

DrugPred_RNA: a model for structure-based druggability predictions for RNA binding sites.

Kcfreshly commented 2 years ago

Thank you @GemmaTuron, I have made the changes according to your instructions and am now heading to the final application. It has been an honor contributing to this novel platform.

Kcfreshly commented 2 years ago

@GemmaTuron I made a tutorial video. I hope it serves its purpose.

https://drive.google.com/file/d/1ufwLG7z_YFjHIkMpAtLM-9FJsEBIcGYb/view?usp=sharing

Pmaidoo commented 2 years ago

@Kcfreshly this is amazing. great work done !!!!

Kcfreshly commented 2 years ago

Thank you @Pmaidoo

GemmaTuron commented 2 years ago

Super video @Kcfreshly, very detailed! many thanks!

Kcfreshly commented 2 years ago

Thank you so much @GemmaTuron and @miquelduranfrigola for the guidance.

ersilia-os / ersilia

Outreachy Documentation Project: <Kcfreshly> #105
