ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
210 stars, 142 forks

Outreachy Documentation Project: <Kcfreshly> #105

Closed: Kcfreshly closed this issue 2 years ago

Kcfreshly commented 2 years ago

Applicant: <@Kcfreshly>

Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the documentation and outreach material of the Ersilia Model Hub”.

Please tick the tasks as you complete them. To make a final application it is not required to have completed all tasks. Only the Initial Steps and Community sections are REQUIRED. The tasks are not ordered from more to less important; they are simply related to different skills. Start where you feel most comfortable. This project can be adapted to the applicant's interests, so please focus on the type of tasks that you prefer, have better skills for, or would like to work on as an intern.


Initial steps:

Kcfreshly commented 2 years ago

After going through the presentations made by @GemmaTuron in TReND Africa, I became very interested in this project. Ersilia is an open source initiative that aims to provide AI models that enable scientists to make drug discovery much quicker. Thousands of children and adults die in Africa every year from illnesses that have few or insufficient drugs.

With what Ersilia is doing, scientists can use models to make predictions much more quickly when developing drugs that will save millions of lives. As a chemical engineer, my basic goal is always to create a better and faster route for every process. I am glad to be part of a project that also aims to make the drug discovery process faster and more effective. It will be an honor to be part of a revolutionary team aiming to save millions of lives worldwide.

Kcfreshly commented 2 years ago

Here is a link to the README.md file I created. I am open to constructive criticism and recommendations. @GemmaTuron @miquelduranfrigola https://github.com/ersilia-os/ersilia/blob/0945fbd6db6c022bd452817e57248d6b3bfb1e93/documentation/README_Kcfreshly.md

GemmaTuron commented 2 years ago

Hi @Kcfreshly

Thanks for your work! It looks really good, I'm just leaving a few comments here as guidance!

Hope this helps. If you can incorporate the feedback and let me know, that would be fantastic!

Kcfreshly commented 2 years ago

Hello @GemmaTuron, I have made the recommended changes and also created a pull request: #133

Kcfreshly commented 2 years ago

Also, my major in school was chemical engineering, so I have a background in chemistry and a bit in biology.

GemmaTuron commented 2 years ago

Hi @Kcfreshly thanks for adding the feedback! If it is of interest due to your background, check the Scientific content section!

Kcfreshly commented 2 years ago

Docstrings for the ErsiliaModel class

class ErsiliaModel(ErsiliaBase):

    def __init__(
        self,
        model,
        save_to_lake=True,
        service_class=None,
        config_json=None,
        credentials_json=None,
        verbose=None,
        fetch_if_not_available=True,
    ):
        """
        Constructs all the necessary attributes for the ErsiliaModel object.

        Parameters
        ----------
        model : str
            identifier of the model
        save_to_lake : bool
            whether to save to Isaura
        service_class : optional, default None
            path to a service class
        config_json : optional, default None
            path to a configuration file
        credentials_json : optional, default None
            path to a credentials file
        verbose : optional, default None
            whether to enable verbose output
        fetch_if_not_available : bool
            whether to fetch the model if it is not available
        """

    def is_valid(self):
        """Return whether the model is valid."""

    def _set_api(self, api_name):
        """
        Set the API.

        Parameters
        ----------
        api_name : str
            name of the API
        """

    def _method(input=None, output=None, batch_size=DEFAULT_BATCH_SIZE):
        """
        Passes the input, output, API name, and batch size to the API and returns the result.

        Parameters
        ----------
        input : str
            input to the model
        output : str
            output file or format
        batch_size : int
            number of inputs per batch
        """

    def _set_apis(self):
        """Setter method that sets the list of APIs."""

    def _get_api_instance(self, api_name):
        """Getter method that gets the API with the given name."""

    def _api_runner_iter(self, api, input, output, batch_size):
        """Raise an error if any of the parameters is None."""

    def _api_runner_return(self, api, input, output, batch_size):
        """Handle exceptions from _api_runner_iter(self, api, input, output, batch_size)."""

    def __output_is_file(output):
        """Check whether output is a file (.csv or .h5)."""

    def __output_is_format(output):
        """Check whether output is one of the formats json, numpy, pandas, dict."""

    def _get_api_runner(self, output):
        """Get the API runner for an output that satisfies the __output_is_file(output) conditions."""

    def _evaluate_do_cache_splits(self, input, output):
        """
        Check whether a split is needed.

        Parameters
        ----------
        input :
            input to the model
        output :
            output file or format
        """

    def _do_cache_splits(self, input, output):
        """
        Check whether the string splits where there is a separator.

        Parameters
        ----------
        input :
            input to the model
        output :
            output file or format
        """

    def api_task(self, api_name, input, output, batch_size):
        """Run the task of the model API."""

    def serve(self):
        """Serve the model."""

    def close(self):
        """Close the model."""

Kcfreshly commented 2 years ago
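The two output checks described in the docstrings of the previous comment could look roughly like the sketch below. This is not the Ersilia implementation: the names `output_is_file` and `output_is_format` are hypothetical stand-ins for the private helpers, and the accepted extensions and formats are taken only from those docstrings.

```python
# Hypothetical stand-ins for the private helpers documented in the thread.
# The accepted values come from the docstrings, not from the Ersilia source.
FILE_EXTENSIONS = (".csv", ".h5")
OUTPUT_FORMATS = ("json", "numpy", "pandas", "dict")


def output_is_file(output):
    """Return True if output is a file name ending in .csv or .h5."""
    return isinstance(output, str) and output.endswith(FILE_EXTENSIONS)


def output_is_format(output):
    """Return True if output names one of the in-memory formats."""
    return isinstance(output, str) and output in OUTPUT_FORMATS
```

Under these assumptions, `output_is_file("results.csv")` is true, while `output_is_file("pandas")` is false and `output_is_format("pandas")` is true.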

Ersilia Model

Screenshot of Ersilia model successfully running on Ubuntu

Kcfreshly commented 2 years ago

This is the link to my first blog post: Blog1

This is the link to my second blog post: Blog2

@GemmaTuron and everyone, I will be glad to hear your recommendations.

Kcfreshly commented 2 years ago

Here is my suggested Twitter Template:

Saving lives and eradicating communicable diseases is our purpose. We have reached another milestone in defeating (Malaria) with our latest release, the (Chemprop-antibiotic model).

Thread: Ersilia believes that AI/ML can shorten the lengthy drug discovery process, and we are glad that this model will bring us closer to that goal. We encourage you to contribute to this mission today! Check out our GitHub; we believe no contribution is too small.

GemmaTuron commented 2 years ago

Hi @Kcfreshly

Great job, you have done a lot of work. For the blog posts, I would only suggest reading a bit about the drug discovery pipeline and its stages (e.g., the lead optimization phase is focused on optimizing molecules, not genes). I like the template for Twitter. I would perhaps use a more direct style, encouraging people to go to our hub to check out the tools.

I suggest you wrap up any tasks you are working on, add the comments above and you are ready to submit a final application, thanks for the job!

Kcfreshly commented 2 years ago

@GemmaTuron Thanks for the feedback, this is the email newsletter template I created

https://drive.google.com/file/d/1H_zJKlaCSdl4Olo7HMknp6LZYX53aTA1/view?usp=sharing

Kcfreshly commented 2 years ago

Technical Card

Model ID: eos1amr

Model: Blood Brain Barrier

Description: This model predicts Blood Brain Barrier Penetration (BBBP). Most therapeutic molecules and antibodies that could help treat disease cannot cross the blood-brain barrier (BBB) in adequate amounts to be clinically effective. The model uses GROVER, a pretrained Graph Neural Network, to predict BBBP.

Input: Compounds

Output: BBBP

Algorithm: The model uses GROVER, which improves on earlier Graph Neural Networks (GNNs). Those were trained on insufficient data and therefore predicted BBBP with low accuracy.

Dataset: The BBBP dataset is curated by MoleculeNet. It contains permeability information for 2000 molecules from the scientific literature.

Github Repository: https://github.com/ersilia-os/eos1amr

Publication: https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html
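One way to keep technical cards consistent is to treat each card as a small record with fixed fields. The sketch below is only an illustration: the `TechnicalCard` class and its `render` method are hypothetical and not part of Ersilia, and the field values are shortened from the card above.

```python
from dataclasses import dataclass


@dataclass
class TechnicalCard:
    """Hypothetical container for the fields of a technical card."""

    model_id: str
    model: str
    description: str
    input: str
    output: str
    algorithm: str
    dataset: str
    repository: str
    publication: str

    def render(self):
        """Return the card as plain text, one 'Label: value' line per field."""
        labels = {
            "model_id": "Model ID",
            "model": "Model",
            "description": "Description",
            "input": "Input",
            "output": "Output",
            "algorithm": "Algorithm",
            "dataset": "Dataset",
            "repository": "Github Repository",
            "publication": "Publication",
        }
        return "\n".join(
            f"{label}: {getattr(self, name)}" for name, label in labels.items()
        )


# Values condensed from the card in this comment.
card = TechnicalCard(
    model_id="eos1amr",
    model="Blood Brain Barrier",
    description="Predicts Blood Brain Barrier Penetration (BBBP).",
    input="Compounds",
    output="BBBP",
    algorithm="GROVER, a pretrained Graph Neural Network.",
    dataset="BBBP dataset curated by MoleculeNet (2000 molecules).",
    repository="https://github.com/ersilia-os/eos1amr",
    publication="https://papers.nips.cc/paper/2020/hash/94aef38441efa3380a3bed3faf1f9d5d-Abstract.html",
)
print(card.render())
```

Fixed fields make it easy to notice when a card is missing a section or when one field has grown too long.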

Kcfreshly commented 2 years ago

I found these models interesting and would be glad if they could be incorporated into the Ersilia Model Hub

PDBbind: PDBbind is a comprehensive database of experimentally measured binding affinities for bio-molecular complexes. Unlike other ligand-based biological activity datasets, in which only the structures of ligands are provided, PDBbind provides detailed 3D Cartesian coordinates of both ligands and their target proteins derived from experimental (e.g., X-ray crystallography) measurements. The availability of coordinates of the protein–ligand complexes permits structure-based featurization that is aware of the protein–ligand binding geometry.

Lipophilicity: Lipophilicity is an important feature of drug molecules that affects both membrane permeability and solubility. This dataset provides experimental results of octanol/water distribution coefficient (log D at pH 7.4) of 4200 compounds.

SIDER: The Side Effect Resource (SIDER) is a database of marketed drugs and adverse drug reactions (ADR). The version of the SIDER dataset in DeepChem has grouped drug side-effects into 27 system organ classes following MedDRA classifications measured for 1427 approved drugs.

Elizabeth-Joseph-Mawutin commented 2 years ago

Here is my suggested Twitter Template:

Saving lives and eradicating communicable diseases is our purpose. We have reached another milestone in defeating (Malaria) with our latest release, the (Chemprop-antibiotic model). Ersilia believes that AI/ML can shorten the lengthy drug discovery process, and we are glad that this model will bring us closer to that goal.

Thread

  • [ ] Why we decided to incorporate the model and who it will benefit
  • [ ] What is the model, and who are the contributors?
  • [ ] Reiterate how the model can influence drug discovery regarding such disease
  • [ ] Reassure our followers that we have more fantastic and life-changing models in the pipeline

Hello @Kcfreshly Nice job on your Twitter template.

Kcfreshly commented 2 years ago

Thank you @ElizabethMawutin

Kcfreshly commented 2 years ago

Here is the link to the card I created for our mission and vision statement

https://drive.google.com/file/d/1nAXHiAPV5V9Y7S09EkC4PulyHSy7WWxv/view?usp=sharing

loweyvana commented 2 years ago

I found these models interesting and would be glad if they could be incorporated into the Ersilia Model Hub

PDBbind: PDBbind is a comprehensive database of experimentally measured binding affinities for bio-molecular complexes. Unlike other ligand-based biological activity datasets, in which only the structures of ligands are provided, PDBbind provides detailed 3D Cartesian coordinates of both ligands and their target proteins derived from experimental (e.g., X-ray crystallography) measurements. The availability of coordinates of the protein–ligand complexes permits structure-based featurization that is aware of the protein–ligand binding geometry.

Lipophilicity: Lipophilicity is an important feature of drug molecules that affects both membrane permeability and solubility. This dataset provides experimental results of octanol/water distribution coefficient (log D at pH 7.4) of 4200 compounds.

SIDER: The Side Effect Resource (SIDER) is a database of marketed drugs and adverse drug reactions (ADR). The version of the SIDER dataset in DeepChem has grouped drug side-effects into 27 system organ classes following MedDRA classifications measured for 1427 approved drugs.

Hi @Kcfreshly. You have done a lot of work, well done!!! Nice job searching, but I think these are not actually models but datasets. From my limited understanding, AI algorithms such as Random Forests use these datasets to create/train a model. A good example of a model is DeepTox, which is based on deep learning and uses the Tox21 dataset.
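The difference between a dataset and a model can be made concrete with a toy sketch. The numbers below are invented for illustration and are not taken from PDBbind, Lipophilicity, SIDER, or Tox21. The "algorithm" is a single-threshold classifier rather than a Random Forest, chosen only so the example needs no external libraries.

```python
# A dataset is just recorded observations: (feature value, label) pairs.
# These values are invented for illustration only.
toy_dataset = [(0.5, 0), (1.0, 0), (1.5, 0), (2.5, 1), (3.0, 1), (3.5, 1)]


def train_threshold_model(dataset):
    """Fit a one-threshold classifier and return it as a callable model.

    Candidate thresholds are the midpoints between consecutive feature
    values; the one with the fewest training errors is kept.
    """
    values = sorted(x for x, _ in dataset)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x > threshold) != bool(label) for x, label in dataset)

    best = min(candidates, key=errors)

    def model(x):
        # The model is what training produces: a rule for unseen inputs.
        return int(x > best)

    return model


model = train_threshold_model(toy_dataset)
print(model(0.8), model(3.2))
```

The dataset is the input to training, and the returned `model` is the result: it can label values that never appeared in the dataset.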

GemmaTuron commented 2 years ago

Hi @Kcfreshly Thanks for the newsletter template, looks good! Some feedback on the technical card: we should try to make the cards shorter by summarizing the description and algorithm, or people will not read everything. As for the new models, what you suggest are indeed databases from which we could build models, not the final models themselves.

Focus on the final application now!

Kcfreshly commented 2 years ago

Thank you @loweyvana and our mentor @GemmaTuron for the feedback. I will make the right changes.

Kcfreshly commented 2 years ago

Three models that can be useful to the Ersilia hub

Quantitative structure-property relationship (QSPR) model: predicts the aqueous solubility of drugs and is validated by cross-validation methods. The aqueous solubility of a drug or drug candidate is essential data in drug discovery.

Laplacian Regularized Least Squares algorithm (LRLSMDA): proposed for identifying microbe-drug associations. Predicting hidden microbe-drug associations can help in understanding their mechanisms in clinical treatment, drug discovery, drug combinations, and drug repositioning.

DrugPred_RNA: a model for structure-based druggability predictions for RNA binding sites.
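For readers unfamiliar with the cross-validation mentioned for the QSPR model, k-fold splitting is one common method. The thread does not say which method that model used, so the sketch below is only a generic illustration, and the function name `k_fold_indices` is hypothetical. It splits sample indices into k folds without shuffling, for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), len(test))
```

Each sample is held out for testing exactly once, so every prediction used to score the model comes from a fold the model was not trained on.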

Kcfreshly commented 2 years ago

Thank you @GemmaTuron. I have made the changes according to your instructions and am now heading to the final application. It has been an honor contributing to this novel platform.

Kcfreshly commented 2 years ago

@GemmaTuron I made a tutorial video, I hope it is useful.

https://drive.google.com/file/d/1ufwLG7z_YFjHIkMpAtLM-9FJsEBIcGYb/view?usp=sharing

Pmaidoo commented 2 years ago

@Kcfreshly this is amazing. great work done !!!!

Kcfreshly commented 2 years ago

Thank you @Pmaidoo

GemmaTuron commented 2 years ago

Super video @Kcfreshly, very detailed! many thanks!

Kcfreshly commented 2 years ago

Thank you so much @GemmaTuron and @miquelduranfrigola for the guidance.