ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
209 stars, 135 forks

Outreachy Documentation Project: Lowe Yvana #103

Closed: loweyvana closed this issue 2 years ago

loweyvana commented 2 years ago

Applicant: @loweyvana

Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the documentation and outreach material of the Ersilia Model Hub”.

Please tick the tasks as you complete them. To make a final application it is not required to have completed all tasks. Only the Initial Steps and Community sections are REQUIRED. The tasks are not ordered from more to less important; they are simply related to different skills. Start where you feel most comfortable. This project can be adapted to the applicant's interests, so please focus on the type of tasks that you prefer, are more skilled at, or would like to work on as an intern.


Initial steps:

Feedback from mentor @miquelduranfrigola: Miquel Duran (Ersilia), 2 days ago: As for the series of blogposts idea, this is quite interesting. I think we could frame it within the context of the Ersilia Model Hub, for example: "What models should we prioritize, based on the current status of drug discovery/development for NTDs?"

loweyvana commented 2 years ago

As a final-year medical student who is very passionate about research, especially in the domain of infectious diseases, my ultimate goal is to combine medicine and artificial intelligence, because the two fields intersect. I am not yet sure how to do that, since I am from a low-resource setting, and that is exactly where Ersilia comes in. Ersilia perfectly blends my passion for medicine and technology, and working on this project in any possible way would be an honor and a pleasure.

loweyvana commented 2 years ago

Hi @GemmaTuron, this is a screenshot of a model installed on my computer. [Screenshots: "model hub demo", "model ersillia"]

GemmaTuron commented 2 years ago

Hi @loweyvana

Thanks for your interest. As you are a medical student, perhaps the sections related to scientific content are of interest to you and you could start working there!

loweyvana commented 2 years ago

Definitely! I'm currently working on it. Hoping to send a first draft soon.

loweyvana commented 2 years ago

Hi @GemmaTuron. Here is the link to the blog post I have written, summarising Ersilia's strategic plan for 2021-2023. https://docs.google.com/document/d/1-9fjOPDfYV_OqfD4Zt9ub4s5ddIOAwksswfOoXUCBmI/edit?usp=sharing

Please permit me to make some points:

-Unfortunately, I could not find Ersilia's preferred typography "Beausite Classic Clear / Beausite Classic SemiBold" in Google Docs, so I went with Arial. Please, how can I access it?

-I tried as much as possible to respect the plum color.

-I would have loved to add some images to make the blog post more beautiful, more visual, and more engaging, but we were limited to a single page.

GemmaTuron commented 2 years ago

Hi @loweyvana

Thanks for the work. Could you give me access to the file, as I cannot see it now? Do not worry about the font, as it is not open; Arial works just fine for now.

loweyvana commented 2 years ago

Hi @gemmaturon. I think you can access it now.

loweyvana commented 2 years ago

Hi @GemmaTuron. Good day. Here is the link to a slide explaining Ersilia's mission and vision. Since you earlier said the font is not open, I went with Glacial Indifference. https://raw.githubusercontent.com/loweyvana/ersilia/Slides/-videos/assets/Ersilia-mission_Lowe_Yvana.png I also tried as much as possible to make the slide soothing for the eyes while respecting the colors. I tried plum on mint but the color combination was not the best. Please let me know if you absolutely want those two colors. Lastly, I added the slide in the assets folder. Please can I add this contribution as a pull request?

loweyvana commented 2 years ago

Or this can be used. https://docs.google.com/document/d/1-9fjOPDfYV_OqfD4Zt9ub4s5ddIOAwksswfOoXUCBmI/edit?usp=sharing

loweyvana commented 2 years ago

Hi @GemmaTuron. Please find attached the link to a blog post I wrote on Neglected Tropical Diseases. I did not know which aspect of NTDs to focus on so I gave an overview. https://lowe-yvana.blogspot.com/2022/04/tropical-diseases-overview-according-to.html

I made use of blogger.com. Is that OK?

There is so much to write about NTDs that I suggest we write several blog posts about the different aspects of NTDs and add them as links to this main blog post. In that way, we could cover the whole topic. Looking forward to hearing from you.

loweyvana commented 2 years ago

Hi @GemmaTuron, @miquelduranfrigola Here is the link to the feedback on the blog post I wrote on Neglected Tropical Diseases.

https://outreachyersilia.slack.com/team/U03A88CKTLZ

loweyvana commented 2 years ago

Hi @GemmaTuron, @miquelduranfrigola. After searching the scientific literature, I found these 4 models which I think could be incorporated in the hub.

-DeepChem: a machine learning model that uses a Python-based AI system to find a suitable candidate in drug discovery.

-DeepTox: ML algorithm that predicts the toxicity of numerous molecules.

-Support Vector Machines.

-Random Forests.

Reference links

-https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7577280/

-https://analyticsindiamag.com/top-6-ai-algorithms-in-healthcare/

loweyvana commented 2 years ago

Hi @GemmaTuron , @miquelduranfrigola Here is a sample of a technical card I have written on one of the models. I didn't know which format to use so I went with this. Waiting for your suggestions. Thank you.

TECHNICAL CARD ON CHEMPROP-ANTIBIOTIC

WHY THIS PROJECT? There has been a steady rise in the number of antibiotic-resistant bacteria in the past years, causing an estimated 700,000 deaths annually worldwide. If nothing is done, this figure is projected to be 10 million deaths/year by 2050. It is therefore of vital importance to discover new drugs to tackle this issue. Fortunately, scientific progress has made it possible for new molecules to be discovered using AI/ML. ChemProp-antibiotic was made for this purpose.

USE: The idea was to identify new antibiotic compounds which are structurally different from conventional antibiotic drugs.

ALGORITHM: The model makes use of a deep neural network (DNN). A DNN is an artificial neural network loosely inspired by the transmission of electrical impulses in the human brain. A DNN has two phases: training (learning) and inference (prediction). Here, the model was trained to predict molecules with antibacterial activity.

INPUT: The input is a compound called Halicin (SU3327), a molecule discovered in the Drug Repurposing Hub. This molecule was experimentally validated in vivo and in vitro. It is structurally different from other antibiotics and displays bactericidal activity against a wide spectrum of pathogens such as Mycobacterium tuberculosis, carbapenem-resistant Enterobacteriaceae, Clostridium difficile, and pan-resistant Acinetobacter baumannii.

DATA: Data was obtained from the ZINC15 database, which has more than 107 million molecules.

OUTCOMES: So far, this model has successfully identified 8 antibacterial compounds that are structurally distant from known antibiotics.

BENEFITS FOR USERS: The antibacterial compounds could be used to make new antibiotics with no immediate risk of antibiotic resistance.
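
On the question of format: one possible way to keep cards consistent across models is to hold the sections in a small structured record. The sketch below is only an illustration; the `ModelCard` class, its field names, and the `missing_sections` helper are hypothetical and simply mirror the headings used in the card above, not any format Ersilia has defined.

```python
from dataclasses import dataclass, asdict
from typing import List
import json


@dataclass
class ModelCard:
    """Hypothetical card layout; the fields mirror the headings of the card above."""
    title: str
    why: str
    use: str
    algorithm: str
    model_input: str
    data: str
    outcomes: str
    benefits: str

    def missing_sections(self) -> List[str]:
        """Return the names of sections that were left empty."""
        return [name for name, text in asdict(self).items() if not text.strip()]


# Section texts are shortened from the card above.
card = ModelCard(
    title="ChemProp-antibiotic",
    why="Antibiotic resistance is rising and new drugs are needed.",
    use="Identify antibiotic compounds structurally different from conventional ones.",
    algorithm="Deep neural network (DNN)",
    model_input="",  # left blank on purpose to show the completeness check
    data="ZINC15 database",
    outcomes="Antibacterial compounds structurally distant from known antibiotics",
    benefits="Candidate compounds for new antibiotics",
)

print(card.missing_sections())             # lists the sections still to be written
print(json.dumps(asdict(card), indent=2))  # the same card as shareable JSON
```

A record like this makes it easy to spot a section that was forgotten before a card is shared, whatever final layout the mentors prefer.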

loweyvana commented 2 years ago

Hi @GemmaTuron. Here are some screenshots of a docstring I wrote. [Screenshots: "docstring 1", "docstring 3"]

adeolaadedeji commented 2 years ago

Hi @loweyvana, I was going through your blog post and noticed that the EOSI link in the "measurements of success and expectation" column was not linked to a website.

ifeoluwafavour commented 2 years ago

Wow @loweyvana, you are doing great work! Well done 💪

GemmaTuron commented 2 years ago

Hi @loweyvana

Amazing work. I will try to answer and provide feedback one by one:

GemmaTuron commented 2 years ago

For these models, DeepChem and DeepTox are indeed on our list of models to incorporate next, so good job finding them! SVM and RF are techniques that can be used to create models, not models per se (you need to give data to a Random Forest and use this technique to train the actual model). Can I suggest, as a follow-up task, that you write the cards for the two models you have identified, so that when we incorporate them we already have the information? That would be very helpful.
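
To make the technique-versus-model distinction concrete, here is a minimal sketch. It does not use a Random Forest: a deliberately tiny one-feature threshold rule stands in for the technique so the example needs no extra libraries. The names `train_stump` and `StumpModel` and the toy numbers are invented for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Toy example type: (feature_value, is_active). The values below are made up.
Example = Tuple[float, bool]


@dataclass(frozen=True)
class StumpModel:
    """A trained model: one threshold learned from data."""
    threshold: float

    def predict(self, x: float) -> bool:
        return x >= self.threshold


def train_stump(data: List[Example]) -> StumpModel:
    """The technique: choose the threshold that best fits the training data."""
    def correct(t: float) -> int:
        # Number of training examples classified correctly with threshold t.
        return sum((x >= t) == label for x, label in data)

    candidates = sorted(x for x, _ in data)
    return StumpModel(threshold=max(candidates, key=correct))


# The same technique applied to two different datasets yields two different models.
dataset_a = [(100.0, False), (150.0, False), (300.0, True), (350.0, True)]
dataset_b = [(100.0, False), (150.0, True), (300.0, True), (350.0, True)]

model_a = train_stump(dataset_a)
model_b = train_stump(dataset_b)

print(model_a.threshold, model_a.predict(200.0))
print(model_b.threshold, model_b.predict(200.0))
```

The training function on its own predicts nothing; only the objects it returns after seeing data are models, and they disagree on the same input because they were trained on different data.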

GemmaTuron commented 2 years ago

Good work @loweyvana ! See my suggestion to do the same with the models you identified in the literature. If there are some technical details in the papers you do not understand ping us here or in Slack for clarification!

loweyvana commented 2 years ago

Hi @adeolaadedeji . Thank you so much for pointing that out. Let me rectify that.

loweyvana commented 2 years ago

Hi @ifeoluwafavour. Thank you so much. You are equally doing a great job.

loweyvana commented 2 years ago

Hi @loweyvana

Amazing work. I will try to answer and provide feedback one by one:

  • Strategic Plan blogpost: I like the table summary; it's perhaps not the best format for a blogpost (where people want to read text), but it is a good way of summarizing. Just make sure you don't use an acronym if it has not been introduced before, i.e. "Ersilia Open Source Initiative (EOSI) is ..." and then you can simply use EOSI downstream. MoU (memorandum of understanding) needs to be written out; people might not know what it is
  • Vision and Mission: looks really nice! My only comment for future work would be perhaps to include not only images of the brain, heart, etc. but also representations of bacteria, viruses... since we work with infectious diseases
  • Neglected diseases post: nice research on the topic!

Hi @GemmaTuron. Thank you so much for the feedback. I have rectified the acronym issue. For the graphic, I used the organs to represent the possible targets of these bacteria/viruses. However, I agree with you, since this might not be understood. I will change that and send the new slide. Thank you @GemmaTuron

loweyvana commented 2 years ago

Oh wow!!! Thanks for the clarification on the difference between a model and an AI technique. I must admit at one point in time I got mixed up when researching. For the follow-up tasks you can count it done. I would just want to find out which format you'd like me to use.

loweyvana commented 2 years ago

Link to the corrected blog post on Ersilia's strategic plan: https://docs.google.com/document/d/1-9fjOPDfYV_OqfD4Zt9ub4s5ddIOAwksswfOoXUCBmI/edit?usp=sharing

loweyvana commented 2 years ago

Template for Twitter.

MAIN POST: Yaaaayy! We have a new model in the hub!!! In our latest Twitter post, we are sharing with you all you need to know.

THREAD

-Name of model / date of incorporation into the hub (1/3). E.g. Our new model DeepChem was added on 05/04/2021.

-Brief description of how it works (2/3). E.g. This model works by....

-Name of contributors who worked on the project and more info on how to go about it (3/3). E.g. This contribution was made by X. If you like our work and want to contribute, please check our repository for more information.
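
The template above could also be filled in programmatically, which keeps the (i/3) numbering consistent and catches posts that run too long. This is only a sketch: the `build_thread` function and its parameters are hypothetical, the 280-character limit is an assumption about the platform, and the sample values are the placeholders from the template.

```python
from typing import List

TWEET_LIMIT = 280  # assumed per-post character limit


def build_thread(model_name: str, added_on: str,
                 how_it_works: str, contributors: List[str]) -> List[str]:
    """Fill the three-post template and number each post as (i/3)."""
    bodies = [
        f"Our new model {model_name} was added on {added_on}.",
        f"This model works by {how_it_works}.",
        f"This contribution was made by {', '.join(contributors)}. "
        "If you like our work and want to contribute, "
        "please check our repository for more information.",
    ]
    total = len(bodies)
    thread = [f"{body} ({i}/{total})" for i, body in enumerate(bodies, start=1)]
    for post in thread:
        if len(post) > TWEET_LIMIT:
            raise ValueError(f"Post is {len(post)} characters, limit is {TWEET_LIMIT}")
    return thread


# Placeholder values taken from the template; the description is a stand-in.
thread = build_thread("DeepChem", "05/04/2021", "<short description>", ["X"])
for post in thread:
    print(post)
```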

Kcfreshly commented 2 years ago

Amazing work @loweyvana, and at top pace. I recommend you shape your strategic plan write-up to look more like a blog post. Top-notch work so far.

loweyvana commented 2 years ago

Thank you @Kcfreshly for your comment. Please @GemmaTuron is that ok?

loweyvana commented 2 years ago

Hi @GemmaTuron. Here is the link to my video tutorial PR on how to use the model hub. #222

GemmaTuron commented 2 years ago

Hi @loweyvana

Your idea of doing a summary table is good. @Kcfreshly, thanks for pointing this out; I had in mind a more "written" post style, but I like the table format also, so let's leave it like this. I'm looking forward to seeing the video! Can you add it somewhere, like Google Drive, and provide an open link? These videos won't go on the GitHub repo, which would become too full.

I think with the video and the model cards for the new models, you are ready for a final application to the project. Great job!

loweyvana commented 2 years ago

Well received! @GemmaTuron. Thank you so much.

loweyvana commented 2 years ago

https://drive.google.com/file/d/1WVQ5X3zTcPnTk5LCminKqMotEw4ERnXs/view?usp=sharing Here is the link to the video showing how to use the Ersilia model hub.

loweyvana commented 2 years ago

Was about to send the newsletter template! Worked on it this morning. Let me send that and the cards on the new models. Once again, thanks so much for everything. I am discovering a new passion!

loweyvana commented 2 years ago

Hi @GemmaTuron Link to the newsletter template. https://docs.google.com/document/d/1hFuBpVesfZ-CuZqiyT0cx4tbEkB2vhZgl5DdTRByDNI/edit?usp=sharing

loweyvana commented 2 years ago

TECHNICAL CARD ON DEEPTOX

DESCRIPTION: DeepTox is a pipeline used to predict the toxicity of new compounds. First, it analyzes the chemical representations of compounds; then it computes a large number of chemical descriptors that are used as input to machine learning methods.

AI/ML ALGORITHM USED TO TRAIN THE MODEL: DeepTox makes use of Deep Learning for toxicity prediction. Another valuable tool is kernel-based structural and pharmacological analoging (KSPA), a new method that improves predictions by exploiting information that is available in public databases.

DATA USED: Data is obtained from a pre-processed Tox21 Dataset which is available at http://bioinf.jku.at/research/DeepTox/tox21.html.

INPUT : Compound

OUTPUT: Drug toxicity

PUBLICATION: https://www.frontiersin.org/articles/10.3389/fenvs.2015.00080/full

loweyvana commented 2 years ago

TECHNICAL CARD ON DEEPCHEM

DESCRIPTION: DeepChem is an open-source machine learning framework that aims at creating high-quality, open-source tools for drug discovery, materials science, quantum chemistry, and genomics. DeepChem supports a broad range of ML frameworks and models, such as PyTorch, Chemception, TorchModel, scikit-learn models, KerasModel, WeaveModel, Deep Tensor Neural Network, and many more.

USES: DeepChem is a machine learning library, so it gives you the tools which could help

DATASET: Various sources of data are used, depending on what you wish to do. Some examples are the Delaney solubility dataset, Tox21 dataset, chEMBL, and BACE datasets.

INPUT: varies

OUTPUT: varies

loweyvana commented 2 years ago

Hi @GemmaTuron. Here are some cards on the models DeepTox and DeepChem. I have one concern about DeepChem. I noticed it is not a model per se; it is actually an open-source framework like Ersilia, with lots of other models in it that scientists can make use of. What would be the best thing to do?

tracynuwagaba commented 2 years ago

Great work @loweyvana for the well detailed tutorial

loweyvana commented 2 years ago

Thank you @tracycod3r.

wintermornings00 commented 2 years ago

The tutorial is really informative @loweyvana . Great job!

loweyvana commented 2 years ago

Thank you very much @wintermornings00

Jaya3112 commented 2 years ago

@loweyvana you are doing amazing!

Amna-28 commented 2 years ago

Hi @loweyvana , I was going through your work and I must say you are doing a great job. The video tutorial is Amazing.

loweyvana commented 2 years ago

Thank you so much Jaya!

loweyvana commented 2 years ago

Thank youuuu @Amna-28. I'm glad you found it helpful!

GemmaTuron commented 2 years ago

Hi @loweyvana

The video is very clear, thank you so much! As for DeepChem, it is a larger library rather than just one model, which is why it looks a bit different from the others. For the moment, these larger libraries are out of scope for our work, so don't worry about the card.

loweyvana commented 2 years ago

Thank you @GemmaTuron for reviewing my work. Working on my application now. Is that ok?