Outreachy Documentation Project: <@Pmaidoo>

Pmaidoo commented 2 years ago

Applicant: <@Pmaidoo>

Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the documentation and outreach material of the Ersilia Model Hub”.

Please tick the tasks as you complete them. You do not need to complete every task to make a final application; only the Initial Steps and Community sections are REQUIRED. The tasks are not ordered by importance; they simply relate to different skills. Start where you feel most comfortable. This project can be adapted to the applicant's interests, so please focus on the tasks that you prefer, that match your skills, or that you would like to work on as an intern.

Initial steps:

[x] Record your application for the project on the Outreachy website, referencing this issue. Please make sure to select the right project on the website.
[x] Join the Slack channel to follow public communications
[x] Comment under this issue explaining why you are interested in this project

GitHub documentation:
[ ] Create a README file with the name under the /documentation folder
[ ] Link the #PR in a comment under this issue
[ ] Incorporate feedback from the mentor

Writing dissemination material
[x] Read the Strategic plan 2021-2023 for Ersilia and create a 1-page blogpost with the main points
[x] Comment under this issue with a link to the blogpost (a google docs for example)
[x] Incorporate feedback from the mentor
[x] Choose your own topic related to Ersilia (AI/ML for biomedical research, neglected diseases, drug discovery…) and write a 1-page blogpost to communicate to a non-expert audience
[x] Comment under this issue with a link to the blogpost (a google docs for example)
[x] Incorporate feedback from the mentor
[x] Create a template for a twitter post to release every time a new model is incorporated in the Hub (twitter: 280 characters, you can suggest a main post + thread with extra information) and add it as a comment under this issue
[x] Create a template short Newsletter (1 paragraph) to send every month to our community (funders, users, contributors). It should mention metrics (models in the hub, number of users, funding…), thank you etc

Technical skills (required for the tutorial only)
[ ] Install the Ersilia Model Hub
[ ] Test one model
[ ] Add a screenshot under this issue showing the model running in your computer
[ ] Write a docstring for the ErsiliaModel class. Use the Google Python Style guide. Paste the docstring as a comment below (do not use a PR).

Graphic material
[x] Read the Ersilia Brand Guidelines
[x] Read “Why Ersilia?”
[x] Create one image / slide to explain Ersilia’s mission and vision
[x] Link to the image/slide as a comment under this issue
[x] Incorporate feedback from the mentor
[ ] Create two slides / short video showing how to use the Ersilia Model Hub and add them under the /tutorial folder
[ ] Link the #PR in this issue
[ ] Incorporate feedback from the mentor

Scientific content
[x] Check the models available in the Hub
[x] Select one model from the list and write a technical card for it (what the model is for, what input it takes, which data was used to create it, what kind of ML algorithm it uses…)
[x] Add your card as a comment to this issue
[x] Search the scientific literature and suggest 3 new models (comment in this issue) that would be relevant to incorporate in the Hub.

Community
[x] Look up two other projects and comment on their issues with feedback on one of their tasks
[x] If you have feedback from your peers, answer it in this issue.

Other

If you have interest in working on related topics, or have new suggestions, please do the following
[x] Add a comment in this issue with your new idea, tagging the mentor
[x] Get feedback from the mentor and act accordingly
[x] Link in the comments any other PR you have contributed to.

Final application
[x] I have answered all comments from mentors and contributors
[x] All PR or issues assigned to me are complete
[x] I have submitted my final application to the project

Pmaidoo commented 2 years ago

My name is Priscilla Maria Aidoo, from Ghana, Africa. I am an Actuarial Science graduate with a passion for finance and technology. I am an incoming data scientist and analyst who enjoys working with data to make sense of it and to support decision making. After reading and researching the Ersilia project, I find it a project that suits me as a person. Helping people in any way possible is something I take great delight in. I like the concept of using artificial intelligence and machine learning to develop medicines for infectious diseases. I believe this is a way of using technology to make very useful decisions for people who are suffering from infectious diseases. Artificial intelligence and machine learning have the power to address these pressing issues. The future of this research is bright and I want to be part of this revolutionary change. This is why I want to contribute to this project. I can relate to the problems that people suffering from these diseases face all over the world, especially in my country. If my contribution can help reduce someone's pain, I would gladly offer that help since, in the end, humanity is our utmost accomplishment. @GemmaTuron

GemmaTuron commented 2 years ago

Hi @Pmaidoo Can you please add your name in the issue title, and then continue working from there? Thanks!

Pmaidoo commented 2 years ago

Hello @GemmaTuron, kindly find attached a link to my blog post and let me know what changes need to be made: https://docs.google.com/document/d/1MZHRs7lL-iPGB1DZM-oKjMJInPSkkgnQRehyMAK0VD8/edit#

Pmaidoo commented 2 years ago

hi @GemmaTuron and @miquelduranfrigola here is my technical card on one model from the hub

Malaria inhibitor prediction TECHNICAL CARD

USE: The malaria inhibitor prediction (MAIP) platform is the result of a public-private collaboration whose aim is to develop a consensus model for predicting blood stage malaria inhibition. Five Pharma and not-for-profit partners trained a model (using code developed by EMBL-EBI) on their private datasets. The resulting models were combined by EMBL-EBI and made available through this public prediction platform.

ALGORITHM: t-distributed Stochastic Neighbor Embedding (t-SNE) is an algorithm that performs nonlinear dimensionality reduction and is designed for data visualization. The chemical space for the three validation sets was derived from the t-SNE calculation using the same fingerprint descriptors as for the model. The resulting sparse matrix corresponding to the chemical features present in the validation set compounds was used as input for scikit-learn’s implementation of the t-SNE algorithm using a perplexity value of 500.

DATA: Eleven datasets from five different partners were used in this study to train models. The Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library datasets were essentially used. The Medicines for Malaria Venture (MMV) partner provided three additional datasets to be used for training models and the Novartis dataset was used as well.

OUTCOMES: Our first goal was to assess the ability of our new software methods and code to reproduce the previous study. We determined that the major difference would be due to implementations of the descriptor calculations, as the distributions of calculated physico-chemical properties are reasonably well (but not perfectly) correlated. To explore the impact of differences in fingerprint implementations on model-building and performance we used the MMV. A pairwise comparison using the Pearson correlation coefficient (R) for the two sets of scores gave a value of 0.88, indicating a good but not perfect correlation (Additional file 1: Figure S2A). In a second comparison, we used ECFP6 fingerprints and RDKit Morgan fingerprints with a radius of 3 but without features. This gave an R coefficient of 0.98, indicating almost perfect identity between the two model implementations.

BENEFITS FOR USERS: The result of a public-private collaboration helps to develop a consensus model for predicting blood stage malaria inhibition.
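
For readers who want to see what the t-SNE step in the ALGORITHM section looks like in practice, here is a minimal sketch using scikit-learn's `TSNE`. The random binary matrix only stands in for real fingerprint bits, and the function name `embed_fingerprints` and the small perplexity are assumptions for illustration. The card reports a perplexity of 500, which would need more than 500 compounds.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_fingerprints(fingerprints, perplexity=20.0, seed=0):
    """Project a (compounds x fingerprint bits) matrix to 2D with t-SNE."""
    n_compounds = fingerprints.shape[0]
    if perplexity >= n_compounds:
        raise ValueError("perplexity must be smaller than the number of compounds")
    tsne = TSNE(
        n_components=2,
        perplexity=perplexity,
        # Random initialisation; scikit-learn does not support PCA
        # initialisation when the input is a sparse matrix.
        init="random",
        random_state=seed,
    )
    return tsne.fit_transform(fingerprints)


# Toy stand-in for fingerprint data: 80 compounds x 128 bits, about 10% set.
rng = np.random.default_rng(0)
toy_bits = (rng.random((80, 128)) < 0.1).astype(np.float64)
coords = embed_fingerprints(toy_bits, perplexity=20.0)
print(coords.shape)
```

The 2D coordinates can then be plotted to compare how different compound sets occupy chemical space; the embedding is a visualization aid and not the predictive model itself.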

Pmaidoo commented 2 years ago

@GemmaTuron and @miquelduranfrigola here are 3 new models that would be relevant to incorporate in the Hub.

-DeepChem: a machine learning model that uses a Python-based AI system to find a suitable candidate in drug discovery.

-Support Vector Machines.

-DeepTox: ML algorithm that predicts the toxicity of numerous molecules.

Pmaidoo commented 2 years ago

Hi @GemmaTuron, kindly find attached a link to my blog post on my own topic related to Ersilia (AI/ML for biomedical research)

https://docs.google.com/document/d/1h4cf8-cbsbTd1oojPc4UGsiko0IJpVZwHVRleleay4A/edit

GemmaTuron commented 2 years ago

> Hello @GemmaTuron Kindly find attached a link to my blog post and let me know the changes needed to be made, https://docs.google.com/document/d/1MZHRs7lL-iPGB1DZM-oKjMJInPSkkgnQRehyMAK0VD8/edit#

Hi @Pmaidoo, I can't open it!

GemmaTuron commented 2 years ago

> @GemmaTuron and @miquelduranfrigola here are 3 new models that would be relevant to incorporate in the Hub.

Hi @Pmaidoo, these have already been suggested. Can you dig a little more to find some other models? Thanks

GemmaTuron commented 2 years ago

> hi @GemmaTuron and @miquelduranfrigola here is my technical card on one model from the hub

Hi @Pmaidoo, I think all the information is there, but it is a bit too long and people won't read through it all. Can you make it shorter, perhaps with a few more sections with only one or two lines under each?

Pmaidoo commented 2 years ago

@GemmaTuron I have made them visible now. Kindly check them and let me know: https://docs.google.com/document/d/1MZHRs7lL-iPGB1DZM-oKjMJInPSkkgnQRehyMAK0VD8/edit?usp=sharing

https://docs.google.com/document/d/1h4cf8-cbsbTd1oojPc4UGsiko0IJpVZwHVRleleay4A/edit?usp=sharing

Pmaidoo commented 2 years ago

Hi @GemmaTuron, here is my technical card on one model from the hub after making all the corrections you mentioned. It's shorter now.

Malaria inhibitor prediction TECHNICAL CARD

USE: The malaria inhibitor prediction (MAIP) platform is the result of a public-private collaboration whose aim is to develop a consensus model for predicting blood stage malaria inhibition. Five Pharma and not-for-profit partners trained a model on their private datasets. The resulting models were combined by EMBL-EBI and made available through this public prediction platform.

ALGORITHM: t-distributed Stochastic Neighbor Embedding (t-SNE) is an algorithm that performs nonlinear dimensionality reduction and is designed for data visualization. The resulting sparse matrix corresponding to the chemical features present in the validation set compounds was used as input for scikit-learn’s implementation of the t-SNE algorithm using a perplexity value of 500.

DATA: Eleven datasets from five different partners were used in this study to train models. The Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library datasets were essentially used. The Medicines for Malaria Venture partner provided three additional datasets to be used for training models and the Novartis dataset was used as well.

OUTCOMES: Our first goal was to assess the ability of our new software methods and code to reproduce the previous study. We determined that the major difference would be due to implementations of the descriptor calculations, as the distributions of calculated physico-chemical properties are reasonably well (but not perfectly) correlated. A pairwise comparison using the Pearson correlation coefficient (R) for the two sets of scores gave a value of 0.88, indicating a good but not perfect correlation.

BENEFITS FOR USERS: The result of a public-private collaboration helps to develop a consensus model for predicting blood stage malaria inhibition.

Pmaidoo commented 2 years ago

Hi @GemmaTuron, here is my final technical card on one model from the hub after making the corrections you mentioned. It's shorter and more concise now. Awaiting your critiques, if any.

Malaria inhibitor prediction TECHNICAL CARD

USE: The malaria inhibitor prediction (MAIP) platform is the result of a public-private collaboration whose aim is to develop a consensus model for predicting blood stage malaria inhibition.

ALGORITHM: t-distributed Stochastic Neighbor Embedding (t-SNE) is an algorithm that performs nonlinear dimensionality reduction and is designed for data visualization. The resulting sparse matrix corresponding to the chemical features present in the validation set compounds was used as input for scikit-learn’s implementation of the t-SNE algorithm using a perplexity value of 500.

DATA: The Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library datasets were essentially used. The Medicines for Malaria Venture partner provided three additional datasets to be used for training models and the Novartis dataset was used as well.

OUTCOMES: Our first goal was to assess the ability of our new software methods and code to reproduce the previous study. A pairwise comparison using the Pearson correlation coefficient (R) for the two sets of scores gave a value of 0.88, indicating a good but not perfect correlation.

BENEFITS FOR USERS: The result of a public-private collaboration helps to develop a consensus model for predicting blood stage malaria inhibition.
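
As a small illustration of the pairwise comparison mentioned under OUTCOMES, the sketch below computes the Pearson correlation coefficient (R) between two lists of scores in plain Python. The function name `pearson_r` and the example scores are made up for illustration; they are not values from the MAIP study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores from two hypothetical model implementations.
scores_a = [0.10, 0.35, 0.50, 0.72, 0.90]
scores_b = [0.12, 0.30, 0.55, 0.70, 0.95]
print(round(pearson_r(scores_a, scores_b), 3))
```

A value close to 1 means the two implementations rank and score compounds almost identically, which is how the card reads the reported 0.88 as "good but not perfect".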

Pmaidoo commented 2 years ago

@GemmaTuron

Template for Twitter.

MAIN POST Hey there! We have a new model in the Hub! We are sharing all you need to know, since we are in this together! (1/3)

THREAD

-Name of model, date of incorporation into the Hub and a brief description of how it works. (2/3)

-How the model works and more info on how to use it. (3/3)
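
If it helps, the template above could be filled in and checked against the 280-character limit with a few lines of Python. This is only a sketch: the function `build_thread`, its parameter names and the example values are hypothetical, and `len()` is an approximation of how Twitter counts characters (links, for example, are counted differently).

```python
TWEET_LIMIT = 280  # character limit quoted in the task description


def build_thread(model_name, date_added, description, usage):
    """Fill the three-post template and number each post as (i/n)."""
    posts = [
        "Hey there! We have a new model in the Hub! We are sharing all you "
        "need to know, since we are in this together!",
        f"{model_name} joined the Hub on {date_added}. {description}",
        usage,
    ]
    total = len(posts)
    numbered = [f"{text} ({i}/{total})" for i, text in enumerate(posts, start=1)]
    # Reject the whole thread if any single post is too long.
    for post in numbered:
        if len(post) > TWEET_LIMIT:
            raise ValueError(f"post exceeds {TWEET_LIMIT} characters: {post[:40]}...")
    return numbered


# Placeholder values only; none of these describe a real release.
thread = build_thread(
    model_name="example-model",
    date_added="2022-01-01",
    description="It predicts a property of interest from a molecule.",
    usage="See the Hub documentation for how to run it.",
)
for post in thread:
    print(post)
```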

Pmaidoo commented 2 years ago

@GemmaTuron here is my template short Newsletter

Yaay! Excited to have you here. We’ve got some interesting news for you. We have introduced (number of models) models into the Ersilia Model Hub. They are (names of the models). In previous months we raised about (amount of funds) in funding, and we presently have (number) users. We currently have about (number) volunteers working with us. Because of this, there is a significant increase in production daily. For more information on the models, check out the documentation on our main website (link to the website). Thank you! Have a great day ahead.
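
The bracketed placeholders in the newsletter could be filled from a small set of monthly metrics. Below is a minimal sketch using Python's standard `string.Template`; the field names (`n_models`, `funds` and so on), the shortened wording and the example figures are assumptions for illustration, not real Ersilia metrics.

```python
from string import Template

NEWSLETTER = Template(
    "Yaay! Excited to have you here. We have introduced $n_models models "
    "into the Ersilia Model Hub: $model_names. In previous months we raised "
    "about $funds and we presently have $n_users users and about "
    "$n_volunteers volunteers working with us. For more information, see "
    "$website. Thank you!"
)


def render_newsletter(metrics):
    """Fill the template; substitute() raises KeyError if a metric is missing."""
    return NEWSLETTER.substitute(metrics)


# Placeholder figures for illustration only, not real Ersilia metrics.
example = render_newsletter(
    {
        "n_models": 3,
        "model_names": "model-a, model-b, model-c",
        "funds": "X EUR",
        "n_users": 100,
        "n_volunteers": 10,
        "website": "(link to the website)",
    }
)
print(example)
```

Using `substitute` rather than `safe_substitute` means a forgotten metric fails loudly instead of leaving a raw placeholder in the message sent to the community.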

Pmaidoo commented 2 years ago

@GemmaTuron please find attached the link to my Google Doc with the Ersilia image showing its vision and mission

https://docs.google.com/document/d/1kgbWjhx0WFef2Lh2lJXV16IRd1SQcn64n3fkM1obglM/edit?usp=sharing

Pmaidoo commented 2 years ago

@GemmaTuron About the community task above, I don't seem to understand the question well. Which exact projects does it refer to? "Look up two other projects and comment on their issues with feedback on one of their tasks"

Pmaidoo commented 2 years ago

Please find attached the link to my blog post on my own topic related to Ersilia (AI/ML for biomedical research): https://docs.google.com/document/d/1h4cf8-cbsbTd1oojPc4UGsiko0IJpVZwHVRleleay4A/edit?usp=sharing

tracynuwagaba commented 2 years ago

Hi @Pmaidoo, I have noticed a slight spelling mistake in the google doc showing ersilia's vision and mission. You wrote healthcae in the vision instead of healthcare.

Pmaidoo commented 2 years ago

@tracycod3r Thank you for the correction, I will check it.

Kcfreshly commented 2 years ago

Well done @Pmaidoo

GemmaTuron commented 2 years ago

Hi @Pmaidoo

Thanks for rewriting the blogpost. You have done a lot of work in the project, so I think you should now focus on preparing your final application on the Outreachy website. Many thanks!

srcmilena commented 2 years ago

Hello, @Pmaidoo!

I read your first task. I also like how artificial intelligence can solve so many issues, and I loved learning how Ersilia applies this in its concept. Naming your document "All about Ersilia" is also very helpful. Congrats on that!

Good work.

Pmaidoo commented 2 years ago

@srcmilena Thank you very much. Alright, I will write to you on Slack if possible.

srcmilena commented 2 years ago

@Pmaidoo Of course! You are welcome. I'll be waiting for you on Slack! See ya ☺️

DokuaAsiedu commented 2 years ago

@Pmaidoo loved reading your blog post. It was very concise, straight to the point and conveyed the main points of the article. The addition of a table condenses the information and gives you everything you need to read. Well done!

Pmaidoo commented 2 years ago

@GemmaTuron I had wanted to create a README file and test a model after installation, but I was unable to, even after lots of help from my colleagues. Some steps and guidelines are not detailed enough for a beginner, and I would like to suggest that care is taken to address that and make it easier for beginners, in order to save time.

ersilia-os / ersilia

Outreachy Documentation Project: <@Pmaidoo> #162
