ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

Outreachy Documentation Project: <@Pmaidoo> #162

Closed Pmaidoo closed 2 years ago

Pmaidoo commented 2 years ago

Applicant: <@Pmaidoo>

Welcome to the Ersilia Open Source Initiative. This issue will serve to track all your contributions for the project “Improve the documentation and outreach material of the Ersilia Model Hub”.

Please tick the tasks as you complete them. You do not need to have completed all tasks to make a final application; only the Initial Steps and Community sections are REQUIRED. The tasks are not ordered by importance, they simply relate to different skills. Start where you feel most comfortable. This project can be adapted to the applicant's interests, so please focus on the type of tasks that you prefer, have better skills for, or would like to work on as an intern.


Initial steps:

Pmaidoo commented 2 years ago

My name is Priscilla Maria Aidoo, from Ghana, Africa. I am an Actuarial Science graduate with a passion for finance and technology. I am an incoming data scientist and analyst who enjoys working with data to make meaning out of it and to support decision making. After extensive reading and research about the Ersilia project, I find it to be a project that fits me as a person. As an individual, helping people in any way possible is something I take great delight in. I like the concept of using artificial intelligence and machine learning to develop medicines for infectious diseases. I believe this is a way of using technology to make very useful decisions for people who are suffering from infectious diseases. Artificial intelligence and machine learning have the power to solve these pressing issues. The future of this research work is bright and I want to be part of this revolutionary change. This is the reason why I want to contribute to this project. I can relate to the problems that people suffering from these diseases face all over the world, especially in my country. If my contribution can help someone by reducing their pain, I would surely like to offer that help since, in the end, humanity is our utmost accomplishment. @GemmaTuron

GemmaTuron commented 2 years ago

Hi @Pmaidoo Can you please add your name in the issue title, and then continue working from there? Thanks!

Pmaidoo commented 2 years ago

Hello @GemmaTuron, kindly find attached a link to my blog post and let me know what changes need to be made: https://docs.google.com/document/d/1MZHRs7lL-iPGB1DZM-oKjMJInPSkkgnQRehyMAK0VD8/edit#

Pmaidoo commented 2 years ago

Hi @GemmaTuron and @miquelduranfrigola, here is my technical card on one model from the hub.

Malaria inhibitor prediction TECHNICAL CARD

USE: The malaria inhibitor prediction (MAIP) platform is the result of a public-private collaboration whose aim is to develop a consensus model for predicting blood stage malaria inhibition. Five Pharma and not-for-profit partners trained a model (using code developed by EMBL-EBI) on their private datasets. The resulting models were combined by EMBL-EBI and made available through this public prediction platform.

ALGORITHM: t-distributed Stochastic Neighbor Embedding (t-SNE) is an algorithm that performs nonlinear dimensionality reduction and is designed for data visualization. The chemical space for the three validation sets was derived from the t-SNE calculation using the same fingerprint descriptors as for the model. The resulting sparse matrix, corresponding to the chemical features present in the validation set compounds, was used as input for scikit-learn’s implementation of the t-SNE algorithm with a perplexity value of 500.

DATA: Eleven datasets from five different partners were used in this study to train models. The Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library datasets were used. The Medicines for Malaria Venture (MMV) partner provided three additional datasets for training models, and the Novartis dataset was used as well.

OUTCOMES: Our first goal was to assess the ability of our new software methods and code to reproduce the previous study. We determined that the major difference would be due to implementations of the descriptor calculations, as the distributions of calculated physico-chemical properties are reasonably well (but not perfectly) correlated. To explore the impact of differences in fingerprint implementations on model-building and performance we used the MMV. A pairwise comparison using the Pearson correlation coefficient (R) for the two sets of scores gave a value of 0.88, indicating a good but not perfect correlation (Additional file 1: Figure S2A). In a second comparison, we used ECFP6 fingerprints and RDKit Morgan fingerprints with a radius of 3 but without features. This gave an R coefficient of 0.98, indicating almost perfect identity between the two model implementations.

BENEFITS FOR USERS: As the result of a public-private collaboration, the platform offers users a consensus model for predicting blood stage malaria inhibition.
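The t-SNE step described under ALGORITHM can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MAIP code: the fingerprints are random stand-in bits, the helper name `embed_fingerprints` is hypothetical, a dense array is used instead of the sparse matrix mentioned in the card, and the perplexity is set to 5 because the toy set has far fewer than the 500+ compounds that a perplexity of 500 would need.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_fingerprints(fingerprints, perplexity=5.0, seed=0):
    """Project binary fingerprint vectors to 2-D with t-SNE for plotting.

    Hypothetical helper for illustration; not part of MAIP or Ersilia.
    """
    X = np.asarray(fingerprints, dtype=float)
    # The perplexity has to be smaller than the number of compounds
    # (recent scikit-learn versions reject larger values themselves).
    if perplexity >= X.shape[0]:
        raise ValueError("perplexity must be smaller than the number of compounds")
    tsne = TSNE(
        n_components=2,        # 2-D output for a scatter plot
        perplexity=perplexity,
        init="random",         # random start; works for any input type
        random_state=seed,     # fixed seed for a repeatable layout
    )
    return tsne.fit_transform(X)


# Assumption: 60 random 128-bit fingerprints standing in for real compounds.
rng = np.random.default_rng(0)
toy_fingerprints = rng.integers(0, 2, size=(60, 128))

coords = embed_fingerprints(toy_fingerprints)
print(coords.shape)  # one (x, y) pair per compound
```

Each row of `coords` is the 2-D position of one compound, which is what would be plotted to compare the chemical space of different validation sets.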

Pmaidoo commented 2 years ago

@GemmaTuron and @miquelduranfrigola here are 3 new models that would be relevant to incorporate in the Hub.

-DeepChem: a Python-based machine learning system used to find suitable candidates in drug discovery.

-Support Vector Machines.

-DeepTox: an ML algorithm that predicts the toxicity of numerous molecules.

Pmaidoo commented 2 years ago

Hi @GemmaTuron, kindly find attached a link to my blog post on my own topic related to Ersilia (AI/ML for biomedical research).

https://docs.google.com/document/d/1h4cf8-cbsbTd1oojPc4UGsiko0IJpVZwHVRleleay4A/edit

GemmaTuron commented 2 years ago

"Hello @GemmaTuron Kindly find attached a link to my blog post and let me know the changes needed to be made, https://docs.google.com/document/d/1MZHRs7lL-iPGB1DZM-oKjMJInPSkkgnQRehyMAK0VD8/edit#"

Hi @Pmaidoo, I can't open it!

GemmaTuron commented 2 years ago

"@GemmaTuron and @miquelduranfrigola here are 3 new models that would be relevant to incorporate in the Hub."

Hi @Pmaidoo, these have already been suggested. Can you dig a little more to find some other models? Thanks

GemmaTuron commented 2 years ago

"hi @GemmaTuron and @miquelduranfrigola here is my technical card on one model from the hub"

Hi @Pmaidoo, I think all the information is there, but it is a bit too long and people won't read through it all. Can you make it shorter, perhaps with a few more sections that each have only one or two lines under them?

Pmaidoo commented 2 years ago

@GemmaTuron I have made them visible now. Kindly check them and let me know: https://docs.google.com/document/d/1MZHRs7lL-iPGB1DZM-oKjMJInPSkkgnQRehyMAK0VD8/edit?usp=sharing

https://docs.google.com/document/d/1h4cf8-cbsbTd1oojPc4UGsiko0IJpVZwHVRleleay4A/edit?usp=sharing

Pmaidoo commented 2 years ago

Hi @GemmaTuron, here is my technical card on one model from the hub after making all the corrections you mentioned. It's shorter now.

Malaria inhibitor prediction TECHNICAL CARD

USE: The malaria inhibitor prediction (MAIP) platform is the result of a public-private collaboration whose aim is to develop a consensus model for predicting blood stage malaria inhibition. Five Pharma and not-for-profit partners trained a model on their private datasets. The resulting models were combined by EMBL-EBI and made available through this public prediction platform.

ALGORITHM: t-distributed Stochastic Neighbor Embedding (t-SNE) is an algorithm that performs nonlinear dimensionality reduction and is designed for data visualization. The resulting sparse matrix, corresponding to the chemical features present in the validation set compounds, was used as input for scikit-learn’s implementation of the t-SNE algorithm with a perplexity value of 500.

DATA: Eleven datasets from five different partners were used in this study to train models. The Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library datasets were used. The Medicines for Malaria Venture partner provided three additional datasets for training models, and the Novartis dataset was used as well.

OUTCOMES: Our first goal was to assess the ability of our new software methods and code to reproduce the previous study. We determined that the major difference would be due to implementations of the descriptor calculations, as the distributions of calculated physico-chemical properties are reasonably well (but not perfectly) correlated. A pairwise comparison using the Pearson correlation coefficient (R) for the two sets of scores gave a value of 0.88, indicating a good but not perfect correlation.

BENEFITS FOR USERS: As the result of a public-private collaboration, the platform offers users a consensus model for predicting blood stage malaria inhibition.

Pmaidoo commented 2 years ago

Hi @GemmaTuron, here is my final technical card on one model from the hub after making the corrections you mentioned. It's shorter and more concise now. Awaiting your critiques, if any.

Malaria inhibitor prediction TECHNICAL CARD

USE: The malaria inhibitor prediction (MAIP) platform is the result of a public-private collaboration whose aim is to develop a consensus model for predicting blood stage malaria inhibition.

ALGORITHM: t-distributed Stochastic Neighbor Embedding (t-SNE) is an algorithm that performs nonlinear dimensionality reduction and is designed for data visualization. The resulting sparse matrix, corresponding to the chemical features present in the validation set compounds, was used as input for scikit-learn’s implementation of the t-SNE algorithm with a perplexity value of 500.

DATA: The Evotec, Johns Hopkins, MRCT, MMV - St. Jude, AZ, GSK, and St. Jude Vendor Library datasets were used. The Medicines for Malaria Venture partner provided three additional datasets for training models, and the Novartis dataset was used as well.

OUTCOMES: Our first goal was to assess the ability of our new software methods and code to reproduce the previous study. A pairwise comparison using the Pearson correlation coefficient (R) for the two sets of scores gave a value of 0.88, indicating a good but not perfect correlation.

BENEFITS FOR USERS: As the result of a public-private collaboration, the platform offers users a consensus model for predicting blood stage malaria inhibition.
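The R value quoted under OUTCOMES is a Pearson correlation between two sets of model scores for the same compounds. A minimal sketch of that calculation is given here for readers who want to see what the number measures. The function name `pearson_r` and both score lists are made up for illustration; they are not data from the MAIP study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores from two model implementations for the same five compounds.
scores_a = [0.10, 0.40, 0.35, 0.80, 0.95]
scores_b = [0.15, 0.30, 0.45, 0.70, 0.90]

r = pearson_r(scores_a, scores_b)
print(f"R = {r:.2f}")
```

A value close to 1 means the two implementations rank and score compounds almost identically, while a lower value such as the 0.88 reported in the card points to noticeable differences between them.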

Pmaidoo commented 2 years ago

@GemmaTuron

Template for Twitter.

MAIN POST: Hey there! We have a new model in the hub! We are sharing all you need to know with you, since we are in this together! (1/3)

THREAD:

-Name of the model, date of incorporation into the hub and a brief description of how it works. (2/3)

-How the model works and more info on how to go about using it. (3/3)

Pmaidoo commented 2 years ago

@GemmaTuron here is my template for a short newsletter.

Yaay! Excited to have you here. We’ve got some interesting news for you. We have introduced (number of models) models into the Ersilia Model Hub. They are (names of the models). Over the previous months we raised about (amount of funds) in funding and we presently have (number) users. We currently have about (number) volunteers working with us. Because of this, there is a significant increase in production daily. For more information on the models, check out the documentation on our main website (link to the website). Thank you! Have a great day ahead.

Pmaidoo commented 2 years ago

@GemmaTuron please find attached the link to my Google doc with the Ersilia image showing its vision and mission.

https://docs.google.com/document/d/1kgbWjhx0WFef2Lh2lJXV16IRd1SQcn64n3fkM1obglM/edit?usp=sharing

Pmaidoo commented 2 years ago

@GemmaTuron About the community task above, I don't seem to understand the question well. Which exact projects does it refer to? "Look up two other projects and comment on their issues with feedback on one of their tasks"

Pmaidoo commented 2 years ago

Please find attached the link to my blog post on my own topic related to Ersilia (AI/ML for biomedical research): https://docs.google.com/document/d/1h4cf8-cbsbTd1oojPc4UGsiko0IJpVZwHVRleleay4A/edit?usp=sharing

tracynuwagaba commented 2 years ago

Hi @Pmaidoo, I have noticed a slight spelling mistake in the google doc showing ersilia's vision and mission. You wrote healthcae in the vision instead of healthcare.

Pmaidoo commented 2 years ago

@tracycod3r Thank you for the correction, I will check it.

Kcfreshly commented 2 years ago

Well done @Pmaidoo

GemmaTuron commented 2 years ago

Hi @Pmaidoo

Thanks for rewriting the blogpost. You have done a lot of work in the project, so I think you should now focus on preparing your final application on the Outreachy website. Many thanks!

srcmilena commented 2 years ago

Hello, @Pmaidoo!

I read your first task. I also like how artificial intelligence can solve so many issues, and I loved learning how Ersilia applies this in its concept. Your document's file name, "All about Ersilia", is also very helpful. Congrats on that!

Good work.

Pmaidoo commented 2 years ago

@srcmilena Thank you very much. Alright, I will write to you on Slack if possible.

srcmilena commented 2 years ago

@Pmaidoo of course! You are welcome. I'll be waiting for you on Slack! See ya ☺️

DokuaAsiedu commented 2 years ago

@Pmaidoo loved reading your blog post. It was very concise, straight to the point and conveyed the main points of the article. The addition of a table condenses the information and gives you everything you need to read. Well done!

Pmaidoo commented 2 years ago

@GemmaTuron I had wanted to create a README file and test a model after installation, but I was unable to, even after lots of help from my colleagues. Some steps and guidelines are not detailed enough for a beginner. I would like to suggest that care is taken to address that and make things easier for beginners, in order to save time.