Open miquelduranfrigola opened 4 months ago
Antimicrobial and COVID models uploaded to BioModels
Hey @Zainab-ik
Before starting with new models, can you have a look at the existing ones and make sure they all comply with the latest decisions we have made? Note down here any changes that had to be made in the annotations.
thanks!
Yes, working on that.
Previous Model review Summary - Removed general metadata, and confirmed experimental validation
Regarding the first two models, eos7kpb and eos80ch:
eos7kpb: physicochemical assays, clearance, solubility assay, cytotoxicity, aqueous solubility, permeability assay, microsomal metabolic stability. These metadata aren't integral to the Zairachem model, so I wanted to run this by you first.
eos80ch: removed the following metadata: compound screening, phenotype, molecular representation, molecular representation, parasites, phenotype.
Hi @Zainab-ik
Good on the corrections. As we discussed, let's leave all the biological endpoints on eos7kpb.
Update: eos4zfy ready for review.
BioModels Upload;
To-do's
Automating Metadata Annotation using Zooma
This process involves automatically mapping the metadata to the right ontology terms, to speed up the annotation process. For this process, I'll be starting with these two models
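A rough sketch of what the Zooma lookup could look like in Python, in case it helps the discussion. Assumptions to check against the Zooma documentation before relying on this: the endpoint path, the `propertyValue` query parameter, and the `confidence` / `semanticTags` fields in the response are written from memory of the public REST API. The function names (`build_query_url`, `best_annotation`, `annotate`) are hypothetical helpers, not part of any Ersilia or BioModels tooling.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public Zooma annotation service.
ZOOMA_ANNOTATE_URL = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"

# Assumed confidence labels returned by Zooma, ranked from best to worst.
CONFIDENCE_RANK = {"HIGH": 3, "GOOD": 2, "MEDIUM": 1, "LOW": 0}


def build_query_url(term):
    """Build the annotation URL for one free-text metadata term."""
    return ZOOMA_ANNOTATE_URL + "?" + urlencode({"propertyValue": term})


def best_annotation(annotations):
    """Return the highest-confidence annotation as a small dict, or None."""
    if not annotations:
        return None
    top = max(
        annotations,
        key=lambda a: CONFIDENCE_RANK.get(a.get("confidence", ""), -1),
    )
    return {
        "confidence": top.get("confidence"),
        "ontology_terms": top.get("semanticTags", []),
    }


def annotate(term, timeout=30):
    """Query Zooma for one term (needs network access)."""
    with urlopen(build_query_url(term), timeout=timeout) as response:
        return best_annotation(json.load(response))
```

Anything below HIGH confidence would probably still need a manual check before it goes into the annotation file.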
Steps;
Comments/Observation
Models uploaded to BioModels
New model Annotation - In Progress
eos2lqb - issue
eos6oli - issue
eos7d58 - issue
eos8lok - issue
Note: I've been working with a lot of regression models recently, which is quite exciting. One of the evaluation metrics is root-mean-square error (RMSE), which, from my reading, is also known as root-mean-square deviation (RMSD). On OLS, RMSE doesn't exist but RMSD does, so I've been using that in my annotation.
Hi @Zainab-ik !
I'm having a look at the models you are annotating, let me know when the excel files are ready - RMSE and RMSD are the same ;)
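For reference, a minimal sketch of the metric, whichever name is used: the square root of the mean squared difference between observed and predicted values. The function name `rmse` is just illustrative.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error (also called RMSD) between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one value is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```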
Alright, Thanks @GemmaTuron
Hi @GemmaTuron All models are ready for review except eos7d58. It has a broad output and I'd like to confirm whether all the outputs are incorporated into the Ersilia version.
Grover Models
General comments about the Grover model
eos7w6n - This is the base model (GROVER) that was fine-tuned on task-specific datasets.
Grover Models - Annotation in Progress (Metadata extraction and curation done)
All models ready for review.
Hi @Zainab-ik
Those look good, just a comment on QSAR - I would not annotate the general model as a QSAR. Grover is applied to different datasets as a molecular representation for QSAR or QSPR (structure-activity and structure-property)
Let's pause model annotation here for the week and focus on the documentation of the process, which will also be needed for the Hackathon. Let's use this document to create the information and then we will move it to GitBook.
Tasks:
Update - All Grover models incorporated into BioModels.
Non-grover models uploaded
this document
Currently working on Documentation.
Good job @Zainab-ik
Let me know if you need help/review in the documentation process
Tasks:
- [x] Create Documentation
- [x] Incorporate it into GitBook
- [x] Prepare the intro for the Hackathon
- [ ] convert one model to ONNX to try out
All tasks done except the ONNX conversion.
Current Tasks:
For the Hackathon, there are 5 open models for annotation.
Model 1 - eos1n4b - issue
Model 2 - eos92sw - issue
Model 3 - eos2ta5 - issue This is quite clear. Just to clarify, negative predictive value (NPV), and positive predictive value (PPV) are True positive, and False positive, right?
Hi @Zainab-ik Good job in those with the Hackathon team. See below my comments:
Model 1 - eos1n4b - https://github.com/ersilia-os/eos1n4b/issues/8
> The drug target "Histone deacetylase 3" is related to different diseases such as cancer and diabetes. Those aren't related metadata.

Indeed, you are right.
> This model was built using 5 algorithms and 3 descriptors. Algorithms: k-Nearest Neighbour (KNN), Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Deep Neural Network (DNN). Descriptors: Mordred descriptors, MACCS keys, Morgan fingerprint. The best-performing model is XGBoost with the Morgan fingerprint (that's the model deployed to the GUI application). For our annotation, we'd only be including the best-performing model and its features.

Yes, that is correct, good.
> We have an ROC enrichment as an evaluation metric between the validation and training datasets. I'm not sure it fits into the metadata. What do you think?

We can add ROC Curve as evaluation metric and that's it?

Model 2 - eos92sw - https://github.com/ersilia-os/eos92sw/issues/12
> Can I confirm if this is a neural network model? There are mentions of nodes, layers, and the type of algorithm.

As stated in the publication: "In this study, we utilize a DBN", so that is the type of network they chose.
> It's difficult to identify the exact training dataset. It's a combination of data from several databases. Should we list them all, or how do we consolidate that?

Yes, list the databases in Table 1.
> All these algorithms were mentioned: Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Extremely Randomized Trees or Extra Trees (ET). However, there was more emphasis on the ET algorithm.

Wasn't it a DBN?
> Both classification and regression task evaluations were done. In the Ersilia repository, regression is the only task mentioned. Should we stick to that for the evaluation metric?

Yes.

Model 3 - eos2ta5 - https://github.com/ersilia-os/eos2ta5/issues/6
> This is quite clear. Just to clarify, negative predictive value (NPV) and positive predictive value (PPV) are True positive and False positive, right?

They are the proportions of negative and positive predictions that are true negatives and true positives, respectively, if I am not wrong.
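To make the NPV/PPV definitions concrete, here is a small sketch computing both from confusion-matrix counts: PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The function name `predictive_values` and the counts in the usage line are made up for illustration; they are not taken from the eos2ta5 publication.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV: share of positive predictions that are true positives.
    NPV: share of negative predictions that are true negatives.
    A value is NaN when the model made no predictions of that class.
    """
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Illustrative counts only: 50 positive predictions (40 correct),
# 50 negative predictions (45 correct).
ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
```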
Summary
We have partnered with BioModels at EMBL-EBI (Hinxton) to explore potential ways to incorporate Ersilia's models into the well-established BioModels resource.
Of note, BioModels model annotation is based on ontologies as reported in the Ontology Lookup Service. We expect to reach similar standards thanks to the current project.
Scope
Initiative π
Objective(s)
The objectives of the project are the following:
Team
@Zainab-ik is currently doing an internship at EMBL-EBI in the BioModels team.
Importantly, @Zainab-ik will meet with @miquelduranfrigola twice a week to report progress and decide next steps. Before each meeting, @Zainab-ik will update the corresponding model issues and, after the meeting, action items will be reflected in the issues.
Timeline
The project timeline is still up for discussion. These are some tentative milestones:
Documentation
A backlog of models can be found in the Ersilia BioModels Spreadsheet. This spreadsheet should act as a centralized resource to keep track of progress.
The shared folder in Google Drive can be accessed here.