miquelduranfrigola closed this 2 weeks ago
Hello @Zainab-ik, as discussed, let's start by conceiving an issue template to prompt discussion about each model individually.
I suggest that we start by doing this in the current antimalarial model, then we can replicate the template to other models as we see fit. In my opinion, the template should not be too complex.
@Zainab-ik, here are some questions in preparation for our meeting with Sheriff. Feel free to add more:
- What is the minimum and maximum number of qualifiers in a model? How many are recommended?
- Is there a convention for naming models? Is it the year & title of publication?
- Is there a structure or guidelines for model descriptions?
- Many papers have extra analysis not directly related to the model. For example, dimensionality reduction with UMAP, or clustering. Do we need to include these in the metadata?
- Do you have any experience with the chemical information ontology?
I'll be working on the issue template. Note: the Ersilia BioModel spreadsheet seems to be empty.
Yes it is empty for now. Please add the two models that we are currently working on and then we will add more.
Update after meeting with Sheriff:
@miquelduranfrigola Am I missing anything?
Thanks @Zainab-ik - this is very useful. I don't think anything is missing. Perhaps just mention that BAO is also an important ontology to consider.
Update!!!
Regarding citation: Sheriff mentioned there's an option to indicate the modeller when uploading the annotation files. The modeller is whoever incorporates the model into the Ersilia Model Hub. He mentioned he'd discuss this with @GemmaTuron.
Model annotation: I've completed the first 2 annotations and compared them with the initial annotation. I think ours is more detailed: I included more model properties and used ontologies closer to the chemistry terms. However, a couple of things need to be done before finalizing. Some ontologies aren't registered with the resolver, so I'm requesting their registration at the moment; they'll be updated once they're published in the resolver registry. We use the resolver for safe referencing and to standardize the URLs. Although not yet finalized, I've added the 2 models, eos80ch and eos7kp, for review.
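The resolver maps a compact identifier (prefix:accession) to one stable URL, which is what makes references uniform across annotations. As a minimal sketch, assuming the resolver is identifiers.org with its `https://identifiers.org/<prefix>:<accession>` URL pattern, a hypothetical helper `to_resolver_url` could standardize references like this:

```python
def to_resolver_url(curie: str) -> str:
    """Build a resolver URL from a compact identifier such as 'taxonomy:9606'.

    Assumes the identifiers.org pattern https://identifiers.org/<prefix>:<accession>;
    the helper name and the validation rules are illustrative, not an official API.
    """
    prefix, sep, accession = curie.strip().partition(":")
    # Reject strings that lack a prefix, a colon, or an accession.
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {curie!r}")
    return f"https://identifiers.org/{prefix}:{accession}"


# Example: NCBI Taxonomy ID 9606 (Homo sapiens).
print(to_resolver_url("taxonomy:9606"))
```

Keeping URL construction in one place means a prefix only needs to be registered once with the resolver before every annotation that uses it resolves the same way.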
GitHub Issue Template
While discussing with @miquelduranfrigola, he suggested I create an issue template, open one for each model I'm annotating, link them to this main issue to keep track of the work, and finally close them after the model is uploaded to the BioModels repository.
Using the Ersilia issue template as a sample, I came up with a draft, and I'd like a review before incorporating it into each model repository: BioModel Incorporation Issue
How should we use the issue, considering we'd have to open it in each model repository rather than in the general repository?
Hi @Zainab-ik
After our meeting today, please:
- go ahead and open the issues in the two models we are working on, following your proposed template. We will try it out and, once we are happy with it, we will upload it to all repos as a template
- Add the publications of the models in the folder
- Finish the model annotations for both and add any questions / comments you might have on the issues, so we can initiate a discussion
From my side, I'll prioritize some further models for annotation. And we have decided that, once we have completed the annotation of at least 10 models, we will start thinking about:
- validation of the models
- automatically storing biomodel annotations in Ersilia
Following the meeting:
I'll work on completing the annotation. I've sorted out the compact identifiers with the EBI team. I'll also try uploading one model to BioModels with Sheriff to give a sample of what the issue template information would look like.
Hi @Zainab-ik
Thanks! This is looking good. As I stated in the model issues, I suggest we have two issues: one for discussion, and one we will only open once we know which data from BioModels we also want to store in Ersilia. If you agree, let's go ahead and use the open issues as the "discussion" issues for models eos80ch and eos7kbp, so we can fully annotate these two and then proceed to the next ones. I'd say the second issue, to collect data from BioModels for storing in Ersilia, can be built once we have at least 10 models annotated and better understand the kind of information we want to collect.
Thanks @Zainab-ik ! I have a few suggestions on the discussion template, let me know your thoughts
Hi @GemmaTuron
I've worked through the suggestions. I completed the annotation for the 2 models, updated the link, and added metadata information for eos7kbp. I'm clear on the eos80ch model, and it's been uploaded. I'll share it when it's available to the public, which should be by tomorrow.
Should I go ahead and start working on the priority models in the sheet?
Also, there's an option of opening an account on BioModels to review submissions. BioModels offers several ways to grant collaboration, review, or access to models.
I think option 1 applies to us: I could share my submission for review. Either @GemmaTuron, @miquelduranfrigola, or both could open an account. What do you think?
@GemmaTuron feel free to take the lead here. Thanks @Zainab-ik for a very clear update.
Hi @Zainab-ik !
Thanks, good start! Feedback from today's meeting:
If you are done with all the tasks before our next meeting, I suggest you have a look at the model incorporation that is still midway, but this is lower priority.
Feedback from BioModels (Sheriff) !!
I've incorporated all the feedback into the two models. I believe both models are fully annotated.
The following are (or will be) standard metadata in all models:
Update!!!
DOME annotation completed and both models are up on BioModels. eos7kpb - https://www.ebi.ac.uk/biomodels/MODEL2403270001 eos80ch - https://www.ebi.ac.uk/biomodels/MODEL2403270002
This has been linked in the respective repository.
eos46ev !!!
More detailed comments/questions are in the issue here. The curation/annotation is complete and can be accessed here.
eos4e40 !!!
I realized that the terms used to describe data binarization (active, inactive, hit, non-hit) depend on the paper. How do we pick a standard, then? All of them map to ontology terms except non-hit.
The curation/annotation can be accessed here
eos5xng !!!
The curation/annotation is complete and linked here.
An open-ended question
"How much of the model properties, i.e., core model properties (e.g., packages, libraries, open-source software), should be curated and annotated?" Examples below:
Hi @Zainab-ik,
Good job, thanks for the updates, please find below some comments:
Thank you @GemmaTuron
- I do not understand this sentence: For Proprietary data, URL should be added if it's available. If not, it should be included in the metadata for transparency. For eos7kbp, I added it and annotated it with a suitable ontology since there's no URL available. As it is proprietary data, it will never have an available URL as the data is not shared. What do you mean you have added it?
For this, I added the term "H3D Proprietary" as metadata and annotated it with a suitable ontology term and its link. I didn't mean that I added a link to the proprietary data. Sheriff mentioned the term should be added for transparency.
- Regarding the updated models, please do not update them on BioModels until I have revised them and given the final OK. Remember to use this excel to track progress, if the model is still "To review" means it has not yet been approved - this way we can be sure all the information in biomodels is 100% correct
Noted @GemmaTuron. That was uploaded as a sample to get an idea of how the overview would look and whether the Ersilia team has any comments or changes. I'd appreciate feedback on that. The upload can always be updated.
- Some of the links in the BioModels website seem broken, could you check that?
I'll inform the BioModels team. Could you please specify which links, so I can mention them exactly?
- Let's consolidate the tags for all models. Can you share with me what is the list of available tags?
This is the list of available tags. A new one can be proposed if that would be more suitable for Ersilia models.
- Are Active / Inactive properties or Outputs?
They are properties. More like data properties very relevant to the model.
eos5xng !!!
- I opened an issue here, and added a comment below;
- ESKAPE pathogen inhibition is the experimental validation of the AI model, if I'm right? If so, those pathogens don't count as a taxonomy in the metadata.
- For the model training and prediction, both classification and regression tasks were performed. Ersilia model only performed classification and that should be the only one included in the metadata, right?
- Both RMSE and MAE scores are evaluation metrics for regression tasks; if the answer to point 2 is yes, the same applies to both metrics.
The curation/annotation is in progress...
This can be attended to.
Update !!!
Next Point of Action - Annotate NCATS models.
NCATS Metabolism Models !!!
| Models | Specifics | BioModel Title | Annotation File |
|---|---|---|---|
| eos3ev6 | CYP3A4 | Gonzalez2021 - QSAR Prediction Model for CYP3A4 Inhibitor and Substrate | here |
| eos7nno | CYP2D6 | Gonzalez2021 - QSAR Prediction Model for CYP2D6 Inhibitor and Substrate | here |
| eos5jz9 | CYP2C9 | Gonzalez2021 - QSAR Prediction Model for CYP2C9 Inhibitor and Substrate | here |
| eos44zp | CYP450 | Gonzalez2021 - QSAR Prediction Model for CYP450 enzyme Inhibitor and Substrate | here |
Comments
All these metabolism models come from a single publication, with CYP450 being a more generic model and the others each specific to one enzyme. The comments/questions and metadata would be the same for all the models, except for the individual assay data and the Ersilia repository URL. Would it be necessary to open individual issues? (I already opened one for eos36ev here.)
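Because the four CYP models share one publication, one way to avoid repeating the same annotation four times is to keep a shared base record and merge only the per-model fields on top. This is a hypothetical sketch: the field names are illustrative, the assay values are deliberately left as placeholders, and the repository URL assumes the `github.com/ersilia-os/<model id>` pattern.

```python
from copy import deepcopy

# Metadata shared by all four models (illustrative field names).
SHARED = {
    "publication": "Gonzalez2021",
    "tags": ["QSAR", "Ersilia"],
}

# Per-model fields; assay data are placeholders to be filled from each model's source.
PER_MODEL = {
    "eos3ev6": {"target": "CYP3A4", "assay": None},
    "eos7nno": {"target": "CYP2D6", "assay": None},
    "eos5jz9": {"target": "CYP2C9", "assay": None},
    "eos44zp": {"target": "CYP450", "assay": None},
}


def build_metadata(model_id: str) -> dict:
    # Deep-copy so that edits to one record never leak into the shared base.
    record = deepcopy(SHARED)
    record.update(PER_MODEL[model_id])
    # Assumed repository URL pattern for Ersilia models.
    record["repository"] = f"https://github.com/ersilia-os/{model_id}"
    return record


for model_id in PER_MODEL:
    print(model_id, build_metadata(model_id)["target"])
```

With this layout, a correction to the shared fields is made once and propagates to all four records, while each issue only needs to discuss the model-specific fields.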
Two models were built: a DNN-based model and a Stratified Bagging Random Classifier (SB). The DNN performs best, with SB second; however, SB was chosen as the default due to accessibility. Does the DNN count as metadata?
Suggestions
Hi @Zainab-ik
Thanks for the update. A few pointers:
eos3804 !!!
Metadata curation and annotation can be accessed here
Permeability Models
eos9tyg and eos81ew; PAMPA 7.4 & PAMPA 5 !!!
Here are a few comments (from the PAMPA 5 publication):
Oral permeability is a complex process that depends on membrane permeability; does that make oral permeability a model-property metadata?
The publication mentions 2 models: a classifier and a neural network. While looking through the repo, I only see a neural network .py file. Are the models merged (is there some sort of model fusion, like stacking), or is only the neural network model implemented?
There are 2 PAMPA models (eos9tyg - PAMPA 7 & eos81ew - PAMPA 5). Both repositories have the same publication, which describes only PAMPA 5. However, the attached publication for PAMPA 7.4 (PAMPA 7.4.pdf) is different, even though I believe the model implementation and incorporation follow the same standard.
PAMPA 7 has less metadata than PAMPA 5, based on their respective publications.
The PubChem bioassay dataset indicated on the NCATS website is also different.
In curating and annotating PAMPA 7.4 which is eos9tyg, which publication suits best; the publication in the repository which describes PAMPA 5 or the original publication?
There's a PAMPA-BBB model in NCATS. Is that also incorporated in Ersilia? (I checked and couldn't find it.)
From Original PAMPA publication;
I noticed some errors in the eos81ew repository while looking through the model checkpoints and frameworks.
In the eos81ew repo, there's a README error in this folder: it mentions kinetic aqueous solubility, which belongs to eos74bo.
In the framework folder for eos81ew, there's also a readme description about eos74b0 and a github link about eos8ykt which doesn't seem to exist in Ersilia Model Hub.
NCATS Metabolism Models Uploaded on BioModels.
@Zainab-ik
Please follow the guidelines we drafted. When you start working on a new model, you should:
- Mark it as ongoing on the shared Excel
- Open an issue on the specific model repository
- Create a file for the annotation in the shared folder
- Add the publication in the drive
All done
Please move the above comments to where they belong, and I will answer there, thanks!
When you do so, please clarify what you refer to with this: The PubChem bioassay dataset indicated on the NCATS website is also different. Different from what, and for which model?
While going through the NCATS website, I noticed that the bioassay datasets for the two PAMPA models are different:
- PAMPA 5.0 - eos81ew - https://pubchem.ncbi.nlm.nih.gov/bioassay/1645871
- PAMPA 7.4 - eos9tyg - https://pubchem.ncbi.nlm.nih.gov/bioassay/1508612
To which model are you referring here? The publication made mention of 2 models; a classifier and a neural network. The fact that the model is a neural network does not prevent it from being a classifier at the same time.
Thanks for clarifying this.
Update !!! I opened a couple of PRs
Other NCATS models uploaded to BioModels
Antimicrobial models annotation
Questions
Hi @Zainab-ik !
Good job thanks for keeping it up! I have answered your questions in the respective models and below the general ones:
Thank you @GemmaTuron
- I created a new tag in BioModels called Ersilia and that'd be attached to all models. - Fantastic! Questions
- Can all the drug discovery models be referred to as a QSAR model? Mmm at the moment, most of the models we have are QSAR yes, but that might not be true in the future. @miquelduranfrigola what do you say here?
That's great. That would mean QSAR should be a constant metadata term, right? Just a thought: can a generative model be classified as QSAR too?
- If an animal model is used to perform experimental validation of the model, should that be added as a biological property of the model, i.e., taxonomy? I don't think so, this is related to the validation but not how the dataset for the model was built.
Okay, that's clarified. What if an experimental method (in vivo, precisely) is used to generate the dataset? Should the experimental method and the in-vivo model be added as metadata then?
- The publication here is the same as eose40 but this is SARS-COV2 Inhibition. The paper is discussing antibiotics but SARS-COV2 should be antiviral. Can you clarify please? - The antiviral model does not have a publication per se, but they developed it in parallel with the antibiotic predictor, using ChemProp. Since the antibiotic prediction paper is the one that describes the original ChemProp development, it is the most appropriate citation.
The metadata would then be the same, except for the organism and output, plus an added antiviral term.
SARS-COV2 model annotation
Regarding eos9f6t - The publication here is the same as eose40 but this is SARS-COV2 Inhibition. the paper is discussing antibiotics but SARS-COV2 should be antiviral. Can you clarify please.
@GemmaTuron All models ready for review.
Hi @GemmaTuron
A few clarifications from the meeting;
Experimental methods appear in both data generation and model validation. They should be represented in the annotation and curation as follows:
- in-vivo model - data source
- in-vitro model - data source
- in-vivo model - model validation
Does this best describe the experimental part of the model? Organisms without a taxonomy go under properties, right?
- Model validation data sources aren't an essential part of the model and shouldn't be metadata.
- All models are QSAR at this moment, so QSAR should be a constant metadata term.
- Remove less important metadata, e.g., hits.
- Evaluation metrics that weren't used shouldn't be added.
Hackathon schedule.
Hi @Zainab-ik ! I have reviewed the models, please amend them and then upload to BioModels. A few general comments from our meeting:
After redoing the current models to review, let's get back to the old ones before we move onto the new ones. Feel free to reopen the issues and note the changes that should be made
A clarification regarding in vivo and in vitro: if it's used for data generation, it's not to be added, right @GemmaTuron?
Exactly, all data has eventually been generated experimentally, so it is not that relevant to collect this information.
General fields that do not add information:
Hi @Zainab-ik
I agree with most of them, but MACCS keys are a different type of descriptor. If the model is using RDKit descriptors we should annotate that, if it is using MACCS we should annotate it, and maybe we should think about whether we want to annotate all the different descriptors used.
That's right. The only challenge is that MACCS and RDKit are the only descriptors present in OLS that can be annotated.
Summary
We have partnered with BioModels at EMBL-EBI (Hinxton) to explore potential ways to incorporate Ersilia's models into the well-established BioModels resource.
Of note, BioModels model annotation is based on ontologies as reported in the Ontology Lookup Service. We expect to reach similar standards thanks to the current project.
Scope
Initiative
Objective(s)
The objectives of the project are the following:
Team
@Zainab-ik is currently doing an internship at EMBL-EBI in the BioModels team.
Importantly, @Zainab-ik will meet with @miquelduranfrigola twice a week to report progress and decide next steps. Before each meeting, @Zainab-ik will update the corresponding model issues and, after the meeting, actionables will be reflected in the issues.
Timeline
The project timeline is still up for discussion. These are some tentative milestones:
Documentation
A backlog of models can be found in the Ersilia BioModels Spreadsheet. This spreadsheet should act as a centralized resource to keep track of progress.
The shared folder in Google Drive can be accessed here.