ersilia-os / eos46ev

GNU General Public License v3.0

BioModel Discussion Issue for eos46ev #9

Open Zainab-ik opened 8 months ago

Zainab-ik commented 8 months ago

Summary !!!

  1. Four ML algorithms were used to build this model. I added all of them to the metadata, since the final model (deployed to the web server) is a combination of all four.
  2. Although XGBoost is stated to be the best, the final model is a fusion of four algorithms: Random Forest, Deep Neural Network, Support Vector Machine, and XGBoost.
  3. Looking at the repository, I realised that Ersilia implemented only the XGBoost model. Does that make the other algorithms unimportant as metadata?
  4. Similarity search is a feature of the model deployed to the web server. Should it be added to the metadata as an output, given that it is not implemented in Ersilia?
  5. I realised there is no GitHub repository for the original source code, only the deployed web server.
  6. For this model, I have three outputs from the curation:
    • Similarity search: an extra function of the web server/model
    • MTB inhibitor prediction: a more specific output of the model
    • MTB non-inhibitor prediction: also a more specific output of the model

The model is described as doing only MTB inhibitor prediction. However, when I ran the web server, each prediction came back as 1 or 0, which I presume mean inhibitor and non-inhibitor, respectively.

Screenshot 2024-04-02 at 10 39 08

Screenshot 2024-04-02 at 10 39 50

I included both as metadata, annotated with the antituberculosis ontology
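The reading of the web server output described above can be written down as a small lookup. This is only a sketch of the presumed interpretation (1 = inhibitor, 0 = non-inhibitor); the paper does not state the meaning of 0 explicitly, and the names `LABELS` and `describe_prediction` are hypothetical.

```python
# Assumed mapping of the web server's binary output to class names.
# The "non-inhibitor" reading of 0 is a presumption, not stated in the paper.
LABELS = {
    1: "MTB inhibitor",
    0: "MTB non-inhibitor",
}


def describe_prediction(value: int) -> str:
    """Return the presumed class name for a binary prediction (0 or 1)."""
    if value not in LABELS:
        raise ValueError(f"expected 0 or 1, got {value!r}")
    return LABELS[value]


if __name__ == "__main__":
    for value in (1, 0):
        print(value, "->", describe_prediction(value))
```

If the presumption is wrong, only the `LABELS` dictionary needs to change.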

| Role | Name |
| --- | --- |
| Curator | @Zainab-ik |
| Code Contributor | @Amna-28 |
| Reviewer | @GemmaTuron |

GemmaTuron commented 8 months ago

Hi @Zainab-ik !

> Four ML algorithms were used to build this model. I added all of them to the metadata, since the final model (deployed to the web server) is a combination of all four. Although XGBoost is stated to be the best, the final model is a fusion of four algorithms: Random Forest, Deep Neural Network, Support Vector Machine, and XGBoost. Looking at the repository, I realised that Ersilia implemented only the XGBoost model. Does that make the other algorithms unimportant as metadata?

If I recall correctly, the model's pretrained parameters were downloaded from this website, and they correspond to the stacked model of the four algorithms (this file here). Where did you see that we only implemented the XGBoost model?
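For readers unfamiliar with the term, a stacked model feeds the outputs of several base models into a second-level combiner. The sketch below illustrates only the general idea with a logistic combiner. The weights, bias, threshold, and function names are hypothetical, since the actual fusion used by the web server is not described in this thread.

```python
import math
from typing import Mapping

# Hypothetical meta-learner parameters, chosen only for illustration.
# The real stacked model's parameters are not given in this discussion.
META_WEIGHTS = {
    "random_forest": 1.0,
    "deep_neural_network": 1.0,
    "support_vector_machine": 1.0,
    "xgboost": 1.0,
}
META_BIAS = -2.0


def stacked_probability(base_probs: Mapping[str, float]) -> float:
    """Combine the four base-model probabilities with a logistic meta-learner."""
    z = META_BIAS + sum(
        weight * base_probs[name] for name, weight in META_WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def stacked_label(base_probs: Mapping[str, float], threshold: float = 0.5) -> int:
    """Return 1 if the fused probability reaches the threshold, else 0."""
    return 1 if stacked_probability(base_probs) >= threshold else 0


if __name__ == "__main__":
    example = {name: 0.9 for name in META_WEIGHTS}
    print(stacked_label(example))
```

The point for the metadata question is that all four base models contribute to the final prediction, even if only one of them is easy to spot in the code.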

> Similarity search is a feature of the model deployed to the web server. Should it be added to the metadata as an output, given that it is not implemented in Ersilia?

I think we should not add it as metadata, but let's ask the BioModels team what to do when the original model has more capabilities than the part incorporated in Ersilia. Can you do it?

> I realised there is no GitHub repository for the original source code, only the deployed web server.

Let's use the web server as the original source code, then.

> For this model, I have three outputs from the curation. Similarity search: an extra function of the web server/model. MTB inhibitor prediction: a more specific output of the model. MTB non-inhibitor prediction: also a more specific output of the model.

I'd ask the BioModels team about the similarity search.

Zainab-ik commented 8 months ago

> Hi @Zainab-ik !
>
> > Four ML algorithms were used to build this model. I added all of them to the metadata, since the final model (deployed to the web server) is a combination of all four. Although XGBoost is stated to be the best, the final model is a fusion of four algorithms: Random Forest, Deep Neural Network, Support Vector Machine, and XGBoost. Looking at the repository, I realised that Ersilia implemented only the XGBoost model. Does that make the other algorithms unimportant as metadata?
>
> If I recall correctly, the model's pretrained parameters were downloaded from this website, and they correspond to the stacked model of the four algorithms (this file here). Where did you see that we only implemented the XGBoost model?

My mistake. I went through the code in the main.py file and could only identify XGBoost. I did add all four algorithms as metadata, though.

> > Similarity search is a feature of the model deployed to the web server. Should it be added to the metadata as an output, given that it is not implemented in Ersilia?
>
> I think we should not add it as metadata, but let's ask the BioModels team what to do when the original model has more capabilities than the part incorporated in Ersilia. Can you do it?

Sheriff said to include only the features implemented in Ersilia. This has been corrected.

> > I realised there is no GitHub repository for the original source code, only the deployed web server.
>
> Let's use the web server as the original source code, then.

I added it as the model description. Thanks for clarifying.

> > For this model, I have three outputs from the curation. Similarity search: an extra function of the web server/model. MTB inhibitor prediction: a more specific output of the model. MTB non-inhibitor prediction: also a more specific output of the model.
>
> I'd ask the BioModels team about the similarity search.

I needed clarification about the MTB non-inhibitor output, since it isn't explicitly stated in the paper. Do you need more detail on my question?