Porthmeus / MeMoMe

Metabolic Model Merging - a semiautomated way to merge genome scale metabolic models
Apache License 2.0

Handling different types of annotation #43

Open Porthmeus opened 1 year ago

Porthmeus commented 1 year ago

We decided today to split the annotation process for metabolites into bulk queries and individual queries. A bulk query retrieves information from a database for a whole list of metabolites at once. An individual query is a request that is specific to a single metabolite.

What this means for the code and the structure of the process:

  1. Individual queries should be implemented as a function per database within the MeMoMetabolite class
  2. Individual queries will then be called as necessary by a general annotate function within the MeMoMetabolite class
  3. Bulk queries should be implemented as a function per database within the MeMoModel class
  4. Bulk queries will then be called as necessary by a general annotate function within the MeMoModel class
  5. The MeMoModel.annotate() will eventually call the MeMoMetabolite.annotate() function for all those metabolites that are not sufficiently annotated yet
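
To make the call structure above concrete, here is a minimal sketch. It is not the actual MeMoMe code: the class bodies, the `example_db` name, the `is_annotated()` criterion, and the `table` argument are all hypothetical placeholders, and only the class names and the two `annotate()` entry points come from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class MeMoMetabolite:
    """Hypothetical, stripped-down stand-in for the real class."""
    id: str
    annotations: dict = field(default_factory=dict)

    def _annotate_individual_example_db(self) -> None:
        # Placeholder for an individual query against one database.
        # A real version would send a request for this single metabolite.
        self.annotations.setdefault("example_db", f"individual:{self.id}")

    def is_annotated(self) -> bool:
        # Assumed criterion: at least one annotation entry is present.
        return bool(self.annotations)

    def annotate(self) -> None:
        # General entry point: calls the per-database individual queries.
        self._annotate_individual_example_db()


@dataclass
class MeMoModel:
    """Hypothetical, stripped-down stand-in for the real class."""
    metabolites: list

    def _annotate_bulk_example_db(self, table: dict) -> None:
        # Placeholder for a bulk query: one pass over all metabolites.
        for met in self.metabolites:
            if met.id in table:
                met.annotations["example_db"] = table[met.id]

    def annotate(self, table: dict) -> None:
        # Bulk queries first, then individual queries for the leftovers.
        self._annotate_bulk_example_db(table)
        for met in self.metabolites:
            if not met.is_annotated():
                met.annotate()


model = MeMoModel([MeMoMetabolite("glc__D"), MeMoMetabolite("unknown_met")])
model.annotate({"glc__D": "bulk:glc__D"})
```

In this sketch the first metabolite is covered by the bulk step, so only the second one falls through to `MeMoMetabolite.annotate()`.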

There is one further distinction for bulk queries. Some databases (for instance BiGG) do not allow bulk queries to be submitted via their API, so we decided to download these databases once during the installation process and run the queries locally. This scheme should be used for all databases that would require more than ~100 queries per annotation process.
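
A rough sketch of the local-lookup idea, under stated assumptions: the tab-separated layout, the `id` and `name` columns, and all function names below are made up for illustration and do not reflect the real format of any database dump. Only the ~100 query cutoff is taken from the paragraph above.

```python
import csv
import io

QUERY_LIMIT = 100  # rough cutoff from the "~100 queries" rule of thumb


def use_local_copy(n_queries: int) -> bool:
    # Prefer a local copy once the number of required queries exceeds the cutoff.
    return n_queries > QUERY_LIMIT


def load_local_table(handle, key_column: str = "id") -> dict:
    # Read a tab-separated dump into a dict keyed by identifier.
    # The column names are hypothetical.
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key_column]: row for row in reader}


def lookup(ids, table: dict) -> dict:
    # Resolve all identifiers in one local pass, without network requests.
    return {i: table[i] for i in ids if i in table}


# Stand-in for a file that would be downloaded once at installation time.
dump = io.StringIO("id\tname\nmet_a\tMetabolite A\nmet_b\tMetabolite B\n")
table = load_local_table(dump)
hits = lookup(["met_a", "met_x"], table)
```

Identifiers missing from the local table (here `met_x`) are simply absent from `hits`, which makes it easy to hand them to an individual query afterwards.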

This issue relates to #12, #21, #28, and #29.

Unaimend commented 6 months ago

Bulk annotation is mostly done. It might make sense to check how many metabolites are still left unannotated and whether making requests to other databases would significantly improve the results; this could be a future improvement.

Otherwise, the MeMoMetabolite.annotate function is currently not needed. @Porthmeus, your thoughts?