draeger-lab / ModelPolisher

ModelPolisher accesses the BiGG Models knowledgebase to annotate SBML models.
MIT License

Spurious annotations #89

Open mephenor opened 4 years ago

mephenor commented 4 years ago

During a quick look through the models, I found that, among other things, H2O is additionally annotated as hydroxide. The question is whether this is correct or should be fixed. The problem can be reproduced by, e.g., polishing iCHOv1.json from https://github.com/SBRG/bigg_models_data/tree/master/models.

We need to check whether similar things happen for other species, reactions, etc., and whether the queries need to be made more restrictive or rewritten in a different way. However, this might be quite time-consuming, as annotations would need to be checked manually for plausibility.

mephenor commented 4 years ago

Upon further investigation, annotations for some species reference the same entity, but in different organisms. In some cases they also reference it across different compartments, because the code that retrieves a BiGGId from annotations currently cannot retrieve the correct compartment.

Two things can be done here:

glucksfall commented 3 years ago

Hi @mephenor,

I think I'm late for the party.

I found the same thing while working on a small task to unify IDs across different models. H2O and OH- both have the same annotation (as do ammonia and ammonium). The problem runs deeper, because some annotations refer specifically to water or to OH- (e.g. KEGG C00001 vs. C01328), while others refer unspecifically to both (e.g. XLYOFNOQVPJJNP-UHFFFAOYSA-M is the InChIKey for both).

Additionally, some annotations erroneously refer to water (e.g. MNXM2 = OH-) or are simply wrong, such as META:OXONIUM (OH3+).
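As a rough illustration of how such overlaps could be detected, here is a minimal Python sketch. It assumes the annotations have already been collected into a mapping from species ID to a set of annotation strings. The species keys (`h2o`, `oh1`), the prefix format, and the function name `shared_annotations` are assumptions made for this example and are not part of ModelPolisher; the identifiers are the ones mentioned in this comment.

```python
from collections import defaultdict

# Illustrative data only: identifiers are taken from this thread, and the
# species keys and "prefix:id" format are assumptions for the example.
annotations = {
    "h2o": {"kegg.compound:C00001", "inchikey:XLYOFNOQVPJJNP-UHFFFAOYSA-M"},
    "oh1": {"kegg.compound:C01328", "inchikey:XLYOFNOQVPJJNP-UHFFFAOYSA-M"},
}


def shared_annotations(annotations):
    """Return every annotation that is attached to more than one species."""
    owners = defaultdict(set)
    for species, refs in annotations.items():
        for ref in refs:
            owners[ref].add(species)
    # Keep only annotations that cannot discriminate between species.
    return {ref: sorted(s) for ref, s in owners.items() if len(s) > 1}


ambiguous = shared_annotations(annotations)
for ref, species in ambiguous.items():
    print(ref, "is shared by", ", ".join(species))
```

Annotations flagged this way would still need manual review, since a shared identifier can be either a database error or a genuinely unspecific reference.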

OK... If you would like, we could collaborate to take a deeper look at the issue. Moreover, I would like to add that some models at BIGG have metabolites with the same ID, same name, same molecular formula, but different charges.

Best regards, Rodrigo

mephenor commented 3 years ago

Hi @glucksfall, and sorry for the very late response. I started a new job, did not get the notification, and haven't had much time to look into this issue. The whole Polisher is currently stuck in limbo, as this is the major issue blocking a new release.

I have not found a solution yet, however, regarding your observation:

> Moreover, I would like to add that some models at BIGG have metabolites with the same ID, same name, same molecular formula, but different charges.

After another look at the database, BiGG only seems to store charge information in the model_compartmentalized_component table, which references the component table, where the bigg_id and name are stored. So the bigg_id does not discriminate between different charge states, and the obvious solution would be to filter the annotations obtained. However, this would require resolving those annotations and reliably retrieving their charge information.
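To make the table relationship described above concrete, here is a minimal Python sketch using an in-memory SQLite database. The schema is a deliberately simplified assumption: only the table names, `bigg_id`, `name`, and `charge` come from the description above, while the `component_id` and `model` columns, the rows (`met_x`, `met_y`), and the function name are invented for illustration and do not reflect the actual BiGG schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schema: the real BiGG database has more tables/columns.
CREATE TABLE component (id INTEGER PRIMARY KEY, bigg_id TEXT, name TEXT);
CREATE TABLE model_compartmentalized_component (
    id INTEGER PRIMARY KEY,
    component_id INTEGER REFERENCES component(id),
    model TEXT,
    charge INTEGER
);
-- Invented rows: met_x appears with two charges, met_y with one.
INSERT INTO component VALUES (1, 'met_x', 'Metabolite X'),
                             (2, 'met_y', 'Metabolite Y');
INSERT INTO model_compartmentalized_component VALUES
    (1, 1, 'model_a', 1),
    (2, 1, 'model_b', 0),
    (3, 2, 'model_a', 0),
    (4, 2, 'model_b', 0);
""")


def ids_with_multiple_charges(conn):
    """Map each bigg_id that occurs with more than one charge to its charges."""
    rows = conn.execute("""
        SELECT DISTINCT c.bigg_id, m.charge
        FROM model_compartmentalized_component AS m
        JOIN component AS c ON c.id = m.component_id
        ORDER BY c.bigg_id, m.charge
    """).fetchall()
    charges = {}
    for bigg_id, charge in rows:
        charges.setdefault(bigg_id, []).append(charge)
    return {b: cs for b, cs in charges.items() if len(cs) > 1}


result = ids_with_multiple_charges(conn)
print(result)
```

A query of this shape only lists the bigg_ids that are affected. Filtering annotations would additionally need the charge of each resolved annotation, which is the open part of the problem.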

Schmoho commented 2 years ago

see here for a list of all the annotations that are added to a minimal water species