draeger-lab / ModelPolisher

ModelPolisher accesses the BiGG Models knowledgebase to annotate SBML models.
MIT License

Map all annotations to BiGG #20

Open draeger opened 7 years ago

draeger commented 7 years ago

Suggested enhancement by @tpfau: Look over all existing annotations and map every annotation that can be mapped to BiGG. For instance, if a compound carries a KEGG compound annotation, it will be assigned its corresponding BiGG id along with all other annotations available in BiGG. Since that annotation data is already present in the BiGG Models database, this would make ModelPolisher much more useful.

As long as ModelPolisher relies only on BiGG ids as input, it will always require either manually matching the original ids to BiGG ids or assuming that the model originally used BiGG ids. It would be much better to make it database dependent.
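
To make the idea concrete, here is a minimal sketch of resolving a foreign annotation to a BiGG id. It is not ModelPolisher's implementation: the class `ForeignIdLookup`, the method `toBiggId`, and the in-memory `SYNONYMS` map are hypothetical stand-ins for a lookup against the BiGG database, and the single KEGG-to-BiGG entry is illustrative data only.

```java
import java.util.Map;
import java.util.Optional;

class ForeignIdLookup {
    // Hypothetical stand-in for a synonym table: "prefix:id" -> BiGG id.
    // The single entry is illustrative data, not a database export.
    static final Map<String, String> SYNONYMS = Map.of(
        "kegg.compound:C00033", "ac");

    /** Splits an identifiers.org URI into prefix and id, then looks the pair up. */
    static Optional<String> toBiggId(String uri) {
        String marker = "identifiers.org/";
        int start = uri.indexOf(marker);
        if (start < 0) {
            return Optional.empty(); // not an identifiers.org URI
        }
        String[] parts = uri.substring(start + marker.length()).split("/", 2);
        if (parts.length != 2) {
            return Optional.empty(); // prefix without an id
        }
        return Optional.ofNullable(SYNONYMS.get(parts[0] + ":" + parts[1]));
    }

    public static void main(String[] args) {
        // Known synonym: prints Optional[ac]
        System.out.println(toBiggId("https://identifiers.org/kegg.compound/C00033"));
        // Unknown synonym: prints Optional.empty
        System.out.println(toBiggId("https://identifiers.org/kegg.compound/C99999"));
    }
}
```

Once a BiGG id is resolved this way, the remaining annotations could be pulled from BiGG exactly as for a model that used BiGG ids from the start.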

mephenor commented 4 years ago

While this was implemented during GSoC19, the feature has not been properly tested yet. As discussed, some models containing annotations from BioModels could be used for initial manual testing and converted into test cases later, after validating that 1) additional annotations are obtained and 2) those annotations are in fact accurate.

mephenor commented 4 years ago

Finding a good BioModels subset is a task in itself, so this should likely be done differently. Polishing one model with BiGG ids twice, once with the correct ids and once with scrambled variants, should be a valid test for this functionality. As discussed, the current problem is setting up a database for this testing procedure. This will be done after the beta release.

Schmoho commented 2 years ago

For species this seems to work as expected:

https://github.com/draeger-lab/ModelPolisher/blob/876bec77eeaccf8a8f4983fe735e7b9cf343a16f/src/test/java/edu/ucsd/sbrg/bigg/annotation/SpeciesAnnotationTest.java#L61-L88

Schmoho commented 2 years ago

For reactions it also mostly works as expected. However, there is an issue with foreign IDs that map to more than one BiGG ID: those are discarded.

https://github.com/draeger-lab/ModelPolisher/blob/62b6b210c2b0b643799121de5a78cbb11991dac4/src/test/java/edu/ucsd/sbrg/bigg/annotation/ReactionAnnotationTest.java#L21-L78

Schmoho commented 2 years ago

Running

select distinct r.bigg_id as reaction_bigg_id,
                c.bigg_id as compartment_bigg_id,
                c.name as compartment_name
from reaction_matrix rm, compartmentalized_component cc, compartment c, reaction r
where rm.reaction_id in (select ome_id
                         from synonym
                         where synonym ilike '%ACETATEKIN-RXN%')
  and rm.compartmentalized_component_id = cc.id
  and cc.compartment_id = c.id
  and rm.reaction_id = r.id;

yields

"reaction_bigg_id"  "compartment_bigg_id"  "compartment_name"
"ACKr"              "c"                    "cytosol"
"ACKrh"             "h"                    "chloroplast"
"ACKrm"             "m"                    "mitochondria"

Schmoho commented 2 years ago

The offending code is here: https://github.com/draeger-lab/ModelPolisher/blob/62b6b210c2b0b643799121de5a78cbb11991dac4/src/main/java/edu/ucsd/sbrg/db/BiGGDB.java#L753-L758

Unfortunately this is somewhat deep in the stack and embedded in creative attempts at code deduplication.

getBiggIdFromParts:329, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
lambda$getBiGGIdFromResources$1:306, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
apply:-1, 28318221 (edu.ucsd.sbrg.bigg.annotation.BiGGAnnotation$$Lambda$607)
flatMap:294, Optional (java.util)
getBiGGIdFromResources:306, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
checkId:91, ReactionAnnotation (edu.ucsd.sbrg.bigg.annotation)
annotate:58, ReactionAnnotation (edu.ucsd.sbrg.bigg.annotation)
getBiGGIdFromResourcesTest:50, ReactionAnnotationTest (edu.ucsd.sbrg.bigg.annotation)

Schmoho commented 2 years ago

The last commit changed how reactions are annotated. We now consider all potential reaction hits from foreign IDs and filter on matching compartment. That is, even if a foreign ID (e.g. a KEGG ID) is associated with multiple BiGG IDs, we only discard those that don't match the compartment of the reaction. On the flip side, a reaction will no longer be annotated if there is only a single hit and its compartment does not match.

https://github.com/draeger-lab/ModelPolisher/blob/8e2b3e58411df4b4855c02255ae099595c4505f4/src/main/java/edu/ucsd/sbrg/db/BiGGDB.java#L725-L738
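
The filtering rule can be sketched roughly as follows. This is an illustration of the described behavior, not the code behind the link: `CompartmentFilter`, `Hit`, and `matching` are hypothetical names, and the candidate list reuses the three rows returned by the query earlier in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

class CompartmentFilter {
    /** Hypothetical candidate hit: a BiGG reaction id plus its compartment code. */
    static final class Hit {
        final String biggId;
        final String compartment;

        Hit(String biggId, String compartment) {
            this.biggId = biggId;
            this.compartment = compartment;
        }
    }

    /** Keeps only those candidates whose compartment equals the reaction's compartment. */
    static List<String> matching(List<Hit> hits, String reactionCompartment) {
        return hits.stream()
            .filter(h -> h.compartment.equals(reactionCompartment))
            .map(h -> h.biggId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("ACKr", "c"),
            new Hit("ACKrh", "h"),
            new Hit("ACKrm", "m"));

        // Several hits for one foreign ID, one matches the compartment: prints [ACKr]
        System.out.println(matching(hits, "c"));

        // Single hit but wrong compartment, so nothing is annotated: prints []
        System.out.println(matching(List.of(new Hit("ACKrm", "m")), "c"));
    }
}
```

The second call shows the trade-off mentioned above: a lone candidate is dropped when its compartment does not match.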