draeger closed this 1 month ago
This was implemented during GSoC19, but the feature has not been properly tested yet. As discussed, some models containing annotations from BioModels could be used for initial manual testing and converted into test cases later, after validating that 1) additional annotations are obtained and 2) those annotations are in fact accurate.
Finding a good BioModels subset is a task in itself, so this should likely be done differently. Polishing one model with BiGG IDs twice, once with the correct ID and once with a scrambled variant, should be a valid test for this functionality. As discussed, the current problem is setting up a database for this testing procedure. This will be done after the beta release.
For species this seems to work as expected:
For reactions it also mostly works as expected. However, there is an issue with foreign IDs that map to more than one BiGG-ID: those are discarded.
Running
select distinct r.bigg_id as reaction_bigg_id,
                c.bigg_id as compartment_bigg_id,
                c.name as compartment_name
from reaction_matrix rm, compartmentalized_component cc, compartment c, reaction r
where rm.reaction_id in (select ome_id
                         from synonym
                         where synonym ilike '%ACETATEKIN-RXN%')
  and rm.compartmentalized_component_id = cc.id
  and cc.compartment_id = c.id
  and rm.reaction_id = r.id;
yields
"reaction_bigg_id" "compartment_bigg_id" "compartment_name"
"ACKr" "c" "cytosol"
"ACKrh" "h" "chloroplast"
"ACKrm" "m" "mitochondria"
The offending code is here: https://github.com/draeger-lab/ModelPolisher/blob/62b6b210c2b0b643799121de5a78cbb11991dac4/src/main/java/edu/ucsd/sbrg/db/BiGGDB.java#L753-L758
Unfortunately, this is somewhat deep in the stack and embedded in creative attempts at code deduplication.
getBiggIdFromParts:329, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
lambda$getBiGGIdFromResources$1:306, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
apply:-1, 28318221 (edu.ucsd.sbrg.bigg.annotation.BiGGAnnotation$$Lambda$607)
flatMap:294, Optional (java.util)
getBiGGIdFromResources:306, BiGGAnnotation (edu.ucsd.sbrg.bigg.annotation)
checkId:91, ReactionAnnotation (edu.ucsd.sbrg.bigg.annotation)
annotate:58, ReactionAnnotation (edu.ucsd.sbrg.bigg.annotation)
getBiGGIdFromResourcesTest:50, ReactionAnnotationTest (edu.ucsd.sbrg.bigg.annotation)
The last commit changed how reactions are annotated. We now consider all potential reaction hits from foreign IDs and filter on matching compartment. That is, even if a foreign ID (e.g. a KEGG ID) is associated with multiple BiGG-IDs, we only discard those that don't match the compartment of the reaction. On the flip side, a reaction is no longer annotated if there is only a single hit whose compartment does not match.
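A minimal sketch of the compartment filter described above, not the actual ModelPolisher code. The class `CompartmentFilterSketch`, the `Candidate` type and the `matching` method are hypothetical names for illustration. The candidate list reuses the three rows returned by the query for `ACETATEKIN-RXN`.

```java
import java.util.List;
import java.util.stream.Collectors;

class CompartmentFilterSketch {

  /** Hypothetical pair: a BiGG reaction id and the compartment code it occurs in. */
  static final class Candidate {
    final String biggId;
    final String compartment;

    Candidate(String biggId, String compartment) {
      this.biggId = biggId;
      this.compartment = compartment;
    }
  }

  /** Keeps only the BiGG ids whose compartment equals the compartment of the reaction. */
  static List<String> matching(List<Candidate> candidates, String reactionCompartment) {
    return candidates.stream()
        .filter(c -> c.compartment.equals(reactionCompartment))
        .map(c -> c.biggId)
        .distinct()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Hits for one foreign ID, taken from the query result above
    List<Candidate> hits = List.of(
        new Candidate("ACKr", "c"),
        new Candidate("ACKrh", "h"),
        new Candidate("ACKrm", "m"));

    System.out.println(matching(hits, "m")); // prints [ACKrm]
    System.out.println(matching(hits, "e")); // prints []

    // Single hit, but wrong compartment: nothing is left to annotate with
    System.out.println(matching(List.of(new Candidate("ACKr", "c")), "m")); // prints []
  }
}
```

The last call shows the flip side mentioned above: a single hit in the wrong compartment yields no annotation.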
Suggested enhancement by @tpfau: Look over all present annotations and map every annotation that can be mapped to BiGG. For instance, if there is a KEGG compound annotation, that compound will be assigned its corresponding BiGG id along with all other annotations available in BiGG. Especially since that annotation data is already present in the BiGG Models Database, this would make ModelPolisher much more useful. As long as ModelPolisher only relies on BiGG ids as an input, this will always require manual matching of the original id used to BiGG ids, or assume that the model originally used BiGG ids. It would be much better to make it database dependent.
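A rough sketch of what the suggested lookup could look like, under heavy assumptions. The class `AnnotationToBiggSketch`, the method `biggIdsFor` and the in-memory table `FOREIGN_TO_BIGG` are hypothetical. In ModelPolisher the table would be replaced by a query against the BiGG Models Database. The single entry (KEGG compound C00031, D-glucose, mapped to `glc__D`) is only an illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AnnotationToBiggSketch {

  // Hypothetical stand-in for a database lookup from annotation URI to BiGG id
  static final Map<String, String> FOREIGN_TO_BIGG = Map.of(
      "https://identifiers.org/kegg.compound/C00031", "glc__D");

  /** Collects every BiGG id that can be reached from the given annotation URIs. */
  static Set<String> biggIdsFor(List<String> annotationUris) {
    Set<String> ids = new LinkedHashSet<>();
    for (String uri : annotationUris) {
      String biggId = FOREIGN_TO_BIGG.get(uri);
      if (biggId != null) {
        ids.add(biggId); // unmapped annotations are skipped
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    // The second URI is a made-up placeholder for an annotation with no BiGG mapping
    System.out.println(biggIdsFor(List.of(
        "https://identifiers.org/kegg.compound/C00031",
        "https://identifiers.org/unknown/X"))); // prints [glc__D]
  }
}
```

With such a lookup, the element's own id no longer has to be a BiGG id. Any mappable annotation is enough to find it.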