biolink / ontobio

python library for working with ontologies and ontology associations
https://ontobio.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Bridge GO/AmiGO and Monarch evidence models #126

Open cmungall opened 6 years ago

cmungall commented 6 years ago

It would be good to have a bit more documentation on each (@kltm and @kshefchek respectively).

The Monarch concept of an evidence graph generalizes the GO GAF evidence model. The latter allows only one link in a chain. This is a frequent issue for GO, where we have long needed to represent chains of two or more pieces of evidence. It is also unsatisfactory when we collapse a GOCAM to a GPAD: we have to find the one link the curator finds most pertinent (cc @balhoff). If GPADs had supported chains from the beginning, this would be easier. The Monarch evidence graph generalizes chains further, allowing arbitrary graphs connecting a source/subject to an object/sink (though in practice many such graphs are chains of length 2).

The Monarch evidence graph is represented as a bbop graph, which is stored as a string in solr. For example this query

see the link between TBX5 and atrial fibrillation. This is actually inferred from the asserted graph below:

image

As can be seen there are 2 links in the chain of inference. Only one has evidence asserted (the link between the variant and the gene is taken as true here).

In addition to the full evidence graph, the Monarch Solr schema pattern has convenience fields that list all the nodes in the evidence graph:


```
"evidence_object": [
  "MONARCH:e95f810c91005264",
  "PMID:28416818",
  "dbSNP:rs883079",
  "ECO:0000213",
  "PMID:28416822",
  "EFO:0000275",
  "NCBIGene:6910"
],
"evidence_object_label": [
  "rs883079-C",
  "MONARCH:e95f810c91005264",
  "atrial fibrillation",
  "PMID:28416818",
  "TBX5",
  "PMID:28416822",
  "combinatorial evidence used in automatic assertion"
],
```

There isn't a convenience field specifically for the ECO class. It could be extracted formally from the graph using RO:0002558, or hackily by looking at the evidence_object list and taking the ECO: class. This would map to the single 'evidence' field used in AmiGO-GOlr.
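
A rough sketch of both routes, in Python. This is illustrative only and is not ontobio code. The function names are made up, and the graph layout (a JSON string with an `edges` list whose entries carry `sub`, `pred` and `obj` keys) is an assumption about how the stored graph decodes. The one-edge example graph is hypothetical; only the `evidence_object` values are taken from the document above.

```python
import json

HAS_EVIDENCE = "RO:0002558"  # predicate mentioned above


def eco_from_evidence_objects(evidence_objects):
    """Hacky route: keep every CURIE in the list that has the ECO prefix."""
    return [curie for curie in evidence_objects if curie.startswith("ECO:")]


def eco_from_graph(graph_json):
    """Formal route: collect the objects of RO:0002558 edges.

    Assumes the stored string decodes to {"nodes": [...], "edges": [...]}
    and that each edge has "sub", "pred" and "obj" keys.
    """
    graph = json.loads(graph_json)
    return [
        edge["obj"]
        for edge in graph.get("edges", [])
        if edge.get("pred") == HAS_EVIDENCE
    ]


# evidence_object values copied from the Solr document above
evidence_object = [
    "MONARCH:e95f810c91005264",
    "PMID:28416818",
    "dbSNP:rs883079",
    "ECO:0000213",
    "PMID:28416822",
    "EFO:0000275",
    "NCBIGene:6910",
]

# Hypothetical one-edge graph, for illustration only
example_graph = json.dumps({
    "nodes": [{"id": "MONARCH:e95f810c91005264"}, {"id": "ECO:0000213"}],
    "edges": [
        {"sub": "MONARCH:e95f810c91005264", "pred": HAS_EVIDENCE, "obj": "ECO:0000213"}
    ],
})

print(eco_from_evidence_objects(evidence_object))
print(eco_from_graph(example_graph))
```

The prefix filter needs no knowledge of the graph shape, but it would also pick up any ECO class that appears in the list for a different reason. The edge-based version only returns classes actually attached through RO:0002558.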

kshefchek commented 6 years ago

> there isn't a convenience field specifically for the ECO class

We store the ECO class, label, and closure each in a field in the monarch golr.