Include inferred gene product to term relationships as a product built by pipeline

geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.

http://geneontology.org

BSD 3-Clause "New" or "Revised" License


Include inferred gene product to term relationships as a product built by pipeline #650

Open cmungall opened 6 years ago

cmungall commented 6 years ago

This will essentially be triples: gene ID, relation, term ID.

http://wiki.geneontology.org/index.php/Category:Gene_Product_to_Term_Relations

cc @balhoff

This should presumably just fall out of the Blazegraph build. Do we do Arachne reasoning on a per-group level?

cmungall commented 6 years ago

OK, this is perhaps more nuanced than I thought, as Arachne is not a TBox reasoner.

e.g. given

g1 a MGI:123
g1 involved_in f1
f1 a GO:123
GO:123 SubClassOf part_of some GO:234

we want MGI:123 involved_in GO:234
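
A minimal sketch of the inference being asked for, in plain Python rather than Arachne. It assumes the TBox has been flattened to (subclass, property, filler) tuples for axioms of the form `C SubClassOf p some D`, and that the chain `involved_in o part_of` yields `involved_in`. The function name `gene_term_relations` and the `CHAINS` table are hypothetical, not part of the pipeline, and named-class `SubClassOf` axioms are ignored here.

```python
from collections import defaultdict

# ABox: the instance-level triples from the example above.
ABOX = [
    ("g1", "a", "MGI:123"),
    ("g1", "involved_in", "f1"),
    ("f1", "a", "GO:123"),
]

# TBox, flattened: (C, p, D) stands for "C SubClassOf p some D".
TBOX = [("GO:123", "part_of", "GO:234")]

# Assumed chain table: (p, q) -> r means "p o q implies r".
CHAINS = {("involved_in", "part_of"): "involved_in"}


def gene_term_relations(abox, tbox, chains=CHAINS):
    """Return class-level (gene, relation, term) triples."""
    # Step 1: collect the asserted types of every individual.
    types = defaultdict(set)
    for s, p, o in abox:
        if p == "a":
            types[s].add(o)

    # Step 2: lift each instance-level edge to the classes of its endpoints.
    result = set()
    for s, p, o in abox:
        if p == "a":
            continue
        for gene in types[s]:
            for term in types[o]:
                result.add((gene, p, term))

    # Step 3: extend through existential restrictions until nothing new appears.
    frontier = list(result)
    while frontier:
        gene, p, term = frontier.pop()
        for sub, q, filler in tbox:
            if sub == term and (p, q) in chains:
                new = (gene, chains[(p, q)], filler)
                if new not in result:
                    result.add(new)
                    frontier.append(new)
    return result
```

On the example data, `gene_term_relations(ABOX, TBOX)` contains both the direct triple `MGI:123 involved_in GO:123` and the inferred `MGI:123 involved_in GO:234`.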

cmungall commented 6 years ago

@balhoff would it be odd to shadow the TBox in the ABox here, e.g. by making 'prototypical' instances of each ontology class? Or should it be a two-step process: use Arachne to infer direct types, and then EMR after that?

balhoff commented 6 years ago

I have some exploratory code for Arachne that should be able to get that inference. It is quite a bit slower than normal but it might be good for the pipeline.

cmungall commented 6 years ago

re slower: This is potentially for 100s of thousands of genes.

Would the Arachne approach be complete w.r.t. property chains? Primarily involved_in o part_of.

balhoff commented 6 years ago

@cmungall actually I think your shadowing idea might be the way to go. Now I see that you'd want to infer "existential types" even if there is no node in the model. Probably the shadowing will be safe as long as we remove inverse property rules. I'll make a start on some approaches.
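
One possible reading of the shadowing idea, sketched in plain Python rather than Arachne. Each class gets a prototypical individual, each `C SubClassOf p some D` axiom becomes an edge between prototypes, and a naive forward-chaining loop applies property chain rules. The names `proto`, `shadow` and `saturate`, and the `has_part` inverse, are assumptions made for illustration. The optional `inverses` argument shows one plausible reason inverse rules would need to be removed: applied to prototypes, they produce edges that would read back as axioms the ontology does not entail.

```python
# Example data, following the triples earlier in this thread.
ABOX = [
    ("g1", "a", "MGI:123"),
    ("g1", "involved_in", "f1"),
    ("f1", "a", "GO:123"),
]
EXISTENTIALS = [("GO:123", "part_of", "GO:234")]  # C SubClassOf p some D
CHAINS = {("involved_in", "part_of"): "involved_in"}  # assumed chain rule


def proto(cls):
    """Name of the hypothetical prototypical instance of a class."""
    return f"proto:{cls}"


def shadow(abox, existentials):
    """Rewrite TBox axioms and typed ABox edges as edges between prototypes."""
    types = {}
    for s, p, o in abox:
        if p == "a":
            types.setdefault(s, set()).add(o)
    facts = {(proto(c), p, proto(d)) for c, p, d in existentials}
    for s, p, o in abox:
        if p != "a":
            for cs in types.get(s, ()):
                for co in types.get(o, ()):
                    facts.add((proto(cs), p, proto(co)))
    return facts


def saturate(facts, chains, inverses=None):
    """Naive forward chaining; inverse rules run only if explicitly passed in."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, q, o2 in facts:
                if o == s2 and (p, q) in chains:
                    new.add((s, chains[(p, q)], o2))
            if inverses and p in inverses:
                new.add((o, inverses[p], s))
        if new <= facts:
            return facts
        facts |= new


closed = saturate(shadow(ABOX, EXISTENTIALS), CHAINS)
unsafe = saturate(shadow(ABOX, EXISTENTIALS), CHAINS, inverses={"part_of": "has_part"})
```

Here `closed` contains `proto:MGI:123 involved_in proto:GO:234`, which reads back as the wanted gene-to-term triple. `unsafe` additionally contains `proto:GO:234 has_part proto:GO:123`, which would read back as "every GO:234 has a GO:123 part", something the original axiom does not say.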

pgaudet commented 3 years ago

Still relevant?

@vanaukenk

vanaukenk commented 3 years ago

@pgaudet Yes, I think we'd still like to harmonize our gene product-to-term relationships as much as possible and if we can use the ontology to help do that, we should. I'm not sure where this stands on the priority list, though. What do you think?