geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Create a GO-CAM -> TSV file for end user consumption #2008

Open kltm opened 1 year ago

kltm commented 1 year ago

Create a GO-CAM -> TSV file for end user consumption. This could fill the ecological niche of our previous SIF effort.

TBD:

Tagging @dustine32 @balhoff

kltm commented 1 year ago

Also tagging @pgaudet

kltm commented 1 year ago

Noting for @balhoff, technically, not having the SPARQL endpoint or store involved (i.e. a file-to-file transformation) would likely be easier to reuse and recreate. (Dealing with the endpoint complicates things and the blazegraph is--or should be--a moving target right now.)

balhoff commented 1 year ago

@cmungall what are the desired columns? Something like this (using noted property path)?

Taking this model as an example: http://noctua.geneontology.org/editor/graph/gomodel:645d887900000758?

For hormone activity enabled by BGLAP, would we only include the brain development process, and not the other two which it's part of?

balhoff commented 1 year ago

Sample table for http://noctua.geneontology.org/editor/graph/gomodel:5ee8120100000524:

| ?gp1 | ?mf1 | ?cc1 | ?bp1 | ?relation | ?gp2 | ?mf2 | ?cc2 | ?bp2 | ?model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http://identifiers.org/uniprot/O15049 | http://purl.obolibrary.org/obo/GO_0005515 | | | http://purl.obolibrary.org/obo/RO_0002629 | http://identifiers.org/uniprot/P46934 | http://purl.obolibrary.org/obo/GO_0061630 | | | http://model.geneontology.org/5ee8120100000524 |
| http://identifiers.org/uniprot/Q7Z434 | http://purl.obolibrary.org/obo/GO_0035591 | | | http://purl.obolibrary.org/obo/RO_0002629 | http://identifiers.org/uniprot/Q12933 | http://purl.obolibrary.org/obo/GO_0031625 | | | http://model.geneontology.org/5ee8120100000524 |
| http://identifiers.org/uniprot/P46934 | http://purl.obolibrary.org/obo/GO_0061630 | | | http://purl.obolibrary.org/obo/RO_0002629 | http://identifiers.org/uniprot/Q7Z434 | http://purl.obolibrary.org/obo/GO_0035591 | | | http://model.geneontology.org/5ee8120100000524 |

balhoff commented 1 year ago

And for the BGLAP model mentioned above:

| ?gp1 | ?mf1 | ?cc1 | ?bp1 | ?relation | ?gp2 | ?mf2 | ?cc2 | ?bp2 | ?model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http://identifiers.org/uniprot/P02818 | http://purl.obolibrary.org/obo/GO_0005179 | http://purl.obolibrary.org/obo/GO_0005576 | http://purl.obolibrary.org/obo/GO_0007420 | http://purl.obolibrary.org/obo/RO_0002413 | http://identifiers.org/uniprot/Q5T848 | http://purl.obolibrary.org/obo/GO_0004888 | http://purl.obolibrary.org/obo/GO_0005886 | http://purl.obolibrary.org/obo/GO_0007420 | http://model.geneontology.org/645d887900000758 |
| http://identifiers.org/uniprot/P38435 | http://purl.obolibrary.org/obo/GO_0008488 | | http://purl.obolibrary.org/obo/GO_0017187 | http://purl.obolibrary.org/obo/RO_0002630 | http://identifiers.org/uniprot/P02818 | http://purl.obolibrary.org/obo/GO_0005179 | http://purl.obolibrary.org/obo/GO_0005576 | http://purl.obolibrary.org/obo/GO_0007420 | http://model.geneontology.org/645d887900000758 |

kltm commented 1 year ago

@balhoff Curious: would end users be wanting URIs or CURIEs? Guess it depends on the audience?

balhoff commented 1 year ago

@kltm just an informational example for now. We can do CURIEs in the end, but first I want to check that I'm pulling out the right stuff.
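For reference, switching from URIs to CURIEs can be a simple post-processing step on the TSV. The following is a minimal sketch, not the pipeline's actual code: the function name `to_curie` and the `PREFIXES` map are hypothetical, and the prefix pairs are only inferred from the URIs and CURIEs that appear in the sample rows in this thread.

```python
# Hypothetical prefix map, inferred from the sample rows in this thread.
# A real pipeline would presumably load this from a shared prefix registry.
PREFIXES = {
    "http://purl.obolibrary.org/obo/GO_": "GO:",
    "http://purl.obolibrary.org/obo/RO_": "RO:",
    "http://identifiers.org/uniprot/": "UniProtKB:",
    "http://model.geneontology.org/": "gomodel:",
}


def to_curie(uri: str) -> str:
    """Return the CURIE for a known URI prefix, or the URI unchanged."""
    for uri_prefix, curie_prefix in PREFIXES.items():
        if uri.startswith(uri_prefix):
            return curie_prefix + uri[len(uri_prefix):]
    # Unknown namespace: leave the value as-is rather than guess.
    return uri


def row_to_curies(row: list[str]) -> list[str]:
    """Convert every cell of one TSV row; empty cells stay empty."""
    return [to_curie(cell) if cell else cell for cell in row]
```

Leaving unknown URIs untouched keeps the output lossless, so an audience that wants full URIs could still be served from the same intermediate file.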

cmungall commented 1 year ago

> For hormone activity enabled by BGLAP, would we only include the brain development process, and not the other two which it's part of?

I think having a pipe-separated list for this should be fine (it should always be a tree structure, and hence the list can always be interpreted as a chain).

It may turn out that this is overkill and there is not so much information in nested part-ofs. Looking at that model, one of the paths is clearly wrong (unless neurotransmitters can think). But as a first pass, having this be transparent is a great way for us to easily spot-check some of these.
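To illustrate how a consumer might read such a cell, here is a small sketch that interprets a pipe-separated list as a part-of chain. It assumes the terms are written innermost first, which is a convention not settled in this thread; the function name and the `EX:` identifiers are placeholders, not real terms.

```python
def parse_part_of_chain(cell: str) -> list[tuple[str, str]]:
    """Read 'a|b|c' as the chain: a part_of b, b part_of c.

    Assumes innermost-first ordering (an assumption, not a decided format).
    """
    terms = [term for term in cell.split("|") if term]
    # Pair each term with its successor to get (child, parent) links.
    return list(zip(terms, terms[1:]))


# Placeholder identifiers, for illustration only.
links = parse_part_of_chain("EX:1|EX:2|EX:3")
for child, parent in links:
    print(f"{child} part_of {parent}")
```

A cell with a single term yields no links, which matches the case where a function is part of only one process.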

balhoff commented 1 year ago

@cmungall specifying the order of transitive part-ofs might require a different approach. I'm using SPARQL right now, which is nice and simple, but property paths return the reachable terms without their order. If we're okay with interpreting the result as a bag of relevant terms, then we can use the property paths.
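The difference between the two readings can be sketched outside SPARQL. Assuming the direct part-of edges have already been extracted into a child-to-parent mapping (the `part_of` dict and node names below are hypothetical), walking the edges one hop at a time recovers the order, whereas a transitive property path only yields the equivalent of the unordered set.

```python
def ordered_chain(start: str, part_of: dict[str, str]) -> list[str]:
    """Follow direct part_of edges from `start`, returning ancestors in order.

    Assumes each node has at most one direct parent (a tree), as suggested
    above; raises ValueError if the edges loop back on themselves.
    """
    chain: list[str] = []
    seen = {start}
    node = start
    while node in part_of:
        node = part_of[node]
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        chain.append(node)
    return chain


# Hypothetical direct edges: mf -> bp_a -> bp_b -> bp_c
part_of = {"mf": "bp_a", "bp_a": "bp_b", "bp_b": "bp_c"}

chain = ordered_chain("mf", part_of)  # ordered: nearest process first
bag = set(chain)                      # what a transitive path query amounts to
```

The ordered walk needs the individual edges, which is why it may call for a different approach than a single property-path query.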

balhoff commented 1 year ago

With some BP and CC aggregation (unordered):

| ?gp1 | ?mf1 | ?cc1s | ?bp1s | ?relation | ?gp2 | ?mf2 | ?cc2s | ?bp2s | ?model |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| UniProtKB:P02818 | GO:0005179 | GO:0005576 | GO:0007420,GO:0050890,GO:0001956 | RO:0002413 | UniProtKB:Q5T848 | GO:0004888 | GO:0005886 | GO:0007420,GO:0050890,GO:0001956 | gomodel:645d887900000758 |
| UniProtKB:P38435 | GO:0008488 | | GO:0017187 | RO:0002630 | UniProtKB:P02818 | GO:0005179 | GO:0005576 | GO:0007420,GO:0050890,GO:0001956 | gomodel:645d887900000758 |
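The aggregation step itself can be reproduced in a file-to-file setting. The sketch below is one possible way to do it, not the query used above: it assumes the un-aggregated results arrive as one row per term, and the `aggregate` helper and its parameter names are hypothetical.

```python
def aggregate(rows: list[dict], key_cols: list[str], agg_col: str) -> list[dict]:
    """Collapse rows that share `key_cols`, comma-joining distinct `agg_col` values.

    The joined values land in a column named `agg_col + "s"` (e.g. bp1 -> bp1s).
    Values keep first-seen order, which carries no meaning (the bag is unordered).
    """
    grouped: dict[tuple, list[str]] = {}
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        values = grouped.setdefault(key, [])
        value = row.get(agg_col)
        if value and value not in values:
            values.append(value)
    return [
        dict(zip(key_cols, key), **{agg_col + "s": ",".join(values)})
        for key, values in grouped.items()
    ]


# One row per (activity, process) pair, using identifiers from the table above.
flat = [
    {"gp1": "UniProtKB:P02818", "mf1": "GO:0005179", "bp1": "GO:0007420"},
    {"gp1": "UniProtKB:P02818", "mf1": "GO:0005179", "bp1": "GO:0050890"},
    {"gp1": "UniProtKB:P02818", "mf1": "GO:0005179", "bp1": "GO:0001956"},
]
merged = aggregate(flat, key_cols=["gp1", "mf1"], agg_col="bp1")
```

Applying the same helper once per aggregated column (CC and BP on each side of the relation) would give the plural `?cc1s`, `?bp1s`, `?cc2s`, and `?bp2s` columns.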