geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Use a single JSON-LD context across all of GO, and make any necessary changes to noctua-models #617

Open cmungall opened 6 years ago

cmungall commented 6 years ago

Parts of our stack (amigo, noctua-js, GAFs, etc.) use CURIEs/IDs as currency. Other parts (minerva, go-rdf, ontology) use URIs.

The expansion/contraction rules are not well defined.

We should have a single json-ld context file we use across the GO.

Furthermore, the contents of this file should be as predictable as possible, e.g. obolibrary for all ontologies, purl.uniprot for all uniprot entries, and something like id.org for everything else. This will require a one-time change to Noctua models.
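To make "expansion/contraction rules" concrete, here is a minimal sketch of what a shared prefix map would let every component do. The `CONTEXT` entries and the function names `expand` and `contract` are illustrative assumptions, not the contents of any published GO context.

```python
# Hypothetical prefix map (assumption): the URI bases below are examples only,
# not the contents of GO's actual JSON-LD context.
CONTEXT = {
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "RO": "http://purl.obolibrary.org/obo/RO_",
    "UniProtKB": "http://identifiers.org/uniprot/",
}


def expand(curie, context=CONTEXT):
    """Expand a CURIE such as 'GO:0008150' into a full URI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in context:
        raise KeyError(f"no prefix declared for {curie!r}")
    return context[prefix] + local_id


def contract(uri, context=CONTEXT):
    """Contract a URI into a CURIE, preferring the longest matching base."""
    matches = [(base, prefix) for prefix, base in context.items()
               if uri.startswith(base)]
    if not matches:
        raise KeyError(f"no prefix matches {uri!r}")
    base, prefix = max(matches, key=lambda match: len(match[0]))
    return f"{prefix}:{uri[len(base):]}"
```

With one map shared by every tool, `contract(expand(x))` returns `x` for any declared prefix, which is the predictability being asked for here.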

Previous tickets:

balhoff commented 6 years ago

Would be nice to reuse the OBO prefixes context: http://obofoundry.org/registry/obo_context.jsonld

As far as I know, while a JSON document can reference multiple contexts, a context can't import another context. Should "single JSON-LD context" mean a single defined set of JSON-LD contexts, or do you want to have the pipeline concatenate a few source contexts into the single JSON-LD context?
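The "concatenate a few source contexts" option could look roughly like the sketch below. The function name `merge_contexts` and the fail-on-conflict policy are assumptions for illustration, not an existing pipeline step.

```python
def merge_contexts(*documents):
    """Merge several JSON-LD context documents into a single one.

    Each document is a dict shaped like {"@context": {prefix: uri_base}}.
    Raises ValueError if two sources declare the same prefix differently
    (an assumed policy; a real pipeline might prefer a priority order).
    """
    merged = {}
    for document in documents:
        for prefix, base in document.get("@context", {}).items():
            if prefix in merged and merged[prefix] != base:
                raise ValueError(
                    f"prefix {prefix!r} maps to both "
                    f"{merged[prefix]!r} and {base!r}"
                )
            merged[prefix] = base
    return {"@context": merged}
```

Failing loudly on a conflict, rather than letting the later source silently win, keeps disagreements between source contexts visible at build time.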

cmungall commented 6 years ago

I'm adding rdf_uri_prefix to db-xrefs yaml. Note this will often be different from the web page expansion. Currently these are all obolibrary or identifiers.org.

db-xrefs.yaml is the canonical source metadata for GO. We will generate a json-ld context from this as part of the release. Minerva will use this for expansion/contraction when communicating with Noctua/golr. ontobio will use this when converting GAFs to GO-CAMs. The neo build will use this to expand GPIs to make an OWL file of all the gene products.
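As a rough sketch of the generation step, the function below turns already-parsed db-xrefs entries into a context document. The `rdf_uri_prefix` field comes from the comment above; the use of a `database` key as the prefix name and the sample entries are assumptions, and the YAML parsing itself is left out to keep the example dependency-free.

```python
import json


def context_from_dbxrefs(entries):
    """Build a JSON-LD context from parsed db-xrefs entries.

    `entries` is a list of dicts. Entries without an `rdf_uri_prefix`
    are skipped, since they have no RDF expansion to publish.
    The `database` key used as the prefix name is an assumption.
    """
    context = {}
    for entry in entries:
        base = entry.get("rdf_uri_prefix")
        if base:
            context[entry["database"]] = base
    # Sort prefixes so the generated file diffs cleanly between releases.
    return {"@context": dict(sorted(context.items()))}


# Hypothetical sample entries, for illustration only.
sample_entries = [
    {"database": "UniProtKB",
     "rdf_uri_prefix": "http://identifiers.org/uniprot/"},
    {"database": "GO",
     "rdf_uri_prefix": "http://purl.obolibrary.org/obo/GO_"},
    {"database": "SomeDB"},  # no RDF prefix declared, so it is skipped
]

context_json = json.dumps(context_from_dbxrefs(sample_entries), indent=2)
```

Writing `context_json` to a file during the release would give Minerva, ontobio, and the neo build one artifact to read.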

Jim: currently there are only a handful of ontologies in here, and these are just manually synced with obo_context. We have tools in the prefixcommons repo to detect inconsistencies between these.

cmungall commented 6 years ago

Remaining issues:

TomConlin commented 6 years ago

Dipper's curie_prefix to base_iri mapping file is:

https://github.com/monarch-initiative/dipper/blob/master/dipper/curie_map.yaml

Monarch app should also use it although I am not sure it does everywhere it could.

curie_map.yaml could also stand a shakedown for

cmungall commented 6 years ago

Thanks Tom!

Summary of where we are in GO

https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml is the source authority. See https://github.com/geneontology/go-site/pull/620/files

This is used to generate https://github.com/prefixcommons/biocontext/blob/master/registry/go_context.jsonld, but we'll actually publish the jsonld context as part of the GO pipeline.

The prefixcommons repo is a good place to go for getting diffs between any two contexts.
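For readers who want to see what such a diff involves, here is a small stand-alone sketch. It is not the prefixcommons tooling; the function name `diff_contexts` and its return shape are assumptions.

```python
def diff_contexts(a, b):
    """Compare two prefix maps of the form {prefix: uri_base}.

    Returns a tuple (only_in_a, only_in_b, conflicts), where `conflicts`
    maps each shared prefix with differing bases to (base_in_a, base_in_b).
    """
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    conflicts = {
        prefix: (a[prefix], b[prefix])
        for prefix in sorted(set(a) & set(b))
        if a[prefix] != b[prefix]
    }
    return only_in_a, only_in_b, conflicts
```

The `conflicts` part is the one that matters most for a shared context: a prefix present in both maps but expanding to different URIs.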

TomConlin commented 6 years ago

Oh, what a mess this is: prefix case differences, conflicting cases for URIs, straight-up prefix hijacking ... I'm sorry, but I cannot be taking this on right now.
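The kinds of problems listed here can be surfaced mechanically. The sketch below checks a set of prefix maps for prefixes that differ only by case and for prefixes bound to more than one URI base; the function name `audit_prefix_maps` and the input shape are assumptions, and it is not run against any real map.

```python
from collections import defaultdict


def audit_prefix_maps(maps):
    """Audit several prefix maps for inconsistencies.

    `maps` is {source_name: {prefix: uri_base}}. Returns a tuple
    (case_variants, conflicting_bases):
      - case_variants: sorted lists of prefixes that differ only by case
      - conflicting_bases: {prefix: sorted bases} for prefixes that are
        bound to more than one URI base across the sources
    """
    spellings = defaultdict(set)  # case-folded prefix -> spellings seen
    bases = defaultdict(set)      # exact prefix -> URI bases seen
    for mapping in maps.values():
        for prefix, base in mapping.items():
            spellings[prefix.casefold()].add(prefix)
            bases[prefix].add(base)
    case_variants = sorted(
        sorted(variants) for variants in spellings.values()
        if len(variants) > 1
    )
    conflicting_bases = {
        prefix: sorted(found) for prefix, found in bases.items()
        if len(found) > 1
    }
    return case_variants, conflicting_bases
```

A prefix that shows up in `conflicting_bases` with unrelated hosts is a candidate for the "hijacking" case; one whose bases differ only in letter case is the URI-case problem.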

lpalbou commented 6 years ago

This is a blocking issue for me on the GO-CAM site. For reference:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * WHERE {

  #BIND(<http://identifiers.org/uniprot/Q9WTW1> as ?GP) .
  BIND(<http://identifiers.org/uniprot/P34913> as ?GP) .

  ?GP ?pred ?obj .
} 
LIMIT 10

Q9WTW1 (rat) returns no information beyond stating that it is an owl:Class.
P34913 (human) returns some information (obo:id, rdfs:label).

Other cases:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {

#  BIND(<http://identifiers.org/uniprot/A8IV67> as ?gpuri)    # has nothing
#  BIND(<http://identifiers.org/uniprot/P10499> as ?gpuri)    # just has ?obj = owl:Class
#  BIND(<http://www.informatics.jax.org/accession/MGI:MGI:1316740> as ?gpuri)  # has owl:Class, oboInOwl:id, rdf:type, rdfs:label

  BIND(<http://identifiers.org/uniprot/P34913> as ?gpuri)     # has possibly all information (dbxref, synonym, label, subclassOf, etc)

  ?gpuri ?pred ?obj .
} 
LIMIT 10

Which affects more complex queries (e.g. to get the recommended name of a gene, or its taxon):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX metago: <http://model.geneontology.org/>

PREFIX enabled_by: <http://purl.obolibrary.org/obo/RO_0002333>
PREFIX in_taxon: <http://purl.obolibrary.org/obo/RO_0002162>

SELECT distinct ?identifier ?name ?species

WHERE 
{
#  GRAPH metago:586fc17a00000705 {
  GRAPH metago:581e072c00000295 {
    ?s enabled_by: ?gpnode .    
    ?gpnode rdf:type ?identifier .
    FILTER(?identifier != owl:NamedIndividual) .         
  }

  ?identifier rdfs:subClassOf ?v0 . 
  ?identifier rdfs:label ?name .

  ?v0 owl:onProperty in_taxon: . 
  ?v0 owl:someValuesFrom ?taxon .
  ?taxon rdfs:label ?species .      
}

This query works for the second model, but not for the first model (xxx705). In the first model, ?identifier refers to a flat class that has no rdfs:subClassOf restriction for ?v0 to match.

balhoff commented 6 years ago

@lpalbou I don't think your problem relates to identifier prefixes. Q9WTW1 is simply not in NEO at all.

kltm commented 6 years ago

@cmungall @balhoff I believe that this is clear now?

lpalbou commented 6 years ago

@cmungall thanks !