geneontology / neo

noctua entity ontology

NEO term URIs are different from Golr #17

Open balhoff opened 7 years ago

balhoff commented 7 years ago

In NEO, terms use OBO PURLs like http://purl.obolibrary.org/obo/MGI_MGI%3A1336172, but the autocomplete in Noctua produces terms like http://www.informatics.jax.org/accession/MGI:MGI:1336172 (which I assume comes from Golr). This is a problem for using NEO to get taxon metadata or to query models for instances of molecular entity.

cmungall commented 7 years ago

cc @kltm

Almost. Golr doesn't care much what goes in the id field. Currently the OWLTools loader will contract to OBO-style IDs (using a needs-to-die OWLTools-Core method), producing IDs like MGI:MGI:1336172 .

These get passed on to Minerva, which expands them using its CURIE map, giving the jax URL. But the URI in Minerva's TBox comes from Neo.

ugh
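To make the mismatch concrete, here is a minimal Python sketch of the round trip described above. It is not OWLTools or Minerva code: the function names are hypothetical, and the CURIE map entry is an assumption inferred from the jax URL quoted in this issue.

```python
from urllib.parse import unquote

OBO_BASE = "http://purl.obolibrary.org/obo/"

# Assumed excerpt of a Minerva-style CURIE map (prefix -> URI base).
CURIE_MAP = {"MGI": "http://www.informatics.jax.org/accession/MGI:"}


def contract_obo_style(uri: str) -> str:
    """Contract an OBO PURL to an OBO-style ID by splitting at the first underscore."""
    local = unquote(uri[len(OBO_BASE):])  # e.g. "MGI_MGI:1336172"
    prefix, _, rest = local.partition("_")
    return f"{prefix}:{rest}"


def expand_with_curie_map(curie: str) -> str:
    """Expand a CURIE by looking up the text before the first colon."""
    prefix, _, rest = curie.partition(":")
    return CURIE_MAP[prefix] + rest


neo_uri = "http://purl.obolibrary.org/obo/MGI_MGI%3A1336172"
golr_id = contract_obo_style(neo_uri)         # ID as stored by the loader
minerva_uri = expand_with_curie_map(golr_id)  # URI as rebuilt from the CURIE map

print(golr_id)
print(minerva_uri)
print(minerva_uri == neo_uri)
```

Because contraction and expansion use different rules, the URI that comes out is not the one that went in.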

cmungall commented 7 years ago

OK, I think the correct fix is to adapt the OWLTools-core golr loader to use a prefixmap, but this is a bit of work.

A workaround is to make minerva aggressively defensive, so that it distrusts and rewrites its own TBox. Ugh.
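A rough sketch of the prefixmap idea, assuming the loader and Minerva share a single map so that contraction and expansion are inverses. The map contents and function names are hypothetical, not the OWLTools API.

```python
# Hypothetical shared prefix map (prefix -> URI base); entries are illustrative.
PREFIX_MAP = {
    "FB": "http://identifiers.org/flybase/",
    "MGI": "http://www.informatics.jax.org/accession/MGI:",
}


def contract(uri: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Contract a URI to a CURIE using the longest matching base in the map."""
    matches = [(base, prefix) for prefix, base in prefix_map.items()
               if uri.startswith(base)]
    if not matches:
        raise ValueError(f"no prefix registered for {uri}")
    base, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(base):]}"


def expand(curie: str, prefix_map: dict = PREFIX_MAP) -> str:
    """Expand a CURIE using the same map, so expand(contract(u)) == u."""
    prefix, _, local = curie.partition(":")
    return prefix_map[prefix] + local


uri = "http://identifiers.org/flybase/FBgn0000003"
curie = contract(uri)
print(curie, expand(curie) == uri)
```

With one map on both sides, whichever URI scheme is chosen survives the trip through Golr unchanged.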

balhoff commented 7 years ago

Does your comment mean that it is correct for these terms to use the OBO prefix within NEO?

cmungall commented 6 years ago

Once @yy20716 implements this: https://github.com/owlcollab/owltools/issues/245

I will have neo export the same URIs that are used by minerva

cmungall commented 6 years ago

Hmm, it looks like https://build.berkeleybop.org/job/build-noctua-entity-ontology/ is configured to take from any branch not master, which means my PR has already leaked out:

   <!-- http://identifiers.org/flybase/FBgn0000003 -->

    <owl:Class rdf:about="http://identifiers.org/flybase/FBgn0000003">
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/CHEBI_33695"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002162"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_7227"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <oboInOwl:hasBroadSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">7SLRNACR32864</oboInOwl:hasBroadSynonym>
        <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Signal recognition particle 7SL RNA CR32864 Dmel</oboInOwl:hasExactSynonym>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FB:FBgn0000003</oboInOwl:id>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">7SLRNACR32864 Dmel</rdfs:label>
    </owl:Class>

We need to make sure that the latest owltools (one that incorporates https://github.com/owlcollab/owltools/pull/247) is used to load solr in this job: https://build.berkeleybop.org/job/load-golr-noctua-neo

It seems this job failed for a different reason.

kltm commented 6 years ago

@cmungall I believe I've cleared the "different reason" on that machine.

balhoff commented 6 years ago

I'm seeing updated identifiers in NEO now. In particular, I notice identifiers.org for MGI and ZFIN. But, a couple of questions:

  1. MGI IDs look like this: http://identifiers.org/mgi/MGI%3A99582. Should they instead be http://identifiers.org/mgi/MGI:99582? As far as I know the colon doesn't need to be encoded. Although it resolves to the same place, it's not the same identifier.
  2. I see some OBO-style ComplexPortal IDs: http://purl.obolibrary.org/obo/ComplexPortal_CPX-1. Is that correct?

cmungall commented 6 years ago

On 22 May 2018, at 11:19, Jim Balhoff wrote:

  1. MGI IDs look like this: http://identifiers.org/mgi/MGI%3A99582. Should they instead be http://identifiers.org/mgi/MGI:99582? As far as I know the colon doesn't need to be encoded. Although it resolves to the same place, it's not the same identifier.

yes, should use : directly

  2. I see some OBO-style ComplexPortal IDs: http://purl.obolibrary.org/obo/ComplexPortal_CPX-1. Is that correct?

we just haven't gotten around to that one yet
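For reference, a small Python illustration of the colon-encoding point, using only the standard library. It just shows that the two spellings are different strings even though they decode to the same thing; it says nothing about how the NEO pipeline actually builds its URIs.

```python
from urllib.parse import quote, unquote

base = "http://identifiers.org/mgi/"
local_id = "MGI:99582"

# By default quote() only leaves "/" unescaped, so the colon is percent-encoded.
encoded = base + quote(local_id)
# Declaring ":" as safe keeps the colon literal.
literal = base + quote(local_id, safe=":")

print(encoded)
print(literal)
print(encoded == literal)           # different identifiers as strings
print(unquote(encoded) == literal)  # same after percent-decoding
```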

goodb commented 4 years ago

Noting that the above comment on ComplexPortal has not been resolved yet.
Minerva seems to think that ComplexPortal:CPX-998 means https://www.ebi.ac.uk/complexportal/complex/CPX-9 but NEO seems to think that it means http://purl.obolibrary.org/obo/ComplexPortal_CPX-998

cmungall commented 4 years ago

there is a really horrific perl hack (my fault) in the neo pipeline, we need to add this to the set of regexes, sorry...
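A sketch of what one more rewrite rule might look like, written in Python rather than the Perl used by the pipeline. The rule table and function name are hypothetical, and the target base is an assumption taken from the ebi.ac.uk prefix quoted above.

```python
import re

# Hypothetical rewrite table: (pattern over OBO-style PURLs, replacement).
# The ComplexPortal target base is an assumption, not a confirmed Minerva setting.
REWRITES = [
    (re.compile(r"^http://purl\.obolibrary\.org/obo/ComplexPortal_(CPX-\d+)$"),
     r"https://www.ebi.ac.uk/complexportal/complex/\1"),
]


def rewrite_uri(uri: str) -> str:
    """Apply the first matching rewrite rule; return the URI unchanged otherwise."""
    for pattern, replacement in REWRITES:
        new_uri, count = pattern.subn(replacement, uri)
        if count:
            return new_uri
    return uri


print(rewrite_uri("http://purl.obolibrary.org/obo/ComplexPortal_CPX-998"))
print(rewrite_uri("http://purl.obolibrary.org/obo/CHEBI_33695"))
```

URIs that match no rule, such as real OBO classes, pass through untouched.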

On Thu, Jul 23, 2020 at 2:36 PM goodb notifications@github.com wrote:

Noting that above comment on complex portal has not been resolved yet. Minerva seems to think that ComplexPortal:CPX-998 means https://www.ebi.ac.uk/complexportal/complex/CPX-9 but NEO seems to think that it means http://purl.obolibrary.org/obo/ComplexPortal_CPX-998
