geneontology / neo

noctua entity ontology

Unique Recommended Name for a GP #32

Closed: lpalbou closed this issue 6 years ago

lpalbou commented 6 years ago

@cmungall I mentioned during the hackathon that some GPs have several recommended names (rdfs:label). That should not be the case, at least within a single language, since we have synonyms (oboInOwl:hasExactSynonym / oboInOwl:hasBroadSynonym) for that.

Example from RGD (NEO metadata generated during GAF conversion): SELECT * WHERE { <http://identifiers.org/rgd/1304707> rdfs:label ?label } returns two labels (Lrfn1 Rnor and Lrfn1)

Example from MGI (NEO metadata generated using GPI): SELECT * WHERE { <http://identifiers.org/mgi/MGI:3588192> rdfs:label ?label } returns three labels (Rtl4 Mmus, Rtl4, zcchc16 Mmus)

For this MGI entry, the GPI file gives Rtl4 as the name, and everything else is a synonym: MGI MGI:3588192 Rtl4 retrotransposon Gag like 4 C230031A03Rik|Mar4|Zcchc16 gene taxon:10090 UniProtKB:Q3URY0

Fixing that will ensure that we retrieve a single, correct recommended name for each GP, which is not guaranteed at the moment.
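To make the name/synonym distinction concrete, here is a minimal Python sketch that splits a GPI row into its recommended symbol and its synonyms. It assumes the GPI 1.x tab-separated column order (DB, object ID, symbol, name, synonyms, type, taxon) and reads only those first seven columns. The function name `parse_gpi_line` and the field names are hypothetical, and the tabs in the sample row were restored by hand from the MGI line quoted above.

```python
# Assumed GPI 1.x column order (tab-separated); only the first seven columns are read.
GPI_COLUMNS = ("db", "object_id", "symbol", "name", "synonyms", "type", "taxon")


def parse_gpi_line(line):
    """Split one GPI row into named fields; the synonyms column becomes a list."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GPI_COLUMNS, fields))
    raw_synonyms = record.get("synonyms", "")
    # Synonyms are pipe-separated; an empty column yields an empty list.
    record["synonyms"] = [s for s in raw_synonyms.split("|") if s]
    return record


# The MGI row from the issue, with tabs restored by hand (an assumption about the original layout).
row = "MGI\tMGI:3588192\tRtl4\tretrotransposon Gag like 4\tC230031A03Rik|Mar4|Zcchc16\tgene\ttaxon:10090"
gp = parse_gpi_line(row)
print(gp["symbol"])    # the single recommended name
print(gp["synonyms"])  # everything else in the synonyms column
```

Under this reading, only the symbol column would ever become an rdfs:label, and the pipe-separated values would become synonyms.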

cmungall commented 6 years ago

are you sure this is in neo? cc @dougli1sqrd

lpalbou commented 6 years ago

I thought all metadata about GPs comes from NEO; if not, I don't know where it comes from.

@dougli1sqrd or @balhoff any feedback on that ?

It's also strange because between last month's (July) release and the current snapshot (RDF endpoint not yet released in production), I don't even get the same number of labels per GP...

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
  <http://identifiers.org/zfin/ZDB-GENE-060222-1> rdfs:label ?label
}

On the current rdf.geneontology.org, this query returns one rdfs:label (ptger4a Drer), but on the current snapshot/blazegraph the same query returns two (ptger4a Drer and ptger4a)!

Likewise, in the current ZFIN GPI file, the line for that gene is:

ZFIN    ZDB-GENE-060222-1   ptger4a prostaglandin E receptor 4 (subtype EP4) a  ep4|ep4a|PRS4|ptger4l   gene    taxon:7955      InterPro:IPR000276|ENSEMBL:ENSDARG00000059236|UniProtKB:A0PJP9|Pfam:PF00001|InterPro:IPR001244|UniProtKB:Q2N1D9|NCBIGene:562469|Alliance:ZDB-GENE-060222-1|PANTHER:PTHR11866|UniGene:52834|PROSITE:PS50262|InterPro:IPR001758|Ensembl:ENSDARP00000092384|UniProtKB:F1Q955|InterPro:IPR008365|RefSeq:NP_001034718|GenBank:DQ202321|GenPept:AAI27560|GenPept:ABB16281|RefSeq:NM_001039629|GenBank:BC127559|InterPro:IPR017452|PROSITE:PS00237 

So we should have only one recommended name, ptger4a. I am starting to wonder: @kltm, are you at some point adding the recommended name + species as an rdfs:label too?

PS: it's really starting to be a problem for the gocam site and the SPARQL queries.
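As a rough illustration of how the duplicates could be spotted in bulk, the Python sketch below groups (IRI, label) pairs, such as the bindings a SPARQL client would return, and reports every GP with more than one distinct rdfs:label. The function name `find_multi_label` is hypothetical. The ZFIN rows mirror the two labels reported above, and the `example.org` row is made up to show the single-label case.

```python
from collections import defaultdict


def find_multi_label(bindings):
    """Return {iri: sorted labels} for every IRI with more than one distinct label."""
    labels = defaultdict(set)
    for iri, label in bindings:
        labels[iri].add(label)
    return {iri: sorted(found) for iri, found in labels.items() if len(found) > 1}


bindings = [
    # The two labels seen on the snapshot for the ZFIN gene discussed above.
    ("http://identifiers.org/zfin/ZDB-GENE-060222-1", "ptger4a Drer"),
    ("http://identifiers.org/zfin/ZDB-GENE-060222-1", "ptger4a"),
    # Hypothetical GP with a single label, included only for contrast.
    ("http://example.org/gp/single", "only-label"),
]

duplicates = find_multi_label(bindings)
for iri, found in duplicates.items():
    print(iri, found)
```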

balhoff commented 6 years ago

The extra label for Lrfn1 is coming from a model with IRI http://model.geneontology.org/12a1456c-4a4a-4fef-88f7-6ea9e390140b/: http://yasgui.org/short/SyOOQz1LQ

I can't find this model in Noctua—where did it come from?

I see only two labels for Rtl4: http://yasgui.org/short/rydsVzyUX

The extra label seems to be in the model, but I don't see it in Noctua. It may be in a "lego:derived" annotation (I can't remember the exact term), a holdover from earlier functionality that is supposed to be removed automatically when the models are dumped and reloaded.

So these two don't seem to be NEO problems.

balhoff commented 6 years ago

@lpalbou perhaps you should just confine your label matches to come from GRAPH <http://purl.obolibrary.org/obo/go/extensions/go-graphstore.owl>.
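A minimal sketch of what such a restricted query could look like, written as a small Python helper that only builds the SPARQL string and does not contact any endpoint. The helper name `label_query` and the constant `GO_GRAPHSTORE` are hypothetical. The graph IRI is the one suggested above, and the example GP is the ZFIN gene from earlier in the thread.

```python
# Named graph suggested above for ontology-sourced labels.
GO_GRAPHSTORE = "http://purl.obolibrary.org/obo/go/extensions/go-graphstore.owl"


def label_query(gp_iri, graph_iri=GO_GRAPHSTORE):
    """Build a SPARQL query that reads rdfs:label only from one named graph."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  GRAPH <{graph_iri}> {{\n"
        f"    <{gp_iri}> rdfs:label ?label\n"
        "  }\n"
        "}"
    )


query = label_query("http://identifiers.org/zfin/ZDB-GENE-060222-1")
print(query)
```

Because the triple pattern sits inside the GRAPH block, labels asserted in individual model graphs would no longer match, which is the effect being suggested here.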

lpalbou commented 6 years ago

@balhoff thanks Jim, I have updated my SPARQL queries (https://github.com/geneontology/api-gorest/commit/4f00e0b2f085b3098fc5d85e975545e87c64a850) and it seems to have solved the issue.