geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Some AspGD taxon data appears without a label #570

Open ValWood opened 5 years ago

ValWood commented 5 years ago

There is something amiss with the way AspGD annotations appear in AmiGO.

The taxon isn't parsed correctly, so these annotations are not available in the organism filter. For example:

http://amigo.geneontology.org/amigo/gene_product/AspGD:Aspfo1_0204585

kltm commented 5 years ago

@ValWood That's a good catch--thank you. I'll look into this today.

kltm commented 5 years ago

Examining similarly formed entries from the GAF shows that this is not a uniform problem: http://amigo.geneontology.org/amigo/gene_product/AspGD:Aspka1_0181639

To note, the issue seems to be that in some cases the taxon ID is not resolved to a label, so the main taxon entry is left "blank" and the table shows the bare ID instead.
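
As a rough illustration of the symptom described above (not AmiGO's or owltools' actual code), here is a minimal Java sketch of a label lookup that falls back to the raw ID when the taxon ontology has no entry. The class name, the method names, the in-memory map, and the ID `NCBITaxon:9999999` are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class TaxonLabelFallback {
    // Hypothetical stand-in for the labels available in the loaded taxon ontology.
    static final Map<String, String> LABELS = new HashMap<>();
    static {
        LABELS.put("NCBITaxon:4896", "Schizosaccharomyces pombe");
    }

    // Empty when the ontology has no class (and therefore no label) for the ID.
    static Optional<String> resolveLabel(String taxonId) {
        return Optional.ofNullable(LABELS.get(taxonId));
    }

    // What a results table would show: the label if known, otherwise the bare ID.
    static String displayValue(String taxonId) {
        return resolveLabel(taxonId).orElse(taxonId);
    }

    public static void main(String[] args) {
        // Known taxon: the label is shown.
        System.out.println(displayValue("NCBITaxon:4896"));
        // Hypothetical ID missing from the ontology: only the ID is shown.
        System.out.println(displayValue("NCBITaxon:9999999"));
    }
}
```

If the real loader behaves along these lines, a missing ontology entry would explain both the bare ID in the table and the absence of the organism from a label-based filter.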

As this is a relatively new annotation, made on the day of the release, I wonder whether the NCBI taxon ontology was somehow out of sync with the annotations, leading to a case where the label went AWOL.

kltm commented 5 years ago

That theory is at least partially wrong: as of 2019-06-23, http://amigo-exp.geneontology.io/amigo/gene_product/AspGD:Aspfo1_0204585 still has the information gap.

ValWood commented 5 years ago

It might be something to do with taxon strain IDs vs strain IDs (some species have strain IDs in NCBI). I'm not completely sure what these particular IDs are but it's a possibility.

@marekskrzypek might be able to enlighten you?

ValWood commented 5 years ago

It isn't restricted to AspGD:

http://amigo.geneontology.org/amigo/gene_product/CGD:CORT_0G01250

kltm commented 5 years ago

@ValWood It seems to be the same taxon though: NCBITaxon:1136231, which is a good thing. I do not think the problem resides in the GAF; it is more likely in the loader or in the NCBITaxon file that we load.

kltm commented 5 years ago

Noting from load log:

[2019-06-10T12:09:23.763Z] 2019-06-10 12:09:23,648 INFO  (GafSolrDocumentLoader:189) Skipping taxon closures for unknown id: NCBITaxon:1136231

That's owltools, around

        final OWLClass taxCls = graph.getOWLClassByIdentifier(taxonId);

within bioentity Solr document assembly. That would seem to be an issue with the ontology, then. @balhoff Would you be able to officially confirm whether NCBITaxon:1136231 is present in "http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl"? Grepping shows that it is not there. If it is missing, what are the channels to add it?
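
For anyone wanting to repeat the grep check in a scriptable way, here is a minimal Java sketch. It assumes a local copy of taxslim.owl has already been downloaded and that classes are serialized with the usual OBO local name (`NCBITaxon_1136231`); the class and method names are hypothetical, and it is a plain text scan rather than a real OWL parse.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class TaxonPresenceCheck {
    // "NCBITaxon:1136231" -> "NCBITaxon_1136231" (OBO-style local name).
    static String toLocalName(String curie) {
        int colon = curie.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed ID: " + curie);
        }
        return curie.substring(0, colon) + "_" + curie.substring(colon + 1);
    }

    // True if the name occurs in the line and is not just a prefix of a longer numeric ID.
    static boolean containsExact(String line, String name) {
        int from = 0;
        while (true) {
            int idx = line.indexOf(name, from);
            if (idx < 0) {
                return false;
            }
            int end = idx + name.length();
            if (end == line.length() || !Character.isDigit(line.charAt(end))) {
                return true;
            }
            from = idx + 1;
        }
    }

    // Scans the text line by line for any mention of the given taxon ID.
    static boolean mentions(Reader source, String curie) {
        String name = toLocalName(curie);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (containsExact(line, name)) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: TaxonPresenceCheck <path-to-local-taxslim.owl> [taxon-id]");
            return;
        }
        String id = args.length > 1 ? args[1] : "NCBITaxon:1136231";
        boolean found = mentions(Files.newBufferedReader(Paths.get(args[0])), id);
        System.out.println(id + (found ? " is mentioned" : " is NOT mentioned"));
    }
}
```

The digit-boundary check avoids a false positive on a longer ID that merely starts with the same digits, which a naive substring search would report as a hit.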

kltm commented 5 years ago

@cmungall I believe that you originally made the taxslim ontology? What would be the procedure for getting something in there? Re: https://github.com/geneontology/amigo/issues/570#issuecomment-505678832

balhoff commented 5 years ago

@cmungall what is the origin of taxslim? Should we just expand GO ncbitaxon_import as needed and extract with ROBOT? Could keep a seed file in addition to the taxa directly referenced in the ontology.

pgaudet commented 5 years ago

> There is something amiss with the way AspGD annotations appear in AmiGO.

Those were always like this. Only the ones coming from UniProt had the correct gene label.

This may be unrelated, but UniProt, AspGD, and CGD weren't using the same taxon ID (although technically they were describing the same species).

Pascale

ValWood commented 5 years ago

Does it need fixing upstream? Who do we tag?

pgaudet commented 5 years ago

@marekskrzypek