kltm closed this issue 3 years ago
@kltm This could be a data issue. Here are the fields from the error message, reformatted for easier reading:
frame:Frame(UniProtKB:Q06787-11
id( UniProtKB:Q06787-11)
name( FMR1 Hsap)
name( Fmr1 isoform 11 Rnor)
synonym( Q06787-11 RELATED)
synonym( FMR1 RELATED)
synonym( FMR1 BROAD)
property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)
property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)
property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)
is_a( RGD:2623)
is_a( CHEBI:36080)
relationship( in_taxon NCBITaxon:9606)
relationship( in_taxon NCBITaxon:10116)
relationship( has_gene_template UniProtKB:Q06787))
This has two labels (human and rat) and in_taxon relations to both (not good). Do you know where this term is coming in from?
@balhoff Unfortunately, I don't have more detail; this is just coming out of the NEO Makefile target. If it's not in one of the "real" ontologies, it may be coming from the GAFs or GPIs that get converted. We could start grepping around to see what comes up...
In the NEO build...
neo-goa_human_isoform.obo contains:
[Term]
id: UniProtKB:Q06787-11
name: FMR1 Hsap
synonym: "FMR1" BROAD []
synonym: "FMR1" RELATED []
synonym: "Q06787-11" RELATED []
is_a: CHEBI:36080 ! protein
relationship: in_taxon NCBITaxon:9606
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein
relationship: has_gene_template UniProtKB:Q06787
which I think comes from goa_human_isoform.gpi.gz:
UniProtKB Q06787-11 FMR1 Synaptic functional regulator FMR1 FMR1 protein taxon:9606 UniProtKB:Q06787 db_subset=Swiss-Prot
neo-rgd.obo contains:
[Term]
id: UniProtKB:Q06787-11
name: Fmr1 isoform 11 Rnor
is_a: RGD:2623
relationship: in_taxon NCBITaxon:10116
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform
which I think comes from gene_association.rgd.gz:
RGD 2623 Fmr1 GO:0005634 RGD:1624291 ISO RGD:735919 C FMRP translational regulator 1 gene taxon:10116 20180228 RGD UniProtKB:Q06787-11
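To find other IDs in the same situation without waiting for the OBO writer to fail, something like the following could be run over the intermediate `neo-*.obo` files. This is only a rough sketch, not part of the NEO Makefile: the function names `parse_obo_terms` and `find_conflicts` are made up for illustration, and the parser handles only simple `tag: value` lines inside `[Term]` stanzas.

```python
from collections import defaultdict


def parse_obo_terms(text):
    """Yield (term_id, tags) for each [Term] stanza in a block of OBO text.

    Simplified on purpose: no escapes, no trailing qualifiers, no header parsing.
    """
    term_id, tags, in_term = None, defaultdict(list), False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; emit the previous one if it was a [Term].
            if in_term and term_id:
                yield term_id, tags
            in_term = line == "[Term]"
            term_id, tags = None, defaultdict(list)
        elif in_term and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0].strip()  # drop trailing "! comment"
            if tag == "id":
                term_id = value
            else:
                tags[tag].append(value)
    if in_term and term_id:
        yield term_id, tags


def find_conflicts(sources):
    """sources maps a file label to OBO text.

    Returns the IDs that end up with more than one distinct name or more than
    one in_taxon target once all sources are combined.
    """
    names, taxa = defaultdict(set), defaultdict(set)
    for label, text in sources.items():
        for term_id, tags in parse_obo_terms(text):
            for name in tags["name"]:
                names[term_id].add((label, name))
            for rel in tags["relationship"]:
                parts = rel.split()
                if len(parts) == 2 and parts[0] == "in_taxon":
                    taxa[term_id].add(parts[1])
    conflicts = {}
    for term_id in sorted(set(names) | set(taxa)):
        distinct_names = {name for _, name in names[term_id]}
        if len(distinct_names) > 1 or len(taxa[term_id]) > 1:
            conflicts[term_id] = {
                "names": sorted(names[term_id]),
                "taxa": sorted(taxa[term_id]),
            }
    return conflicts


if __name__ == "__main__":
    # Abbreviated versions of the two stanzas quoted in this thread.
    human = (
        "[Term]\nid: UniProtKB:Q06787-11\nname: FMR1 Hsap\n"
        "is_a: CHEBI:36080 ! protein\nrelationship: in_taxon NCBITaxon:9606\n"
    )
    rat = (
        "[Term]\nid: UniProtKB:Q06787-11\nname: Fmr1 isoform 11 Rnor\n"
        "is_a: RGD:2623\nrelationship: in_taxon NCBITaxon:10116\n"
    )
    sources = {"neo-goa_human_isoform.obo": human, "neo-rgd.obo": rat}
    for term_id, info in find_conflicts(sources).items():
        print(term_id, info)
```

Keeping the file label next to each name means the report also says which source contributed which label, which is the tracing that was done by hand above.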
@balhoff Thank you for tracking it down.
Well, that's not good. I guess there are a few things to be done.
@pgaudet @vanaukenk For contacting the upstreams, how long might it take to resolve this?
I'd advocate options 2 and 3.
But looking at the RGD GAF line, I'm not sure what the UniProtKB:Q06787-11 is supposed to mean (is that in Column 17?) as that is a human gene.
I don't see the UniProtKB:Q06787-11 entry in the current RGD GAF, though.
Strange. I found that this morning in https://raw.githubusercontent.com/rat-genome-database/rgd-annotation-files/master/gene_association.rgd.gz (that's the location used by the NEO Makefile).
I was looking at what is available on geneontology.org for annotation downloads.
If it looks like it's fixed, we can do another test later this week (we have the release in the hopper right now) and see what happens.
@balhoff @vanaukenk Apparently it's not fixed. While that particular instance may have gone away, there are others lurking:
12:15:31 Caused by: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(UniProtKB:Q06787-10 id( UniProtKB:Q06787-10)synonym( FMR1 RELATED)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)name( FMR1 Hsap)synonym( FMR1 BROAD)synonym( Q06787-10 RELATED)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)name( Fmr1 isoform 10 Rnor)relationship( in_taxon NCBITaxon:9606)relationship( in_taxon NCBITaxon:10116)relationship( has_gene_template UniProtKB:Q06787)is_a( RGD:2623)is_a( CHEBI:36080))
12:15:31 at org.obolibrary.oboformat.model.Frame.checkMaxOneCardinality(Frame.java:424)
12:15:31 at org.obolibrary.oboformat.model.Frame.check(Frame.java:405)
12:15:31 at org.obolibrary.oboformat.model.OBODoc.check(OBODoc.java:390)
12:15:31 at org.obolibrary.oboformat.writer.OBOFormatWriter.write(OBOFormatWriter.java:183)
12:15:31 at org.semanticweb.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:88)
12:15:31 ... 11 more
12:15:31 Makefile:27: recipe for target 'neo.obo' failed
12:15:31 make: *** [neo.obo] Error 1
It looks like something more general will need to be done: either a fix from the upstream(s?), or pipeline work to remove duplicates (2 or 3 above).
Two questions:
I guess that was three questions.
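For the pipeline-side route, one possible form of "remove duplicates" is a first-definition-wins merge over the intermediate OBO files. The thread does not say which approach was intended, so this is a sketch under that assumption; `split_stanzas`, `stanza_id` and `merge_first_wins` are hypothetical names, and header lines are simply ignored.

```python
def split_stanzas(text):
    """Split OBO text into stanzas (lists of lines); anything before the first stanza is skipped."""
    stanzas, current = [], None
    for line in text.splitlines():
        if line.startswith("["):
            if current:
                stanzas.append(current)
            current = [line]
        elif current is not None and line.strip():
            current.append(line)
    if current:
        stanzas.append(current)
    return stanzas


def stanza_id(stanza):
    """Return the value of the stanza's id tag, or None if it has none."""
    for line in stanza:
        if line.startswith("id: "):
            return line[4:].strip()
    return None


def merge_first_wins(texts):
    """Concatenate stanzas from several OBO texts, keeping only the first stanza per ID.

    Returns (merged_text, dropped_ids) so the dropped IDs can still be reported upstream.
    """
    seen, kept, dropped = set(), [], []
    for text in texts:
        for stanza in split_stanzas(text):
            sid = stanza_id(stanza)
            if sid is not None and sid in seen:
                dropped.append(sid)
                continue
            if sid is not None:
                seen.add(sid)
            kept.append("\n".join(stanza))
    return "\n\n".join(kept) + "\n", dropped
```

Which source "wins" then depends purely on input order, and dropping the later stanza silently hides the upstream data problem, so logging `dropped_ids` on every build would be the minimum safeguard.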
@balhoff @kltm
I think we'll have to ask RGD what the intended meaning of the human isoform is in column 17. It isn't correct for the GAF, though, because any ID in Column 17 should be the same species as the ID in Column 2.
I don't think NEO is wrong to generate an entry from a Column 17 ID.
Tagging @jrsjrs from RGD so she can weigh in.
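The Column 17 rule above can be checked mechanically if a taxon lookup for the isoform IDs is available. The sketch below assumes tab-separated GAF 2.x rows (Column 13 is the taxon, Column 17 the gene product form ID) and GPI 1.2 rows (Columns 1 and 2 give the ID, Column 7 the taxon). Using a GPI file as the lookup, and the function names `load_gpi_taxa` and `cross_species_forms`, are assumptions for illustration, not something the pipeline does today.

```python
def load_gpi_taxa(gpi_lines):
    """Map 'DB:ID' to its taxon, read from GPI 1.2 rows."""
    taxa = {}
    for line in gpi_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 7:
            taxa[f"{cols[0]}:{cols[1]}"] = cols[6]
    return taxa


def cross_species_forms(gaf_lines, form_taxa):
    """Return (object_id, form_id, annotation_taxon, form_taxon) for every GAF
    row whose Column 17 ID has a known taxon that differs from Column 13."""
    hits = []
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 17 or not cols[16]:
            continue  # no gene product form ID on this row
        # Column 13 may hold "taxon:A|taxon:B"; the first entry is the annotated organism.
        annotation_taxon = cols[12].split("|")[0]
        form_id = cols[16]
        form_taxon = form_taxa.get(form_id)
        if form_taxon is not None and form_taxon != annotation_taxon:
            hits.append((f"{cols[0]}:{cols[1]}", form_id, annotation_taxon, form_taxon))
    return hits


if __name__ == "__main__":
    # Rows rebuilt from the lines quoted in this thread; the positions of the
    # empty columns are an assumption, since the tabs were lost in the paste.
    gpi_row = "\t".join([
        "UniProtKB", "Q06787-11", "FMR1", "Synaptic functional regulator FMR1",
        "FMR1", "protein", "taxon:9606", "UniProtKB:Q06787", "", "db_subset=Swiss-Prot",
    ])
    gaf_row = "\t".join([
        "RGD", "2623", "Fmr1", "", "GO:0005634", "RGD:1624291", "ISO", "RGD:735919",
        "C", "FMRP translational regulator 1", "", "gene", "taxon:10116", "20180228",
        "RGD", "", "UniProtKB:Q06787-11",
    ])
    print(cross_species_forms([gaf_row], load_gpi_taxa([gpi_row])))
```

IDs missing from the lookup are skipped rather than flagged, so the check only reports mismatches it can actually confirm.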
@vanaukenk It looks like this is working again? Was there a possible upstream fix?
Per 2021-02-16 correspondence with RGD, the issue with human identifiers in Col. 17 of the RGD GAF has been addressed.
Sometime between Nov 21st and Nov 27th, a change occurred in NEO (or something it brings in) that prevents the build with:
https://build.geneontology.org/job/geneontology/job/pipeline/job/issue-35-neo-test/97/console
Tagging @balhoff; notice to @vanaukenk.