geneontology / neo

noctua entity ontology

NEO no longer builds in the pipeline #62

Closed: kltm closed this issue 3 years ago

kltm commented 3 years ago

Sometime between Nov 21st and Nov 27th, a change occurred in NEO (or something it brings in) that prevents the build with:

12:15:44  Exception in thread "main" org.semanticweb.owlapi.model.OWLOntologyStorageException: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(UniProtKB:Q06787-11 id( UniProtKB:Q06787-11)name( FMR1 Hsap)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)synonym( FMR1 RELATED)synonym( FMR1 BROAD)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)name( Fmr1 isoform 11 Rnor)synonym( Q06787-11 RELATED)relationship( in_taxon NCBITaxon:9606)relationship( in_taxon NCBITaxon:10116)relationship( has_gene_template UniProtKB:Q06787)is_a( RGD:2623)is_a( CHEBI:36080))
12:15:44    at org.semanticweb.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:90)
12:15:44    at org.semanticweb.owlapi.oboformat.OBOFormatStorer.storeOntology(OBOFormatStorer.java:42)
12:15:44    at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:155)
12:15:44    at org.semanticweb.owlapi.util.AbstractOWLStorer.storeOntology(AbstractOWLStorer.java:119)
12:15:44    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1525)
12:15:44    at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.saveOntology(OWLOntologyManagerImpl.java:1502)
12:15:44    at owltools.io.ParserWrapper.saveOWL(ParserWrapper.java:289)
12:15:44    at owltools.io.ParserWrapper.saveOWL(ParserWrapper.java:209)
12:15:44    at owltools.cli.CommandRunner.runSingleIteration(CommandRunner.java:3712)
12:15:44    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:76)
12:15:44    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:68)
12:15:44    at owltools.cli.CommandLineInterface.main(CommandLineInterface.java:12)
12:15:44  Caused by: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(UniProtKB:Q06787-11 id( UniProtKB:Q06787-11)name( FMR1 Hsap)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)synonym( FMR1 RELATED)synonym( FMR1 BROAD)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)name( Fmr1 isoform 11 Rnor)synonym( Q06787-11 RELATED)relationship( in_taxon NCBITaxon:9606)relationship( in_taxon NCBITaxon:10116)relationship( has_gene_template UniProtKB:Q06787)is_a( RGD:2623)is_a( CHEBI:36080))
12:15:44    at org.obolibrary.oboformat.model.Frame.checkMaxOneCardinality(Frame.java:424)
12:15:44    at org.obolibrary.oboformat.model.Frame.check(Frame.java:405)
12:15:44    at org.obolibrary.oboformat.model.OBODoc.check(OBODoc.java:390)
12:15:44    at org.obolibrary.oboformat.writer.OBOFormatWriter.write(OBOFormatWriter.java:183)
12:15:44    at org.semanticweb.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:88)
12:15:44    ... 11 more
12:15:44  Makefile:27: recipe for target 'neo.obo' failed
12:15:44  make: *** [neo.obo] Error 1

https://build.geneontology.org/job/geneontology/job/pipeline/job/issue-35-neo-test/97/console

Tagging @balhoff. Notice to @vanaukenk.

balhoff commented 3 years ago

@kltm It could be a data issue. Here are the fields from the error message, formatted to be easier to read:

frame:Frame(UniProtKB:Q06787-11
id( UniProtKB:Q06787-11)
name( FMR1 Hsap)
name( Fmr1 isoform 11 Rnor)
synonym( Q06787-11 RELATED)
synonym( FMR1 RELATED)
synonym( FMR1 BROAD)
property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)
property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)
property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)
is_a( RGD:2623)
is_a( CHEBI:36080)
relationship( in_taxon NCBITaxon:9606)
relationship( in_taxon NCBITaxon:10116)
relationship( has_gene_template UniProtKB:Q06787))

This has two labels (human and rat) and in_taxon relations to both (not good). Do you know where this term is coming in from?
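The writer fails in `Frame.checkMaxOneCardinality` because, once the two source stanzas are merged into a single frame for UniProtKB:Q06787-11, that frame carries two `name` tags, and OBO format allows at most one per term. A minimal Python sketch of that kind of check, assuming a frame is represented as a list of (tag, value) pairs; `frame_problems` is a hypothetical name, `MAX_ONE_TAGS` is only a small illustrative subset of the tags OBO 1.4 limits to one per frame, and the multiple-`in_taxon` warning is an extra consistency check that the OBO writer itself does not perform.

```python
from collections import Counter

# Illustrative subset of tags that OBO 1.4 allows at most once per frame.
MAX_ONE_TAGS = {"name", "def", "comment"}


def frame_problems(tag_values):
    """Return human-readable problems found in one (merged) term frame.

    tag_values: list of (tag, value) pairs, in any order.
    """
    counts = Counter(tag for tag, _ in tag_values)
    problems = [
        f"multiple {tag} tags not allowed ({n})"
        for tag, n in sorted(counts.items())
        if tag in MAX_ONE_TAGS and n > 1
    ]
    # Hypothetical extra check (not done by the OBO writer):
    # a gene product should point to a single taxon.
    taxa = sorted({
        value.split()[1]
        for tag, value in tag_values
        if tag == "relationship" and value.startswith("in_taxon ")
    })
    if len(taxa) > 1:
        problems.append("in_taxon to several taxa: " + ", ".join(taxa))
    return problems


# The merged frame from the error message, reduced to the relevant tags.
merged = [
    ("id", "UniProtKB:Q06787-11"),
    ("name", "FMR1 Hsap"),
    ("name", "Fmr1 isoform 11 Rnor"),
    ("relationship", "in_taxon NCBITaxon:9606"),
    ("relationship", "in_taxon NCBITaxon:10116"),
    ("relationship", "has_gene_template UniProtKB:Q06787"),
]

for problem in frame_problems(merged):
    print(problem)
```

For the frame in the error, this reports the duplicate `name` and both `in_taxon` targets (NCBITaxon:10116 and NCBITaxon:9606), which are the two symptoms pointed out above.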

kltm commented 3 years ago

@balhoff Unfortunately, I don't have more detail--this is just coming out of the NEO Makefile target. If it's not in one of the "real" ontologies, it might be coming from the GAFs or GPIs that get converted. We could start grepping around to see what comes up...
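A plain grep for the ID also matches lines where it is only referenced (for example in `synonym` or `has_gene_template` values). A stanza-aware search that reports which intermediate file defines the term, and under which name, could look like the following minimal Python sketch. The function names are hypothetical, the parser ignores OBO quoting, trailing modifiers and comments, and the `neo-*.obo` file pattern is an assumption about the build directory layout.

```python
import glob
from pathlib import Path


def stanzas(obo_text):
    """Yield each stanza of an OBO document as a list of (tag, value) pairs."""
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # [Term], [Typedef], [Instance]
            if current:
                yield current
            current = []
        elif current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.append((tag.strip(), value.strip()))
    if current:
        yield current


def locate(term_id, sources):
    """Return {source: [names]} for every source that defines term_id.

    sources: dict mapping a source label (e.g. a file name) to OBO text.
    """
    hits = {}
    for source, text in sources.items():
        for stanza in stanzas(text):
            if ("id", term_id) in stanza:
                names = [value for tag, value in stanza if tag == "name"]
                hits.setdefault(source, []).extend(names)
    return hits


if __name__ == "__main__":
    # Assumption: run in the NEO build directory, where intermediate files
    # are named like neo-goa_human_isoform.obo and neo-rgd.obo.
    files = {p: Path(p).read_text() for p in sorted(glob.glob("neo-*.obo"))}
    print(locate("UniProtKB:Q06787-11", files))
```

Any term that `locate` finds in more than one file, with different names, is a candidate for the same failure.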

balhoff commented 3 years ago

In the NEO build...

neo-goa_human_isoform.obo contains:

[Term]
id: UniProtKB:Q06787-11
name: FMR1 Hsap
synonym: "FMR1" BROAD []
synonym: "FMR1" RELATED []
synonym: "Q06787-11" RELATED []
is_a: CHEBI:36080 ! protein
relationship: in_taxon NCBITaxon:9606
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein
relationship: has_gene_template UniProtKB:Q06787

which I think comes from goa_human_isoform.gpi.gz:

UniProtKB   Q06787-11   FMR1    Synaptic functional regulator FMR1  FMR1    protein taxon:9606  UniProtKB:Q06787        db_subset=Swiss-Prot
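The tab-separated columns of that GPI row appear to follow the GPI 1.2 layout (DB, DB_Object_ID, DB_Object_Symbol, DB_Object_Name, DB_Object_Synonym, DB_Object_Type, Taxon, Parent_Object_ID, DB_Xref, Properties). The following Python sketch is a hypothetical reconstruction of how such a row could become the core of the stanza seen in neo-goa_human_isoform.obo; it is not the actual NEO conversion. The `SPECIES_ABBREV` table and the "symbol plus species abbreviation" label rule are inferred only from the "FMR1 Hsap" and "Fmr1 isoform 11 Rnor" names in this thread, and synonyms, categories and `is_a` are left out.

```python
# GPI 1.2 column order (10 tab-separated columns).
GPI12_COLUMNS = ["db", "object_id", "symbol", "full_name", "synonyms",
                 "object_type", "taxon", "parent", "xrefs", "properties"]

# Hypothetical label suffixes, covering only the two species in this thread.
SPECIES_ABBREV = {"9606": "Hsap", "10116": "Rnor"}


def gpi_row_to_stanza(line):
    """Build a minimal, NEO-like OBO stanza from one GPI 1.2 row (sketch)."""
    row = dict(zip(GPI12_COLUMNS, line.rstrip("\n").split("\t")))
    taxon = row["taxon"].split(":", 1)[1]  # "taxon:9606" -> "9606"
    label = f"{row['symbol']} {SPECIES_ABBREV.get(taxon, 'NCBITaxon:' + taxon)}"
    lines = [
        "[Term]",
        f"id: {row['db']}:{row['object_id']}",
        f"name: {label}",
        f"relationship: in_taxon NCBITaxon:{taxon}",
    ]
    if row.get("parent"):  # isoforms point back to their canonical entry
        lines.append(f"relationship: has_gene_template {row['parent']}")
    return "\n".join(lines)


gpi_line = "\t".join([
    "UniProtKB", "Q06787-11", "FMR1", "Synaptic functional regulator FMR1",
    "FMR1", "protein", "taxon:9606", "UniProtKB:Q06787", "",
    "db_subset=Swiss-Prot",
])
print(gpi_row_to_stanza(gpi_line))
```

Because the term ID comes straight from columns 1 and 2, any other source that emits the same ID ends up contributing to the same frame when the files are merged.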

neo-rgd.obo contains:

[Term]
id: UniProtKB:Q06787-11
name: Fmr1 isoform 11 Rnor
is_a: RGD:2623
relationship: in_taxon NCBITaxon:10116
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine
property_value: https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform

which I think comes from gene_association.rgd.gz:

RGD 2623    Fmr1        GO:0005634  RGD:1624291 ISO RGD:735919  C   FMRP translational regulator 1      gene    taxon:10116 20180228    RGD     UniProtKB:Q06787-11

kltm commented 3 years ago

@balhoff Thank you for tracking it down.

Well, that's not good. I guess there are a few things to be done.

@pgaudet @vanaukenk For contacting the upstreams, how long might it take to resolve this?

vanaukenk commented 3 years ago

I'd advocate options 2 and 3.

But looking at the RGD GAF line, I'm not sure what UniProtKB:Q06787-11 is supposed to mean there (is that in Column 17?), since it is a human gene product.

vanaukenk commented 3 years ago

I don't see the UniProtKB:Q06787-11 entry in the current RGD GAF, though.

balhoff commented 3 years ago

> I don't see the UniProtKB:Q06787-11 entry in the current RGD GAF, though.

Strange. I found that entry this morning in https://raw.githubusercontent.com/rat-genome-database/rgd-annotation-files/master/gene_association.rgd.gz (that's the location used by the NEO Makefile).

vanaukenk commented 3 years ago

I was looking at what is available on geneontology.org for annotation downloads.

kltm commented 3 years ago

If it looks like it's fixed, we can do another test later this week (we have the release in the hopper right now) and see what happens.

kltm commented 3 years ago

@balhoff @vanaukenk Apparently it's not fixed. While that particular instance may have gone away, there are others lurking:

12:15:31  Caused by: org.obolibrary.oboformat.model.FrameStructureException: multiple name tags not allowed. in frame:Frame(UniProtKB:Q06787-10 id( UniProtKB:Q06787-10)synonym( FMR1 RELATED)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/GeneProductIsoform)name( FMR1 Hsap)synonym( FMR1 BROAD)synonym( Q06787-10 RELATED)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/Protein)property_value( https://w3id.org/biolink/vocab/category https://w3id.org/biolink/vocab/MacromolecularMachine)name( Fmr1 isoform 10 Rnor)relationship( in_taxon NCBITaxon:9606)relationship( in_taxon NCBITaxon:10116)relationship( has_gene_template UniProtKB:Q06787)is_a( RGD:2623)is_a( CHEBI:36080))
12:15:31    at org.obolibrary.oboformat.model.Frame.checkMaxOneCardinality(Frame.java:424)
12:15:31    at org.obolibrary.oboformat.model.Frame.check(Frame.java:405)
12:15:31    at org.obolibrary.oboformat.model.OBODoc.check(OBODoc.java:390)
12:15:31    at org.obolibrary.oboformat.writer.OBOFormatWriter.write(OBOFormatWriter.java:183)
12:15:31    at org.semanticweb.owlapi.oboformat.OBOFormatRenderer.render(OBOFormatRenderer.java:88)
12:15:31    ... 11 more
12:15:31  Makefile:27: recipe for target 'neo.obo' failed
12:15:31  make: *** [neo.obo] Error 1

It looks like something more general will need to be done: either with the upstream(s), or in the pipeline, where work will need to start on removing duplicates (2 or 3 above).
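As a purely hypothetical illustration of pipeline-side duplicate removal: when several intermediate files define the same term ID, keep the entry from the highest-priority source and collect the rest for reporting upstream. `Entry`, `merge_with_priority` and the priority order (trusting the UniProt-derived file first for UniProtKB IDs) are assumptions made for this Python sketch, not part of the NEO build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str   # intermediate file the term came from
    term_id: str
    name: str
    taxon: str


def merge_with_priority(entries, priority):
    """Keep one Entry per term_id, preferring sources listed earlier.

    Sources missing from `priority` rank last. Returns (kept, dropped):
    kept maps term_id -> Entry; dropped lists the discarded duplicates so
    they can be reported upstream instead of silently lost.
    """
    rank = {source: i for i, source in enumerate(priority)}
    kept, dropped = {}, []
    # sorted() is stable, so equal-rank entries keep their input order.
    for entry in sorted(entries, key=lambda e: rank.get(e.source, len(rank))):
        if entry.term_id in kept:
            dropped.append(entry)
        else:
            kept[entry.term_id] = entry
    return kept, dropped


entries = [
    Entry("neo-rgd.obo", "UniProtKB:Q06787-11",
          "Fmr1 isoform 11 Rnor", "NCBITaxon:10116"),
    Entry("neo-goa_human_isoform.obo", "UniProtKB:Q06787-11",
          "FMR1 Hsap", "NCBITaxon:9606"),
]
# Assumed policy: for UniProtKB IDs, trust the UniProt-derived file first.
kept, dropped = merge_with_priority(
    entries, ["neo-goa_human_isoform.obo", "neo-rgd.obo"])
print(kept["UniProtKB:Q06787-11"].name)   # FMR1 Hsap
print([e.source for e in dropped])        # ['neo-rgd.obo']
```

Returning the dropped entries rather than discarding them keeps the pipeline fix from hiding the upstream data problem.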

balhoff commented 3 years ago

Two questions:

I guess that was three questions.

vanaukenk commented 3 years ago

@balhoff @kltm

I think we'll have to ask RGD what a human isoform in Column 17 is intended to mean. It isn't correct for the GAF, though, because any ID in Column 17 should be from the same species as the ID in Column 2.

I don't think NEO is wrong to generate an entry from a Column 17 ID.
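The same-species rule for Column 17 can be checked mechanically, given a lookup from gene product IDs to taxa (for instance, one built from GPI column 7). A minimal Python sketch, assuming tab-separated GAF 2.x rows; the row's own taxon is read from Column 13, since the species of a Column 2 ID is not encoded in the ID itself. `col17_taxon_mismatches` and the `taxon_of` mapping are hypothetical names.

```python
def col17_taxon_mismatches(gaf_lines, taxon_of):
    """Yield rows whose Column 17 ID is from a different taxon than the row.

    gaf_lines: iterable of tab-separated GAF 2.x lines.
    taxon_of:  dict mapping a gene product ID (e.g. "UniProtKB:Q06787-11")
               to an NCBI taxon number as a string (e.g. "9606"); IDs not in
               the mapping are skipped rather than guessed.
    Yields (line_number, column_2_id, column_17_id, row_taxon, form_taxon).
    """
    for number, line in enumerate(gaf_lines, start=1):
        if not line.strip() or line.startswith("!"):  # blank or header
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 17 or not cols[16]:
            continue  # no gene product form ID on this row
        object_id = f"{cols[0]}:{cols[1]}"
        # Column 13 may be "taxon:A|taxon:B"; the first taxon is the object's.
        row_taxon = cols[12].split("|")[0].split(":", 1)[1]
        form_id = cols[16]
        form_taxon = taxon_of.get(form_id)
        if form_taxon is not None and form_taxon != row_taxon:
            yield number, object_id, form_id, row_taxon, form_taxon


rgd_row = "\t".join([
    "RGD", "2623", "Fmr1", "", "GO:0005634", "RGD:1624291", "ISO",
    "RGD:735919", "C", "FMRP translational regulator 1", "", "gene",
    "taxon:10116", "20180228", "RGD", "", "UniProtKB:Q06787-11",
])
# Hypothetical lookup, e.g. derived from GPI column 7.
taxon_of = {"UniProtKB:Q06787-11": "9606"}
for hit in col17_taxon_mismatches(["!gaf-version: 2.1", rgd_row], taxon_of):
    print(hit)
```

Run against the RGD row quoted earlier in the thread, this flags RGD:2623 (taxon 10116) pointing at UniProtKB:Q06787-11 (taxon 9606).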

pgaudet commented 3 years ago

Tagging @jrsjrs from RGD so she can weigh in.

kltm commented 3 years ago

@vanaukenk It looks like this is working again? Was there a possible upstream fix?

vanaukenk commented 3 years ago

Per 2021-02-16 correspondence with RGD, the issue with human identifiers in Column 17 of the RGD GAF has been addressed.