geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Changes to ontology (possibly around Rhea xrefs or riot) prevent full ontology build and block other pipeline builds #28404

Open kltm opened 3 months ago

kltm commented 3 months ago

Between 8am and 4pm PT, on July 4th, the ontology build started to fail, bringing down or blocking multiple pipelines. Looking at the PRs in that window, I suspect the changes happened earlier, but took a couple of runs to sync into the build.

The fatal build step seems to be:

11:39:45  INVALID ONTOLOGY FILE ERROR Could not load a valid ontology from file: enhanced.owl
11:39:45  For details see: http://robot.obolibrary.org/errors#invalid-ontology-file-error
11:39:45  Use the -vvv option to show the a trace.
11:39:45  Use the --help option to see usage information.
11:39:45  
11:39:45  real  0m8.802s
11:39:45  user  0m48.590s
11:39:45  sys   0m3.159s
11:39:45  make: *** [Makefile:202: reasoned.owl] Error 1

The most suspicious part leading up to that seems to be this set of Rhea identifier errors:

11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:80287
11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:79423
11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:80271
11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:80271
11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.$anonfun.applyOrElse:82:20 - No such Rhea identifier, filtering definition xref: RHEA:80287
11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.$anonfun.applyOrElse:82:20 - No such Rhea identifier, filtering definition xref: RHEA:79423
11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.$anonfun.applyOrElse:82:20 - No such Rhea identifier, filtering definition xref: RHEA:80271
11:38:20  2024.07.05 18:38:17 [ERROR] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.main:97:46 - Obsolete Rhea ID used in xref: RHEA:67620
11:38:20  2024.07.05 18:38:17 [ERROR] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.main:97:46 - Obsolete Rhea ID used in xref: RHEA:67624
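For triage, the distinct Rhea IDs can be pulled out of a log like the one above and grouped by severity. This is only a rough sketch: the function name `rhea_ids_by_level` and the regular expression are illustrative assumptions, not part of the GO build, and the pattern assumes each message ends with a single `RHEA:` identifier, as in the lines shown here.

```python
import re
from collections import defaultdict

# Assumed line shape: "... [WARN] ... RHEA:12345" or "... [ERROR] ... RHEA:12345".
LOG_LINE = re.compile(r"\[(WARN|ERROR)\].*?(RHEA:\d+)\s*$")

def rhea_ids_by_level(log_lines):
    """Group the distinct Rhea IDs mentioned in log lines by log level."""
    found = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match:
            found[match.group(1)].add(match.group(2))
    # Sort so that repeated runs give a stable report.
    return {level: sorted(ids) for level, ids in found.items()}

sample_log = [
    "11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:80287",
    "11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:80271",
    "11:38:20  2024.07.05 18:38:17 [WARN] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.xrefsToRemove:69:16 - No such Rhea identifier, filtering xref: RHEA:80271",
    "11:38:20  2024.07.05 18:38:17 [ERROR] ammonite.$file.$up.util.filter$minusrhea$minusxrefs.main:97:46 - Obsolete Rhea ID used in xref: RHEA:67620",
]
print(rhea_ids_by_level(sample_log))
```

Keeping WARN and ERROR apart matters here because the two groups may need different handling: unknown IDs are only filtered, while obsolete IDs are reported as errors.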

I would also note the following oddity:

11:36:32  riot -q --nocheck --output ntriples go-edit.facts.ttl | sed 's/ /\t/' | sed 's/ /\t/' | sed 's/ \.$//' >go-edit.facts
11:37:10  18:37:04 WARN  riot            :: [line: 349, col: 1 ] Bad IRI: <http://www.geneontology.org/formats/oboInOwl#http://purl.obolibrary.org/obo/go#source> Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.
11:37:10  18:37:05 WARN  riot            :: [line: 403735, col: 4 ] Bad IRI: <http://www.geneontology.org/formats/oboInOwl#http://purl.obolibrary.org/obo/go#source> Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.
11:37:10  18:37:06 WARN  riot            :: [line: 526884, col: 4 ] Bad IRI: <http://www.geneontology.org/formats/oboInOwl#http://purl.obolibrary.org/obo/go#source> Code: 0/ILLEGAL_CHARACTER in FRAGMENT: The character violates the grammar rules for URIs/IRIs.
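The IRI that riot rejects looks like the oboInOwl namespace with a second, complete IRI pasted into its fragment. A quick scan for that shape could help locate where it enters the file. This is a minimal sketch, assuming a line-oriented Turtle or N-Triples file with IRIs written in angle brackets; `suspicious_iris` is a hypothetical helper, not an existing tool.

```python
import re

# Rough stand-in for IRI tokens: anything written as <...> without spaces.
IRI_TOKEN = re.compile(r"<([^<>\s]*)>")

def suspicious_iris(lines):
    """Yield (line_number, iri) for IRIs whose fragment contains '#' or '://'."""
    for number, line in enumerate(lines, start=1):
        for iri in IRI_TOKEN.findall(line):
            _, sep, fragment = iri.partition("#")
            # A fragment should not itself hold another fragment or a scheme.
            if sep and ("#" in fragment or "://" in fragment):
                yield number, iri

sample = [
    '<http://purl.obolibrary.org/obo/GO_0071164> <http://www.geneontology.org/formats/oboInOwl#hasDbXref> "RHEA:67620" .',
    "<http://www.geneontology.org/formats/oboInOwl#http://purl.obolibrary.org/obo/go#source> a <http://www.w3.org/2002/07/owl#AnnotationProperty> .",
]
for number, iri in suspicious_iris(sample):
    print(number, iri)
```

The sample triples are made up for illustration; only the malformed IRI itself is taken from the riot warnings.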

Ideally, this issue should only be closed when both of these criteria are met:

Tagging @pgaudet @balhoff @sjm41

sjm41 commented 3 months ago

I'm not sure the RHEA errors/warnings are the root cause of the issue here, but here's a quick report on the offending IDs:

RHEA:80287 RHEA:80271 RHEA:79423

From looking at the associated tickets, these three RHEAs are in the RHEA internal DB but not yet in the public file - they all relate to recently created GO terms:

id: GO:0141208
name: protein lysine delactylase activity
namespace: molecular_function
def: "Catalysis of the reaction: H2O + N6-lactoyl-L-lysyl-[protein] + NAD = L-lysyl-[protein] + nicotinamide +2''-O-lactoyl-ADP-D-ribose, removing a lactoyl group attached to a lysine residue in a protein." [PMID:38512451, RHEA:80287]
xref: RHEA:80287
is_a: GO:0033558 ! protein lysine deacetylase activity
property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/28015" xsd:anyURI

id: GO:0141207
name: peptide lactyltransferase (ATP-dependent) activity
namespace: molecular_function
def: "Catalysis of the reaction: lactate + ATP + L-lysyl-[protein] = N(6)-lactoyl-L-lysyl-[protein]+ AMP + diphosphate. Can also act on free lactate." [PMID:38512451, PMID:38653238, RHEA:80271]
synonym: "peptide lactyltransferase (ATP dependent) activity" EXACT []
synonym: "peptide lactyltransferase activity" BROAD []
xref: RHEA:80271 {source="skos:exactMatch"}
xref: RHEA:80271 {comment="skos:narrowMatch"}
is_a: GO:0016886 ! ligase activity, forming phosphoric ester bonds
is_a: GO:0140096 ! catalytic activity, acting on a protein
property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/28015" xsd:anyURI

id: GO:0141200
name: UTP thiamine diphosphokinase activity
namespace: molecular_function
def: "Catalysis of the reaction: UTP + thiamine = UMP + thiamine diphosphate." [PMID:38547260, RHEA:79423]
xref: RHEA:79423
is_a: GO:0016778 ! diphosphotransferase activity
property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/27518" xsd:anyURI
created_by: pg

I see there's an additional problem with GO:0141207, where the second line needs deleting:

xref: RHEA:80271 {source="skos:exactMatch"}
xref: RHEA:80271 {comment="skos:narrowMatch"}
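Repeated xrefs like this one could be caught mechanically before they reach the build. The following is a sketch only: it assumes a stanza is available as a list of tag lines, compares xref IDs while ignoring the `{...}` qualifiers, and uses a hypothetical helper name, `duplicate_xrefs`.

```python
import re
from collections import Counter

# An OBO xref line: "xref: <ID>", optionally followed by a "{...}" qualifier block.
XREF_LINE = re.compile(r"^xref:\s*(\S+)")

def duplicate_xrefs(stanza_lines):
    """Return xref IDs that occur more than once in one stanza, ignoring qualifiers."""
    counts = Counter()
    for line in stanza_lines:
        match = XREF_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return sorted(xref for xref, n in counts.items() if n > 1)

stanza = [
    "id: GO:0141207",
    'xref: RHEA:80271 {source="skos:exactMatch"}',
    'xref: RHEA:80271 {comment="skos:narrowMatch"}',
    "is_a: GO:0016886 ! ligase activity, forming phosphoric ester bonds",
]
print(duplicate_xrefs(stanza))
```

Whether the same ID may legitimately appear twice with different qualifiers is a curation question; this check only flags candidates for review.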


Obsolete Rhea ID used in xref: RHEA:67620

This has been replaced with RHEA:78471 and RHEA:78475

Obsolete Rhea ID used in xref: RHEA:67624

And this has been replaced with RHEA:78479 and RHEA:78507

That might mean that all four new RHEAs should be narrowMatch xrefs on the associated GO term, but I haven't checked:

id: GO:0071164
name: RNA cap trimethylguanosine synthase activity
namespace: molecular_function
def: "Catalysis of two successive methyl transfer reactions from AdoMet to the N-2 atom of guanosine, thereby converting 7-methylguanosine in an RNA cap to 2,2,7 trimethylguanosine." [PMID:11983179, PMID:18775984]
comment: A 2,2,7-trimethylguanosine (TMG) cap is found on many RNA polymerase II transcribed small noncoding RNAs including small nuclear RNA (snRNA), small nucleolar RNA (snoRNA) and telomerase RNA. It is also found on nematode mRNAs that undergo trans-splicing of a 5'-capped leader sequence.
synonym: "cap hypermethylase activity" EXACT [PMID:11983179]
synonym: "RNA trimethylguanosine synthase activity" EXACT []
synonym: "small nuclear RNA methyltransferase activity" RELATED [GOC:rl]
synonym: "snRNA methyltransferase activity" RELATED [GOC:rl]
xref: RHEA:67620 {source="skos:narrowMatch"}
xref: RHEA:67624 {source="skos:narrowMatch"}
is_a: GO:0008173 ! RNA methyltransferase activity
is_a: GO:0008757 ! S-adenosylmethionine-dependent methyltransferase activity
relationship: part_of GO:0036261 ! 7-methylguanosine cap hypermethylation
property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/25717" xsd:anyURI
property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/26934" xsd:anyURI
created_by: mah
creation_date: 2009-11-19T03:23:20Z

pgaudet commented 2 months ago

> From looking at the associated tickets, these three RHEAs are in the RHEA internal DB but not yet in the public file - they all related to recently created GO terms:

I don't think this is what is causing the problem; new RHEAs (not yet public) are allowed. (see https://wiki.geneontology.org/Guidelines_for_database_cross_references#Database_cross-references)

sjm41 commented 2 months ago

I've fixed the "xref: RHEA:80271 {comment="skos:narrowMatch"}" issue in #28015.

For GO:0071164, I've checked the new RHEAs, and they should all be added as narrowMatch xrefs to this term, so I'll do that now.
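As an illustration of that kind of edit, the sketch below swaps each obsolete Rhea xref in a stanza for its replacements while carrying over the original qualifier. It assumes the obsolete xrefs are dropped rather than kept alongside the new ones, which the thread does not state. The mapping is taken from the replacements reported above; `replace_obsolete_xrefs` is a hypothetical helper, not part of the GO tooling.

```python
import re

# Replacements as reported in this thread; anything else would need checking against Rhea.
REPLACEMENTS = {
    "RHEA:67620": ["RHEA:78471", "RHEA:78475"],
    "RHEA:67624": ["RHEA:78479", "RHEA:78507"],
}

# Captures the xref ID and whatever follows it (for example a "{...}" qualifier).
XREF_LINE = re.compile(r"^xref:\s*(\S+)(.*)$")

def replace_obsolete_xrefs(stanza_lines, replacements):
    """Return a new list of lines with obsolete xrefs expanded to their replacements."""
    out = []
    for line in stanza_lines:
        match = XREF_LINE.match(line)
        if match and match.group(1) in replacements:
            qualifier = match.group(2)  # keeps the leading space, if any
            out.extend(f"xref: {new_id}{qualifier}" for new_id in replacements[match.group(1)])
        else:
            out.append(line)
    return out

stanza = [
    "id: GO:0071164",
    'xref: RHEA:67620 {source="skos:narrowMatch"}',
    'xref: RHEA:67624 {source="skos:narrowMatch"}',
    "is_a: GO:0008173 ! RNA methyltransferase activity",
]
for line in replace_obsolete_xrefs(stanza, REPLACEMENTS):
    print(line)
```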