geneontology / pipeline

Declarative pipeline for the Gene Ontology.
https://build.geneontology.org/job/geneontology/job/pipeline/
BSD 3-Clause "New" or "Revised" License

MGI xrefs failing GO checks #408

Open kltm opened 2 days ago

kltm commented 2 days ago

From @ValWood at https://github.com/pombase/pombase-chado/issues/1224

Our MGI ISO xrefs are failing checks.

WARNING - Invalid identifier:GORULE:0000027: 1298204 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC530.12c pdf1 enables GO:0008474 PMID:15075260 ISO MGI:MGI:1298204 F palmitoyl protein thioesterase/ dolichol pyrophosphate phosphatase fusion protein Pdf1 protein taxon:4896 20040414 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1316717 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC20F10.03 SPBC20F10.03 is_active_in GO:0005634 GO_REF:0000024 ISS MGI:MGI:1316717 C armadillo-type fold protein, human IFRD1 ortholog, implicated in transcription or signaling protein taxon:4896 20170830 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1346084 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC6C3.09 rpp40 part_of GO:0005655 GO_REF:0000024 ISS MGI:MGI:1346084 C RNase P and RNase MRP subunit Rpp40 protein taxon:4896 20061017 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 involved_in GO:0042843 GO_REF:0000024 ISS MGI:MGI:1919005 P D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase 
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 enables GO:0047837 GO_REF:0000024 ISS MGI:MGI:1919005 F D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase 

But it's a bit weird, because the display and the URL are MGI:1298204 but on the pop-up it says MGI:1298204.
Could you have a dig and see if the syntax has been resolved to remove the first MGI: or something?

@kltm 's response:

Looking at https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000027.md . Okay, "soft" warning, so no data filtering.

The moment of failure is likely here: https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L835
The special casing for MGI leading into it is: https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L802-L806

So, it looks like MGI:MGI:1919005 would be clipped to MGI and 1919005, the latter of which would fail when checking against the regexp. The options here would be:

Either way, @pgaudet , this is probably best approached as a GO QC bug for the moment (although a "light" one as no fix or filtering is done) and added to the QC worklist.
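
For anyone picking this up from the worklist, here is a rough sketch of the suspected failure mode described above. It is not the ontobio code: `clip_and_split`, `matches_id_syntax`, and `ASSUMED_MGI_ID_SYNTAX` are hypothetical names, and the pattern is only an assumption about what the MGI `id_syntax` entry looks like (a local ID that itself starts with `MGI:`).

```python
import re
from typing import Tuple

# Assumption: the MGI id_syntax expects the local ID to carry its own "MGI:" part.
# The real pattern lives in the GO dbxrefs metadata and may differ.
ASSUMED_MGI_ID_SYNTAX = re.compile(r"MGI:[0-9]{5,}")


def clip_and_split(xref: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the MGI special casing plus prefix split.

    A doubled "MGI:MGI:" is collapsed to a single "MGI:", then the
    identifier is split on the first colon into (prefix, local_id).
    """
    if xref.startswith("MGI:MGI:"):
        xref = xref[len("MGI:"):]  # "MGI:MGI:1919005" -> "MGI:1919005"
    prefix, _, local_id = xref.partition(":")
    return prefix, local_id


def matches_id_syntax(local_id: str) -> bool:
    """Check a local ID against the assumed MGI pattern (whole-string match)."""
    return ASSUMED_MGI_ID_SYNTAX.fullmatch(local_id) is not None


if __name__ == "__main__":
    prefix, local_id = clip_and_split("MGI:MGI:1919005")
    print(prefix, local_id)                  # MGI 1919005
    print(matches_id_syntax(local_id))       # False: bare number, the reported warning
    print(matches_id_syntax("MGI:1919005"))  # True: what the assumed pattern expects
```

If the real pattern has this shape, the bare number left after clipping can never match, which would be consistent with the "1919005 does not match any id_syntax patterns for MGI" text in the warnings.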

kltm commented 2 days ago

@pgaudet I've temporarily put this in the "low-hanging fruit" project in the spec and prioritize section.