Our MGI ISO xrefs are failing checks.
WARNING - Invalid identifier:GORULE:0000027: 1298204 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC530.12c pdf1 enables GO:0008474 PMID:15075260 ISO MGI:MGI:1298204 F palmitoyl protein thioesterase/ dolichol pyrophosphate phosphatase fusion protein Pdf1 protein taxon:4896 20040414 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1316717 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPBC20F10.03 SPBC20F10.03 is_active_in GO:0005634 GO_REF:0000024 ISS MGI:MGI:1316717 C armadillo-type fold protein, human IFRD1 ortholog, implicated in transcription or signaling protein taxon:4896 20170830 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1346084 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC6C3.09 rpp40 part_of GO:0005655 GO_REF:0000024 ISS MGI:MGI:1346084 C RNase P and RNase MRP subunit Rpp40 protein taxon:4896 20061017 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 involved_in GO:0042843 GO_REF:0000024 ISS MGI:MGI:1919005 P D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase
WARNING - Invalid identifier:GORULE:0000027: 1919005 does not match any id_syntax patterns for MGI in dbxrefs--PomBase SPAC513.06c dhd1 enables GO:0047837 GO_REF:0000024 ISS MGI:MGI:1919005 F D-xylose 1-dehydrogenase (NADP+) protein taxon:4896 20150502 PomBase
But it's a bit weird, because the display and the URL are MGI:1298204 but on the pop-up it says MGI:1298204.
Could you have a dig and see if the syntax has been resolved to remove the first MGI: or something?
From @ValWood at https://github.com/pombase/pombase-chado/issues/1224
@kltm's response:
Looking at https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000027.md . Okay, it is a "soft" warning, so no data filtering is done.
The moment of failure is likely here: https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L835 , and the special casing for MGI leading into it is here: https://github.com/biolink/ontobio/blob/master/ontobio/io/assocparser.py#L802-L806
So, it looks like MGI:MGI:1919005 would be clipped to MGI and 1919005, the latter of which would fail when checking against the regexp. The options here would be:
change the dbxrefs regexp to reflect our behind-the-scenes fix of MGI (I'm not sure what the knock-on effect would be)
remove the ontobio "fix" (I'm not sure what the knock-on effect would be)
change the MGI full id to MGI:MGI:MGI:1919005 (I know what the knock-on effect would be: hilarity)
Either way, @pgaudet, this is probably best approached as a GO QC bug for the moment (although a "light" one, as no fix or filtering is done) and added to the QC worklist.
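The clipping described in this thread can be reproduced in a few lines of Python. This is a hypothetical sketch, not ontobio's code: `split_curie` and `apply_mgi_fix` are made-up stand-ins for the parser's prefix split and its MGI special case. The strict pattern `MGI:[0-9]{5,}` is an assumption about what the MGI `id_syntax` entry in the GO dbxrefs metadata looks like, and the relaxed pattern is only one possible way to do the first option.

```python
from __future__ import annotations

import re

# ASSUMPTION: the MGI id_syntax expects the local id to keep its own "MGI:"
# prefix, so a valid full CURIE looks like MGI:MGI:<digits>.
MGI_ID_SYNTAX = re.compile(r"MGI:[0-9]{5,}")

# HYPOTHETICAL relaxed pattern for the first option (change the dbxrefs regexp):
# accept the local id with or without the inner "MGI:" prefix.
MGI_ID_SYNTAX_RELAXED = re.compile(r"(?:MGI:)?[0-9]{5,}")


def split_curie(curie: str) -> tuple[str, str]:
    """Split a CURIE on its first colon into (prefix, local id)."""
    prefix, _, local = curie.partition(":")
    return prefix, local


def apply_mgi_fix(curie: str) -> str:
    """Hypothetical stand-in for the MGI special case: MGI:MGI:n -> MGI:n."""
    if curie.startswith("MGI:MGI:"):
        return curie[len("MGI:"):]
    return curie


def passes_id_syntax(curie: str, pattern: re.Pattern[str] = MGI_ID_SYNTAX) -> bool:
    """Check the local part of an MGI CURIE against an id_syntax pattern."""
    prefix, local = split_curie(curie)
    return prefix == "MGI" and pattern.fullmatch(local) is not None


if __name__ == "__main__":
    raw = "MGI:MGI:1919005"
    fixed = apply_mgi_fix(raw)
    print(split_curie(raw))    # ('MGI', 'MGI:1919005')
    print(split_curie(fixed))  # ('MGI', '1919005')
    print(passes_id_syntax(raw))    # True
    print(passes_id_syntax(fixed))  # False: the local id has lost its "MGI:"
    print(passes_id_syntax(fixed, MGI_ID_SYNTAX_RELAXED))  # True
```

Under these assumptions, dropping the special case (the second option) leaves `MGI:MGI:1919005` passing the strict pattern, while the relaxed pattern accepts both forms; neither sketch says anything about the knock-on effects the thread is unsure of.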