biolink / ontobio

python library for working with ontologies and ontology associations
https://ontobio.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Go site 2210 gorule 0000027 must check dbs are in the db xref file #677

Open mugitty opened 1 week ago

dustine32 commented 1 week ago

Linking to geneontology/go-site#2210

dustine32 commented 1 week ago

@mugitty Do we already have a test for gorule 0000027 somewhere?

mugitty commented 1 week ago

@dustine32, I pushed updates that fold the id_syntax check into the _validate_id method.

dustine32 commented 1 week ago

@mugitty Awesome, thank you so much! Do we have a test for wrong IDs that should return that "does not match any id_syntax patterns" warning message? It would be good to confirm your new code can be triggered by this.

mugitty commented 1 week ago

I ran it through https://github.com/geneontology/go-site/blob/master/docs/gorules_test_errors.gaf. The output error.json file contains things such as:

    ...
    {
      "level": "WARNING",
      "line": "MGI:1100518\tSmad7\tbla\tinvolved_in\tGO:0017015\tMGI:MGI:3836072|PMID:18952608\tIC\tGO:0060389\tP\tGORULE_TEST:0000020-3\tSMAD\tprotein_coding_gene\ttaxon:10090\t20090211\tGO_Central\t\n",
      "type": "Invalid identifier",
      "message": "GORULE:0000027: 3836072 does not match any id_syntax patterns for MGI in dbxrefs",
      "obj": "MGI:MGI:3836072",
      "taxon": "NCBITaxon:10090",
      "rule": 27
    },
    {
      "level": "WARNING",
      "line": "UniPotKB\tQ9HC96\tCAPN10\tinvolved_in\tGO:0006921\tPMID:23072806\tIDA\t\tP\tGORULE_TEST:0000027-1 Calpain-10\tCAPN10,KIAA1845\tprotein\ttaxon:9606\t20140213\tGO_Central\t\n",
      "type": "Invalid identifier",
      "message": "GORULE:0000027: UniPotKB not found in list of database names in dbxrefs",
      "obj": "UniPotKB:Q9HC96",
      "taxon": "NCBITaxon:9606",
      "rule": 27
    },
    {
      "level": "WARNING",
      "line": "UniProtKB\tQ9HC96\tCAPN10\tinvolved_in\tGO:0006921\tPMID:PMID:14561399\tIDA\t\tP\tGORULE_TEST:0000027-3 Calpain-10\tCAPN10,KIAA1845\tprotein\ttaxon:9606\t20140213\tGO_Central\t\n",
      "type": "Invalid identifier",
      "message": "GORULE:0000027: PMID:14561399 does not match any id_syntax patterns for PMID in dbxrefs",
      "obj": "PMID:PMID:14561399",
      "taxon": "NCBITaxon:9606",
      "rule": 27
    },
    ...

This update outputs warnings. Groups can update the id_syntax and/or IDs based on these warnings.
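To see which groups would need to act, warnings shaped like the ones quoted above could be tallied per database prefix. This is only a rough sketch: `tally_rule27` and `sample_messages` are hypothetical names, the entries are trimmed to a few of the fields shown above, and treating the report as a flat list of message dicts is an assumption about error.json, not a description of its actual layout.

```python
from collections import Counter


def tally_rule27(messages):
    """Count rule-27 warnings per database prefix of the offending identifier."""
    counts = Counter()
    for msg in messages:
        if msg.get("rule") == 27:
            # The prefix is everything before the first colon of "obj".
            prefix = msg["obj"].split(":", 1)[0]
            counts[prefix] += 1
    return counts


# Trimmed copies of the entries quoted above ("line", "message" and "taxon" omitted).
sample_messages = [
    {"level": "WARNING", "type": "Invalid identifier", "obj": "MGI:MGI:3836072", "rule": 27},
    {"level": "WARNING", "type": "Invalid identifier", "obj": "UniPotKB:Q9HC96", "rule": 27},
    {"level": "WARNING", "type": "Invalid identifier", "obj": "PMID:PMID:14561399", "rule": 27},
]

for prefix, count in sorted(tally_rule27(sample_messages).items()):
    print(prefix, count)
```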

dustine32 commented 1 week ago

@mugitty Thanks again! It's great that you confirmed the go-site checks will run correctly with this change but usually we also test this type of functionality within the ontobio/tests. As a comparable example, I'm looking at the test for GoRule43 and it recreates a small part of the gorefs metadata to test the GO_REF validation functionality. Could we do the same thing in test_qc.py (i.e., add a test_gorule27 function) with a small sample of the db-xrefs metadata?
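For illustration, a test of that shape might look roughly like the sketch below. It deliberately does not call ontobio: `check_id` is a hypothetical stand-in for the validation logic, `SAMPLE_DBXREFS` is an assumed, trimmed-down layout for the db-xrefs metadata, and the regex patterns are simplified placeholders rather than the real id_syntax values. A real test in the ontobio test suite would exercise the parser's own code instead.

```python
import re

# Hypothetical small sample of db-xrefs metadata (layout and patterns are assumptions).
SAMPLE_DBXREFS = [
    {"database": "PMID", "entity_types": [{"id_syntax": "[0-9]+"}]},
    {"database": "UniProtKB", "entity_types": [{"id_syntax": "[A-Z0-9]{6,10}"}]},
]


def check_id(curie, dbxrefs):
    """Return a rule-27 style warning for `curie`, or None if it passes."""
    prefix, _, local_id = curie.partition(":")
    patterns = None
    for entry in dbxrefs:
        if entry["database"] == prefix:
            patterns = [t["id_syntax"] for t in entry["entity_types"] if "id_syntax" in t]
            break
    if patterns is None:
        return f"GORULE:0000027: {prefix} not found in list of database names in dbxrefs"
    # Only complain when at least one pattern exists and none of them matches.
    if patterns and not any(re.fullmatch(p, local_id) for p in patterns):
        return (f"GORULE:0000027: {local_id} does not match any id_syntax "
                f"patterns for {prefix} in dbxrefs")
    return None


def test_gorule27():
    # Well-formed identifier with a known prefix: no warning.
    assert check_id("UniProtKB:Q9HC96", SAMPLE_DBXREFS) is None
    # Misspelled database name.
    assert "not found in list of database names" in check_id("UniPotKB:Q9HC96", SAMPLE_DBXREFS)
    # Doubled prefix, so the local part fails the PMID pattern.
    assert "does not match any id_syntax patterns for PMID" in check_id(
        "PMID:PMID:14561399", SAMPLE_DBXREFS)
```

Keeping the metadata sample inline means the test does not depend on a checkout of go-site.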

mugitty commented 1 week ago

@dustine32, since these are validation tests, I added tests to both test_gafparser.py and test_gpad_parser.py.