geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Add check for identical definitions in ontology file? #12815

Closed paolaroncaglia closed 7 years ago

paolaroncaglia commented 7 years ago

Hi @cmungall

Stemming from https://github.com/geneontology/go-ontology/issues/12814: there were 2 terms in the ontology with identical definitions (likely a result of copy-paste):

[Term]
id: GO:0043946
name: positive regulation of catalytic activity in other organism involved in symbiotic interaction
namespace: biological_process
def: "Any process in which an organism stops, prevents, or reduces the frequency, rate or extent of enzyme activity in a second organism, where the two organisms are in a symbiotic interaction." [GOC:mtg_pamgo_17jul06, GOC:tb]

[Term]
id: GO:0052199
name: negative regulation of catalytic activity in other organism involved in symbiotic interaction
namespace: biological_process
def: "Any process in which an organism stops, prevents, or reduces the frequency, rate or extent of enzyme activity in a second organism, where the two organisms are in a symbiotic interaction." [GOC:mtg_pamgo_17jul06, GOC:tb]

I fixed GO:0043946, but I’m a bit surprised that we don’t have a check in place to detect identical definitions, similar to the one we have for identical class labels. If any two terms share a definition, one of them has to be incorrect (or we need to merge the terms). Thanks!
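A check of the kind proposed here might look like the following minimal sketch. It is not the check that was eventually committed to the GO build, which this thread does not show. The function name `find_duplicate_defs` and the regular expression are hypothetical, and the sketch assumes OBO-format input with one tag per line. It compares definitions after collapsing whitespace and makes no special case for obsolete terms.

```python
import re
from collections import defaultdict

# Captures the quoted text of an OBO "def:" line, allowing escaped characters.
DEF_RE = re.compile(r'^def:\s*"((?:[^"\\]|\\.)*)"')


def find_duplicate_defs(obo_text):
    """Return {definition text: [term ids]} for definitions used by 2+ terms.

    Hypothetical helper: only [Term] stanzas are inspected, and definitions
    are compared after collapsing runs of whitespace.
    """
    ids_by_def = defaultdict(list)
    in_term = False
    term_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line[3:].strip()
        elif in_term and term_id:
            match = DEF_RE.match(line)
            if match:
                key = " ".join(match.group(1).split())
                ids_by_def[key].append(term_id)
    # Keep only definitions shared by more than one term.
    return {d: ids for d, ids in ids_by_def.items() if len(ids) > 1}
```

Run against the two stanzas quoted above, a function like this would group GO:0043946 and GO:0052199 under their shared definition text.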

paolaroncaglia commented 7 years ago

@mcourtot , @ukemi Labeling as editors-discussion as I believe it should be fairly quick to discuss and agree on having such a check - or reporting on whether it exists already but was not working. Thanks.

cmungall commented 7 years ago

Someone has a thanksgiving project...

We should definitely have this check. I can commit this check whenever you like. I'd rather do it sooner. Obviously, once it's committed, releases will be blocked until all 69 dupes are fixed.

mcourtot commented 7 years ago

Put them in a spreadsheet at https://docs.google.com/spreadsheets/d/12j3dwON67zLWOhMklfeLRMWY14XAAWzdEhvNdXNpNks/edit#gid=0

If we can each do a few each time we edit we should be done quickly. Please indicate the ones that have been fixed in the last column.

paolaroncaglia commented 7 years ago

I made a start (from the bottom). We may want to split the list among us, to avoid duplication of effort... @mcourtot @ukemi @tberardini @dosumis

paolaroncaglia commented 7 years ago

Done a few more. @dosumis if that helps I've noted "DOS" in the last column for rows/terms that are in your ID range, thanks :-)

dosumis commented 7 years ago

Ta

tberardini commented 7 years ago

Ok. Fixed everything that was left EXCEPT for the ones labeled DOS.

paolaroncaglia commented 7 years ago

Added Melanie, Tanya and myself as assignees for stats and retrieval purposes.

dosumis commented 7 years ago

Fixed.

paolaroncaglia commented 7 years ago

Brilliant, thanks @dosumis ! Handing over to @cmungall to implement the check then. As an aside, looking at the duplicates (and at ensuing tickets) supports the view that it's safer to have ontology requests in a tracker than via TermGenie Free Form.

mcourtot commented 7 years ago

There are a few tickets still open - can we wait until they're closed to implement the check? Otherwise the release will break.

https://github.com/geneontology/go-ontology/issues/12825
https://github.com/geneontology/go-ontology/issues/12821

cmungall commented 7 years ago

I'll do as soon as #12825 is closed (the other is closed now)

tberardini commented 7 years ago

https://github.com/geneontology/go-ontology/issues/12825 is closed. @cmungall, you can proceed with the check implementation.

tberardini commented 7 years ago

Check implemented. Caught its first duplicate too! I'll close this issue.

tberardini commented 7 years ago

For the record, the dupe caught was:

ERROR: duplicate-defs: GO:0099626 GO:0099635 --> Regulation of presynaptic cytosolic calcium ion concentrations via the directed movement of calcium ions across the plasma-membrane into the cytosol via the action of a voltage-gated calcium ion channel. This is the first step in synaptic transmission
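For illustration, a report line shaped like the one quoted above, together with the "block the release" behaviour mentioned earlier in the thread, could be sketched as follows. The function name `report_duplicates` is hypothetical, the input mapping (definition text to list of term IDs) is an assumed data shape, and using a non-zero exit status to fail the build is an assumption about how such a gate would typically be wired up, not a description of the actual GO pipeline.

```python
import sys


def report_duplicates(dupes, stream=sys.stderr):
    """Print one ERROR line per duplicated definition and return an exit status.

    `dupes` is assumed to map definition text to the list of term IDs that
    share it. The line format mirrors the message quoted in this thread.
    """
    for definition, term_ids in sorted(dupes.items()):
        print(
            f"ERROR: duplicate-defs: {' '.join(term_ids)} --> {definition}",
            file=stream,
        )
    # Non-zero status lets a build script stop the release (assumed convention).
    return 1 if dupes else 0
```

A build script could then call `sys.exit(report_duplicates(dupes))` so that any remaining duplicate fails the run.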