geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reactome - duplicate EWAS? #280

Closed nataled closed 7 months ago

nataled commented 9 months ago

The attached file contains a list of EWAS that, as far as I can tell, are duplicate entities. The identifiers differ, of course, and sometimes the names do too, but the details of the proteoform (the UniProtKB entry it's based on, the sequence range, the cellular component it's found in, and the modifications) are identical.
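For anyone who wants to reproduce this kind of check, the duplicate criterion described above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the attached file: the `Ewas` record, its field names, and the sample identifiers and accession are all hypothetical placeholders, not the Reactome schema or real data.

```python
from collections import defaultdict
from typing import NamedTuple


class Ewas(NamedTuple):
    """Hypothetical, simplified EWAS record (not the Reactome data model)."""
    ewas_id: str
    name: str
    uniprot: str          # UniProtKB accession the proteoform is based on
    start: int            # sequence range start
    end: int              # sequence range end
    compartment: str      # cellular component
    modifications: tuple  # e.g. (("phospho-Ser", 15),)


def proteoform_key(e):
    # Identifier and name are deliberately left out of the key;
    # modifications are sorted so their listing order does not matter.
    return (e.uniprot, e.start, e.end, e.compartment,
            tuple(sorted(e.modifications)))


def find_duplicates(records):
    """Return lists of EWAS ids that share the same proteoform details."""
    groups = defaultdict(list)
    for e in records:
        groups[proteoform_key(e)].append(e.ewas_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up records for illustration only.
records = [
    Ewas("EWAS:1", "protein X", "P00000", 1, 100, "cytosol",
         (("phospho-Ser", 15),)),
    Ewas("EWAS:2", "X, phosphorylated", "P00000", 1, 100, "cytosol",
         (("phospho-Ser", 15),)),
    Ewas("EWAS:3", "protein X", "P00000", 1, 100, "nucleoplasm",
         (("phospho-Ser", 15),)),
]

print(find_duplicates(records))
```

Here the first two records differ only in identifier and name, so they are reported as a duplicate group, while the third sits in a different cellular component and is kept separate.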

This won't affect PRO at all since PRO will simply map all to the same entity, but it's possible these were not intended to be the same.

reactome_EWAS_preprocess.dupe.txt

deustp01 commented 9 months ago

Hi Darren, I will take a look, round up the usual suspects to sort this out, and post progress to the GitHub ticket to keep things orderly. (Same for the weird chain lengths – ticket #279 – once we’ve figured out what’s going on, the diagnosis and cleanup progress will go on the ticket.) Peter

deustp01 commented 7 months ago

All items on the list have now been fixed, as shown on the second sheet of the attached Excel file. All but the last pair of EWASs were in fact duplicate instances; one of each pair has been removed, with a duplicate-instance annotation to enable tracking. For the last pair, the two instances were intended to represent variants of a protein with two different amino acid substitutions at the same position, as shown in their names, but were mistakenly annotated with the same substitution. That mistake is now fixed. All edits have been submitted to the Reactome internal database and should become publicly visible with the next Reactome release in December 2021, so I'm closing this ticket.

reactome_EWAS_preprocess.dupe.xlsx