geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reactome: possible name errors #149

Closed nataled closed 2 years ago

nataled commented 2 years ago

The following EWASes have names that are inconsistent with others of the same type. Based on modification, all are GPI-anchored proteins:

R-HSA-204088 CEACAM5 (should be GPI-CEACAM5)
R-HSA-204085 CEACAM6 (should be GPI-CEACAM6)
R-HSA-158946 PI-PLAUR (likely a typo; should be GPI-PLAUR)

deustp01 commented 2 years ago

All fixed.

An issue here is that we indicate that an EWAS has been covalently modified by modifying its display name (the name used, e.g., in pathway diagrams and reaction names), and there are two opposing constraints: the full display name must be short, and different modifications should have different names. The first constraint has been implemented as a cap of no more than 3 characters, whenever possible, on the tags used to denote modifications, and there aren't enough such tags to make all the distinctions we need and still be easily intelligible to a human user (thus, GPI is good, x27 is not).

Here, we needed to distinguish regular GPI from an acetylated variant in one case, so I cheated on our 3-character cap to name the second acGPI-whatever.

nataled commented 2 years ago

I'm guessing that the EWAS you had to rename was not one of the above, but instead was R-HSA-8940811, is that correct? I ask because as I continue looking at GPIs, I realize that many of the EWAS I'm looking at have modifications using acyl-GPI, but all of these are named GPI-whatever, so in a sense the names are not as precise as they could be. These are:

R-HSA-8940746 R-HSA-8940810 R-HSA-8940794 R-HSA-8940738 R-HSA-8940730
R-HSA-8940801 R-HSA-8940831 R-HSA-8940752 R-HSA-8940815 R-HSA-8940753
R-HSA-8940825 R-HSA-8940814 R-HSA-8940687 R-HSA-8940792 R-HSA-8940714
R-HSA-8940811 R-HSA-8940716 R-HSA-8940717 R-HSA-8940813 R-HSA-8940788
R-HSA-8940702 R-HSA-8940732 R-HSA-8940773 R-HSA-8940729 R-HSA-8940809
R-HSA-8940690 R-HSA-8940816 R-HSA-8940697 R-HSA-8940828 R-HSA-8940767
R-HSA-8940741 R-HSA-8940694 R-HSA-8940817 R-HSA-8940727 R-HSA-8940790
R-HSA-8940709 R-HSA-8940739 R-HSA-8940779 R-HSA-8940823 R-HSA-8940734
R-HSA-8940693 R-HSA-8940703 R-HSA-8940698 R-HSA-8940719 R-HSA-8940797
R-HSA-8940807 R-HSA-8940745 R-HSA-8940758 R-HSA-8940713 R-HSA-8940711
R-HSA-8940689 R-HSA-8940722 R-HSA-8940712 R-HSA-8940786 R-HSA-8940796
R-HSA-8940750 R-HSA-8940728 R-HSA-8940725 R-HSA-8940706 R-HSA-8940793
R-HSA-8940803 R-HSA-8940744 R-HSA-8940704

In all the above cases, the EWAS is a protein whose modification is given as + (that is, CHEBI:63419). I supply this at the moment just to keep in mind, because there is a chance that all of these modifications are incorrect. I would hold off on making changes until I wrap my head around what I'm finding and can describe the issue accurately.

However, the reason I reopened this ticket is that I might have been wrong about renaming PI-PLAUR to GPI-PLAUR (R-HSA-158946). This particular EWAS's GPI modification is specified as N-glycyl-glycosylphosphatidylinositolethanolamine (basically just GPI, but specifically attached to a glycine). There is another EWAS of PLAUR (R-HSA-162687) that has the exact same modification (MOD:00170) as the EWAS I suggested renaming to GPI-PLAUR in the original comment (R-HSA-158946), but that other EWAS is named 'N-glycyl-glycosylphosphatidylinositolethanolamine'. So, there is:

R-HSA-158946 MOD:00170@305 (was) PI-PLAUR [plasma membrane]
R-HSA-162687 MOD:00170@305 N-glycyl-glycosylphosphatidylinositolethanolamine-PLAUR [endoplasmic reticulum membrane]

Thus the MOD is identical but the names differ. I assume they should be the same.
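
A check like this could be scripted rather than done by eye. The following is only a minimal sketch: the record layout (stable ID, gene symbol, modification, display name) and the function name `find_inconsistent_names` are assumptions made for illustration, not part of Reactome or pathways2GO. It groups entries by gene and modification and reports any group whose display names differ, using the two PLAUR entries above as input.

```python
from collections import defaultdict

# Records taken from the two PLAUR entries listed above.
# The tuple layout (stable ID, gene, modification, display name) is an
# assumption for this sketch, not an actual Reactome export format.
ewas_records = [
    ("R-HSA-158946", "PLAUR", "MOD:00170@305", "PI-PLAUR"),
    ("R-HSA-162687", "PLAUR", "MOD:00170@305",
     "N-glycyl-glycosylphosphatidylinositolethanolamine-PLAUR"),
]


def find_inconsistent_names(records):
    """Return {(gene, modification): {stable_id: name}} for every group
    that carries more than one distinct display name."""
    names_by_key = defaultdict(dict)
    for stable_id, gene, modification, name in records:
        names_by_key[(gene, modification)][stable_id] = name
    return {
        key: by_id
        for key, by_id in names_by_key.items()
        if len(set(by_id.values())) > 1
    }


conflicts = find_inconsistent_names(ewas_records)
for (gene, modification), by_id in conflicts.items():
    print(gene, modification)
    for stable_id, name in by_id.items():
        print("  ", stable_id, name)
```

Grouping on gene plus modification (rather than modification alone) keeps different proteins with the same modification from being reported as conflicts.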

deustp01 commented 2 years ago

the EWAS you renamed was not one of the original three, but instead was R-HSA-8940811, is that correct?

Right.

I would hold off on making changes until I wrap my head around what I'm finding and can describe the issue accurately.

Agreed. Systematically re-naming the EWASs bearing these modifications is straightforward and safe, but it definitely makes sense to do it only once, and to wait until we understand the issue(s) well enough to do it consistently!

deustp01 commented 2 years ago

Thus the MOD is identical but the names differ. I assume they [R-HSA-158946 MOD:00170@305 and R-HSA-162687 MOD:00170@305] should be the same.

Yes, they should be the same. For sanity / simplicity, I have renamed R-HSA-162687 to GPI-PLAUR so naming is consistent (and possibly consistently wrong but also easier to re-rename once the bigger issue is sorted out).

deustp01 commented 2 years ago

wait until we understand the issue(s) well enough to do it [renaming] consistently!

A tangential thought - "Name" is a multivalued attribute so an instance can have arbitrarily many. Only the first name value is displayed on the web page but all values are preserved and accessible to our SOLR search, relieving some of the pressure to come up with unique best names for everything.
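
To make the idea concrete, here is a rough sketch of a multivalued name attribute. The `Instance` class, its field names, and the substring match are hypothetical stand-ins; the real Reactome data model and SOLR index work differently, and whether R-HSA-162687 actually keeps its old name as a second value is assumed here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical stand-in for a database instance with a multivalued name."""
    stable_id: str
    names: List[str] = field(default_factory=list)

    @property
    def display_name(self) -> Optional[str]:
        # Only the first name value is shown to the user.
        return self.names[0] if self.names else None

    def matches(self, query: str) -> bool:
        # Every name value is searchable, not just the displayed one.
        # A plain case-insensitive substring test stands in for real search.
        q = query.lower()
        return any(q in name.lower() for name in self.names)


plaur = Instance(
    "R-HSA-162687",
    ["GPI-PLAUR", "N-glycyl-glycosylphosphatidylinositolethanolamine-PLAUR"],
)
print(plaur.display_name)
print(plaur.matches("n-glycyl"))
```

The short tag serves the display, while the longer, more precise name stays findable as a secondary value.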

ukemi commented 2 years ago

I don't know if this is relevant, but a while back we saw behavior where entities had one name in the Reactome Pathways Browser and got a different label when we did the imports. IIRC we determined that it was due to some tag in the BioPAX. Would it be helpful for me to find examples?

nataled commented 2 years ago

Understood and agreed. The issues I point out will never be about finding 'the best' name for display. Instead, they will be about consistency/standardization.

I believe that the issues in this ticket have been addressed, so will close it.

nataled commented 2 years ago

@ukemi Oops, sorry David! I was composing as you were, it seems. If you wish, you can open another ticket. I'll say only this: If I recall correctly, we've seen the same thing. I believe there's a simple workaround. Unfortunately, that was not work I handled, so I don't recall exactly what that workaround was. I'd have to examine some old code to figure it out.

deustp01 commented 2 years ago

Would it be helpful for me to find examples [of possibly BioPAX-induced renaming]?

Yes, definitely, and a new ticket sounds right.

nataled commented 2 years ago

@ukemi The tag to use is 'displayName'. There is also a 'name' tag (which I treat as synonyms).
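
For reference, a minimal sketch of that approach, assuming BioPAX Level 3 RDF/XML input. The XML fragment is made up for this example (it is not copied from a Reactome export), `labels_and_synonyms` is a hypothetical helper, and falling back to the first 'name' when 'displayName' is missing is an assumption rather than what any existing importer is known to do.

```python
import xml.etree.ElementTree as ET

# Standard BioPAX Level 3 and RDF namespaces.
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical fragment written for this example only.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Protein rdf:ID="Protein1">
    <bp:displayName>GPI-PLAUR</bp:displayName>
    <bp:name>PLAUR</bp:name>
    <bp:name>PI-PLAUR</bp:name>
  </bp:Protein>
</rdf:RDF>"""


def labels_and_synonyms(biopax_xml):
    """Map each top-level entity ID to (label, synonyms).

    The label comes from bp:displayName; bp:name values become synonyms.
    """
    root = ET.fromstring(biopax_xml)
    result = {}
    for entity in root:
        entity_id = entity.get(f"{{{RDF}}}ID") or entity.get(f"{{{RDF}}}about")
        names = [e.text for e in entity.findall(f"{{{BP}}}name") if e.text]
        label = entity.findtext(f"{{{BP}}}displayName")
        if label is None and names:
            # Assumed fallback: use the first name when no displayName exists.
            label = names[0]
        synonyms = [n for n in names if n != label]
        result[entity_id] = (label, synonyms)
    return result


result = labels_and_synonyms(SAMPLE)
print(result)
```

If an importer reads 'name' before 'displayName', the label it assigns can differ from what the browser shows, which may be the kind of mismatch described earlier in the thread.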

ukemi commented 2 years ago

Thanks @nataled! I'll find some examples for @deustp01 and we can open another issue.

deustp01 commented 2 years ago

Remaining irregular proteoforms cleaned up as described in #153, so this ticket can really be closed.

Here are more details for the cleanup of CEACAM5, CEACAM6, and PLAUR: GitHub_153.docx and CEACAMs_PLAUR_fixes.docx

Here are notes for the cleanup of the entities listed in Darren's Jan 25 comment: GPI_removal_reaction.docx