PROconsortium / PRoteinOntology

curation/display/interpretation problem: aligning modified histones with PRO modified residues. #228

Closed: ValWood closed this issue 3 years ago

ValWood commented 3 years ago

It is very confusing that the community numbers histone residues from +1 while PRO uses the natural sequence start.

So, for example https://www.pombase.org/gene/SPCC622.08c

In our protein modification section, we have MOD:00046 | O-phospho-L-serine | modified residue S121 added by bub1 | IDA | Kawashima SA et al. (2010)

BUT the corresponding PRO label used in GO annotation is S122

@nataled has anyone else brought this up? I can see that this will be very confusing in GO annotation. I wanted to see if there is a solution before I use too many modified histones.

@vanaukenk @thomaspd has the issue arisen in GOCAM modelling? We will need these modified forms increasingly to represent 'recruitment' activities.

CC @mah11

ValWood commented 3 years ago

I guess a simple solution for us would be to change the PRO display labels for histones to match the accepted histone modification system, but it seems a big hack.

ValWood commented 3 years ago

H3-K10 will look really odd. But the PRO method is better/correct computationally.

nataled commented 3 years ago

Our convention is to base position designations on the full sequence, not the position after residue removal. This conforms to how it is done in UniProtKB, for example. Doing so prevents other types of confusion (I've seen cases where authors refer to a modified residue position without stating that it is based on post-cleavage numbering, and databases then cite that position as is even though said database uses the same convention as PRO).

By the way, the name you cite (hta1/Phos:122) does not appear in PRO (though I'd like to add that as a synonym in the future). Thus, I don't think there is a fix that can be applied at our end.

This issue has not been raised before, likely because we don't use the name as you've given, and because our term definition makes it clear that we are using full-sequence numbering.
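To make the difference between the two numbering schemes concrete, here is a minimal Python sketch. It assumes that exactly one N-terminal residue is removed from the processed histone, which is consistent with the S121 versus S122 example in this thread but is not stated as a general rule here. The function names are hypothetical and are not part of PRO, PomBase, or UniProtKB.

```python
def to_full_sequence_position(processed_position: int, residues_removed: int = 1) -> int:
    """Convert a position counted on the processed protein (community
    histone numbering) to full-sequence numbering (PRO/UniProtKB style).

    residues_removed is an assumption: the number of N-terminal residues
    cleaved off before the community starts counting.
    """
    if processed_position < 1:
        raise ValueError("positions are 1-based")
    if residues_removed < 0:
        raise ValueError("residues_removed cannot be negative")
    return processed_position + residues_removed


def to_processed_position(full_position: int, residues_removed: int = 1) -> int:
    """Convert a full-sequence position back to processed-protein numbering."""
    if residues_removed < 0:
        raise ValueError("residues_removed cannot be negative")
    if full_position <= residues_removed:
        raise ValueError("this residue is absent from the processed protein")
    return full_position - residues_removed


# The example from this thread: S121 in the literature, S122 in PRO.
print(to_full_sequence_position(121))
print(to_processed_position(122))
```

Keeping the offset as an explicit parameter avoids hard-coding the assumption, since the number of removed residues may differ for other proteins.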

ValWood commented 3 years ago

Actually, I think we can easily solve this because IIRC our display labels can be anything. So we will effectively use a synonym for the display.

Thanks!
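The display-synonym idea could be sketched roughly as follows: the stored label keeps full-sequence numbering, and a separate display label is derived for users. This is only an illustration under the same one-residue-offset assumption; `display_label` and the label format (a one-letter residue code followed by a position, such as S122) are hypothetical and do not describe PomBase's actual implementation.

```python
import re

# Assumed label format: one-letter amino acid code followed by a position.
_RESIDUE_LABEL = re.compile(r"^([A-Z])(\d+)$")


def display_label(full_sequence_label: str, residues_removed: int = 1) -> str:
    """Derive a community-style display label from a full-sequence label.

    The full-sequence label stays the computational identifier; the
    returned string is only a synonym shown to users.
    """
    match = _RESIDUE_LABEL.match(full_sequence_label)
    if match is None:
        raise ValueError(f"unrecognised residue label: {full_sequence_label!r}")
    amino_acid, position = match.group(1), int(match.group(2))
    if position <= residues_removed:
        raise ValueError("this residue is absent from the processed protein")
    return f"{amino_acid}{position - residues_removed}"


print(display_label("S122"))
print(display_label("K10"))
```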