PROconsortium / PRoteinOntology

MSA for terms that are unmodified at a specific position - how to display #256

Open: nataled opened this issue 3 years ago

nataled commented 3 years ago

From @karenross: For this term, http://purl.obolibrary.org/obo/PR_Q99MZ3, there are three "unhydroxylated" forms that appear in the MSA, and their unhydroxylated residue is colored exactly the same as the hydroxylated residues. For this term, http://purl.obolibrary.org/obo/PR_P35222, the unphosphorylated forms just don't show up in the MSA.

From @Julie-Cowart: Yes, you are right about the color issue. If you click the word "modification" in the upper left, you can see the legend. If you mouse over a residue, you will see that the modification is different (MOD:00039 for hydroxylated and PR:000026291 for unmodified amino-acid residue).

As for the unphosphorylated forms, they are being explicitly removed in code, with a note that reads "# remove unmod forms (they don't have dbxref)". I don't really understand the comment; it may have been true at one point, but it is not necessarily true now. In any case, we are suppressing the unmodified forms from the MSA and have been for a long time.

Back to the first issue: the fact that the unhydroxylated forms are not suppressed is actually a bug. The code that checks for unmodified forms fails for unhydroxylated forms because it looks for one of ["UnMod", "PhosRes-", "unmodified", "unphospho", "UnPhos"] in the term name. That list is obviously incomplete, since apart from the generic "UnMod" and "unmodified", every entry is a way of saying unphosphorylated. Either the list needs to be expanded to include other kinds of unmodified forms, or the code should explicitly check for PR:000026291 (unmodified amino-acid residue).
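To make the difference between the two checks concrete, here is a minimal sketch. It assumes each MSA row comes from a record carrying the term's name and the ontology IDs of its residue modifications; the ProteoformTerm class, the function names, and the example PR:EX IDs are all hypothetical and not the actual PRO code. Only the name fragments and the PR:000026291 ID come from this thread.

```python
from dataclasses import dataclass, field
from typing import List

# ID for "unmodified amino-acid residue", as quoted in this thread.
UNMODIFIED_RESIDUE = "PR:000026291"

# Name fragments the current check looks for, as quoted in this thread.
UNMOD_NAME_HINTS = ["UnMod", "PhosRes-", "unmodified", "unphospho", "UnPhos"]


@dataclass
class ProteoformTerm:
    """Hypothetical minimal record for one PRO term shown as an MSA row."""
    pr_id: str
    name: str
    # Ontology IDs of the modifications annotated on the term's residues.
    modification_ids: List[str] = field(default_factory=list)


def is_unmodified_by_name(term: ProteoformTerm) -> bool:
    """Approximates the current behavior: case-sensitive substring match on the name."""
    return any(hint in term.name for hint in UNMOD_NAME_HINTS)


def is_unmodified_by_id(term: ProteoformTerm) -> bool:
    """Proposed alternative: look for the explicit unmodified-residue annotation."""
    return UNMODIFIED_RESIDUE in term.modification_ids


if __name__ == "__main__":
    unhydroxylated = ProteoformTerm("PR:EX1", "example unhydroxylated form", [UNMODIFIED_RESIDUE])
    # The name check misses this form; the ID check catches it.
    print(is_unmodified_by_name(unhydroxylated), is_unmodified_by_id(unhydroxylated))
```

The ID-based check does not depend on how a curator happened to phrase the term name, so new kinds of "un-X" forms would be caught without touching the hint list.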

nataled commented 3 years ago

From the PIR-PRO discussion: ultimately, we do want specifically-unmodified residues to be displayed in a consistent way. It is not enough to leave them 'blank' (so to speak), as we would for a sequence with no indicated modifications (such as the base sequence), because there are usually multiple positions that could carry the statement of non-modification, and it would then be impossible to tell which residue is the intended one.

We also decided that we won't worry about distinguishing unphosphorylated from, say, unacetylated.

Julie-Cowart commented 3 years ago

The code for explicitly handling unmodified residues does exist, and according to the style definitions it gives the residue a white background with a grey border. We could argue another color would be better (perhaps grey, but grey already means conserved). The current logic seems to be buggy: it only checks for unphosphorylated forms and uses the term name to detect them, so it needs improvement once we decide how it should work. Separately, as already stated, it also seems to use the term name to suppress the entry in the MSA, hence the missing rows Karen noticed.
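A minimal sketch of keying the MSA cell style on the modification ID rather than the term name, assuming (as in the legend described above) that every specifically-unmodified residue carries PR:000026291. Because unphosphorylated, unhydroxylated, unacetylated, etc. all share that one ID, they get one style, which matches the decision not to distinguish them. The style table, the orange placeholder for MOD:00039, and the function names are hypothetical; only the white-background, grey-border style for unmodified residues comes from this thread.

```python
from typing import Dict, Optional

UNMODIFIED_RESIDUE = "PR:000026291"

# Hypothetical style table. Only the unmodified-residue entry reflects the
# style described in this thread; the hydroxylation color is a placeholder.
RESIDUE_STYLES: Dict[str, Dict[str, str]] = {
    UNMODIFIED_RESIDUE: {"background-color": "white", "border": "1px solid grey"},
    "MOD:00039": {"background-color": "orange"},
}


def residue_style(modification_id: Optional[str]) -> Dict[str, str]:
    """Return CSS properties for one MSA cell.

    Residues with no annotation, or with an ID missing from the table,
    get no extra style. A copy is returned so callers cannot mutate the table.
    """
    if modification_id is None:
        return {}
    return dict(RESIDUE_STYLES.get(modification_id, {}))


def to_css(style: Dict[str, str]) -> str:
    """Serialize CSS properties into an inline style string."""
    return "; ".join(f"{prop}: {value}" for prop, value in style.items())


if __name__ == "__main__":
    print(to_css(residue_style(UNMODIFIED_RESIDUE)))
```

Running it prints "background-color: white; border: 1px solid grey". Swapping in a different color for unmodified residues would then be a one-line change to the table.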