jamesamcl closed this 7 years ago
Actually, this is compliant with the IUPAC protein spec. Closing.
...and re-opening. Further investigation reveals the IUPAC protein spec supports "U" for selenocysteine.
It might be worth looking at what BioPython does for this: http://biopython.org/DIST/docs/api/Bio.Alphabet.IUPAC-pysrc.html
"""Extended uppercase IUPAC protein single letter alphabet including X etc.
In addition to the standard 20 single letter protein codes, this includes:
- B = "Asx"; Aspartic acid (R) or Asparagine (N)
- X = "Xxx"; Unknown or 'other' amino acid
- Z = "Glx"; Glutamic acid (E) or Glutamine (Q)
- J = "Xle"; Leucine (L) or Isoleucine (I), used in mass-spec (NMR)
- U = "Sec"; Selenocysteine
- O = "Pyl"; Pyrrolysine
This alphabet is not intended to be used with X for Selenocysteine
(an ad-hoc standard prior to the IUPAC adoption of U instead).
"""
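A minimal Java sketch of a strict check against the alphabet in that docstring might look like the following. The class and method names are hypothetical (not from libSBOLj or BioPython), and the check is uppercase-only to match the docstring's "Extended uppercase" wording. It also shows that the 20 standard codes plus the six extensions (B, X, Z, J, U, O) together cover every letter from A to Z.

```java
/**
 * Hypothetical helper (not part of libSBOLj or BioPython): checks a sequence
 * against the extended uppercase IUPAC protein alphabet.
 */
final class ExtendedIupacProtein {
    // The 20 standard one-letter amino-acid codes.
    static final String STANDARD = "ACDEFGHIKLMNPQRSTVWY";
    // B = Asx (D or N), X = unknown, Z = Glx (E or Q),
    // J = Xle (L or I), U = Sec (selenocysteine), O = Pyl (pyrrolysine).
    static final String EXTENSIONS = "BXZJUO";
    static final String ALPHABET = STANDARD + EXTENSIONS;

    private ExtendedIupacProtein() {}

    /** True if every character is in the uppercase-only extended alphabet. */
    static boolean isValid(String sequence) {
        if (sequence == null) {
            return false;
        }
        for (int i = 0; i < sequence.length(); i++) {
            if (ALPHABET.indexOf(sequence.charAt(i)) < 0) {
                return false;
            }
        }
        return true; // an empty string passes vacuously
    }

    /** True if the extended alphabet contains every letter 'A'..'Z'. */
    static boolean coversAllUppercaseLetters() {
        for (char c = 'A'; c <= 'Z'; c++) {
            if (ALPHABET.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("MKUAYIAKQR"));       // true: U = selenocysteine
        System.out.println(isValid("mkuayiakqr"));       // false: lowercase is rejected
        System.out.println(isValid("MK*AYIAKQR"));       // false: '*' is not a code
        System.out.println(coversAllUppercaseLetters()); // true: 20 + 6 = all 26 letters
    }
}
```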
I based it off of this page:
http://www.bioinformatics.org/sms2/iupac.html
You are correct, though, that the specification we cite does include “U”. However, I could not find J or O in our cited document. Could you?
In any case, I will go ahead and update the validator to allow any alphabetic character in a protein sequence.
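The relaxed rule described above could be sketched in Java roughly as follows. This is not libSBOLj's actual validator code; the class and method names are hypothetical, and two details are assumptions: only ASCII letters count as "alpha" characters, and both upper and lower case are accepted.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not libSBOLj's real validator): a protein sequence is
 * accepted if it consists only of ASCII letters, in either case.
 */
final class RelaxedProteinValidator {
    // ASCII letters only. Character.isLetter would also accept letters such as
    // 'é' or Greek characters, which are not one-letter amino-acid codes.
    private static final Pattern ANY_ASCII_LETTER = Pattern.compile("[A-Za-z]*");

    private RelaxedProteinValidator() {}

    /** True if the whole sequence matches [A-Za-z]* (empty strings included). */
    static boolean isValidProteinSequence(String sequence) {
        return sequence != null && ANY_ASCII_LETTER.matcher(sequence).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidProteinSequence("MKTAYIAKQR")); // true
        System.out.println(isValidProteinSequence("mkuoJB"));     // true: any letter, any case
        System.out.println(isValidProteinSequence("MKT-AYI"));    // false: gap character
        System.out.println(isValidProteinSequence("MKT AYI"));    // false: whitespace
    }
}
```

Because the 20 standard codes plus B, J, O, U, X and Z already span all 26 uppercase letters, this relaxed check accepts exactly the extended IUPAC set for uppercase input; it differs only in also accepting lowercase letters.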
(In reply to James Alastair McLaughlin's comment of Nov 23, 2016, 1:28 PM: https://github.com/SynBioDex/libSBOLj/issues/411#issuecomment-262512420)
Fixed in the develop branch.
Reproducible with:
Example protein: http://www.uniprot.org/uniprot/Q9N2J2