I have some questions about the residues you excluded and included in the protein sequence. I found this code:
    def remove_non_residue(sequence: str) -> str:
        return "".join([s for s in sequence if s in "ARNDCQEGHILKMFPSTWYVU"])
You included 21 amino acids. Why did you include selenocysteine (U), a rare amino acid, but not pyrrolysine (O), which is also a rare amino acid?
Why didn't you account for the ambiguity codes B (aspartic acid or asparagine), J (leucine or isoleucine), and Z (glutamic acid or glutamine)? If these letters are removed, the filtered sequence is shorter than the real sequence, and every residue after a removed letter shifts position.
Why didn't you account for the letter X, which denotes an undetermined residue?
Besides, we have - to denote a gap of indeterminate length. Do you have any plan to handle this information in the future?
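To make the length concern concrete, here is a minimal sketch comparing the current filter with one possible alternative that masks unsupported letters instead of dropping them. The names `mask_non_residue`, `unknown`, and `keep_gaps` are hypothetical and not part of your code, and I am only assuming that a placeholder such as X would be acceptable downstream.

```python
# The 21 letters kept by the current filter.
STANDARD = set("ARNDCQEGHILKMFPSTWYVU")


def remove_non_residue(sequence: str) -> str:
    """Current behavior: drop every character outside the 21-letter set."""
    return "".join(s for s in sequence if s in STANDARD)


def mask_non_residue(sequence: str, unknown: str = "X", keep_gaps: bool = False) -> str:
    """Hypothetical alternative: replace unsupported letters (B, J, Z, O, X, ...)
    with `unknown` so residue positions are preserved.
    Gap characters ("-") are dropped unless keep_gaps is True."""
    out = []
    for s in sequence:
        if s in STANDARD:
            out.append(s)
        elif s == "-":
            if keep_gaps:
                out.append(s)
        else:
            out.append(unknown)
    return "".join(out)


seq = "MKBZJXOU-AC"  # made-up example sequence
print(len(seq), remove_non_residue(seq), mask_non_residue(seq))
```

With the made-up sequence above, the current filter keeps only 5 of the 11 characters, while the masked version keeps all 10 residue positions and drops only the gap.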
Thanks for your excellent work, and thank you for your time!