Include undetermined amino acids and handle gaps

Hello, thanks for your excellent work!

I have some questions about which residues are kept in, and which are removed from, the protein sequence. I found this code:

```python
def remove_non_residue(sequence: str) -> str:
    return "".join([s for s in sequence if s in "ARNDCQEGHILKMFPSTWYVU"])
```
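To illustrate the questions below, here is what the quoted filter does to a few made-up sequences (they are illustrative only, not taken from any real protein). The function body is the same as above; only the alphabet is pulled out into a constant so its size can be printed.

```python
ALLOWED = "ARNDCQEGHILKMFPSTWYVU"  # 20 standard amino acids + U (selenocysteine)


def remove_non_residue(sequence: str) -> str:
    # Same filter as quoted above: every character not in ALLOWED is dropped.
    return "".join([s for s in sequence if s in ALLOWED])


print(len(ALLOWED))                  # 21
print(remove_non_residue("MKOL"))    # MKL   (pyrrolysine O dropped)
print(remove_non_residue("ABJZC"))   # AC    (B, J, Z dropped: length 5 -> 2)
print(remove_non_residue("AXXC"))    # AC    (undetermined X dropped)
print(remove_non_residue("ACD-EF"))  # ACDEF (segments on either side of the gap are joined)
```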

You included 21 amino acids. Why did you include selenocysteine (U), a rare amino acid, but not pyrrolysine (O), which is also rare?
Why didn't you take into account the ambiguity codes B (aspartic acid or asparagine), J (leucine or isoleucine), and Z (glutamic acid or glutamine)? If these letters are removed, the length of the filtered sequence no longer matches the length of the real sequence.
Why didn't you take into account X, which denotes an undetermined residue?
Besides, "-" is used to denote a gap of indeterminate length. Do you have any plans to process this information in the future?
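For discussion, here is one possible length-preserving alternative. It is only a sketch of what I have in mind, not a claim about how model-angelo works internally: the name `clean_sequence` is hypothetical, and mapping B, J, Z and O to X is an assumption (whether X can be used downstream depends on the model's residue vocabulary). Since a "-" gap has no known length, the sketch splits the sequence into separate segments instead of concatenating them.

```python
from typing import List

SUPPORTED = set("ARNDCQEGHILKMFPSTWYVU")  # alphabet from the quoted function
AMBIGUOUS = set("BJZOX")  # assumption: ambiguity codes, O and X all become "X"


def clean_sequence(sequence: str) -> List[str]:
    """Hypothetical alternative to remove_non_residue.

    Splits on "-" (gap of indeterminate length) into separate segments and
    replaces ambiguous or unsupported residue codes with "X", so each segment
    keeps its true residue count.
    """
    segments = []
    for raw in sequence.upper().split("-"):
        residues = []
        for ch in raw:
            if ch in SUPPORTED:
                residues.append(ch)
            elif ch in AMBIGUOUS:
                residues.append("X")
            # anything else (whitespace, "*", digits) is dropped
        if residues:  # "--" or leading/trailing gaps would give empty segments
            segments.append("".join(residues))
    return segments


print(clean_sequence("ABJZC-MKOL"))  # ['AXXXC', 'MKXL']
```

Each segment keeps its original residue count (5 and 4 here), and the gap is kept as a segment boundary rather than silently dropped.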


Thank you for your time!

3dem / model-angelo