3dem / model-angelo

Automatic atomic model building program for cryo-EM maps
MIT License

Include undetermined amino acids and handle gaps #99

Open hnguyentt opened 4 months ago

hnguyentt commented 4 months ago

Hello, thanks for your excellent work!

I have some questions about the residues you excluded and included in the protein sequence. I found this code:

def remove_non_residue(sequence: str) -> str:
    return "".join([s for s in sequence if s in "ARNDCQEGHILKMFPSTWYVU"])
  1. You included 21 amino acids. Why did you include Selenocysteine (U), a rare amino acid, but not Pyrrolysine (O), which is also a rare amino acid?
  2. Why didn't you take into account the ambiguity codes B (Aspartic acid | Asparagine), J (Leucine | Isoleucine), and Z (Glutamic acid | Glutamine)? If these letters are removed, the filtered sequence length no longer matches the real sequence length.
  3. Why didn't you take into account the letter X, which denotes an undetermined residue?
  4. Besides, "-" denotes a gap of indeterminate length. Do you have any plans to process this information in the future?
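
To make questions 2 and 3 concrete, here is a minimal sketch of the behaviour I have in mind. It assumes the letters B, J, Z, X, and O could all be mapped to a single placeholder rather than dropped, so that residue positions are preserved. The name `mask_non_residue`, the `unknown` parameter, and the choice of "X" as placeholder are hypothetical and not part of model-angelo; whether the model can accept such a token at all is a separate question.

```python
# Residues kept by the existing remove_non_residue (copied from the snippet above).
STANDARD = set("ARNDCQEGHILKMFPSTWYVU")

# Assumption: ambiguity codes (B, J, Z), unknown (X), and pyrrolysine (O)
# are all collapsed to one placeholder instead of being removed.
NON_STANDARD = set("BJZXO")


def remove_non_residue(sequence: str) -> str:
    """Current behaviour: silently drop every letter outside STANDARD."""
    return "".join(s for s in sequence if s in STANDARD)


def mask_non_residue(sequence: str, unknown: str = "X") -> str:
    """Hypothetical alternative: keep one position per residue letter.

    Letters in NON_STANDARD become `unknown`; any other character
    (gap "-", whitespace, digits) is still dropped.
    """
    out = []
    for s in sequence:
        if s in STANDARD:
            out.append(s)
        elif s in NON_STANDARD:
            out.append(unknown)
    return "".join(out)
```

For the input "ACBXZ-GU", the current function returns a 4-letter string, while the sketch returns 7 letters, one per residue, with only the gap removed. Handling the gap itself (question 4) is deliberately left out here.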


Thank you for your time!