google-deepmind / alphafold

Open source code for AlphaFold 2.
Apache License 2.0

Does AlphaFold support proteins with non-canonical amino acids such as selenocysteine? #531

Open abhinavb22 opened 2 years ago

abhinavb22 commented 2 years ago

I tried to predict the structure of a protein that contains a selenocysteine. The unrelaxed PDB file is missing the residue at the selenocysteine position, and relaxation failed because of missing atoms. Is this a common issue for proteins with non-canonical amino acids, or am I doing something wrong here?
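One way to confirm which residue is missing from an unrelaxed model is to look for jumps in residue numbering. The following is a minimal sketch, assuming a standard fixed-column PDB file; the function name `find_numbering_gaps` and the file name in the usage comment are hypothetical and not part of AlphaFold.

```python
def find_numbering_gaps(pdb_text):
    """Return (chain_id, last_seen, next_seen) for each jump in residue numbering.

    Assumes standard fixed-column PDB ATOM records: chain ID in column 22
    and residue sequence number in columns 23-26.
    """
    gaps = []
    last_seen = {}  # chain_id -> last residue number seen on that chain
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]
        res_num = int(line[22:26])
        prev = last_seen.get(chain_id)
        if prev is not None and res_num - prev > 1:
            gaps.append((chain_id, prev, res_num))
        last_seen[chain_id] = res_num
    return gaps


# Usage (the file name is a placeholder, not a path guaranteed by AlphaFold):
# with open("unrelaxed_model_1.pdb") as handle:
#     print(find_numbering_gaps(handle.read()))
```

An empty result means no numbering jump was found; it does not prove that every residue has all of its atoms.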

katemichie commented 2 years ago

Hi, I've had this happen many times. It seems to happen at the Amber stage and to come from template matching: it always explodes if there is a selenomethionine (Se-Met) in one of the models. My runs just stop. To clarify, my input sequences don't have any non-canonical amino acids. It's really frustrating and I haven't found a solution.

Log states:

I0809 13:04:54.623805 140627017672512 run_docker.py:255] I0809 13:04:54.623058 140420112467776 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {<Residue 612 (LEU) of chain 0>: ['OXT'], <Residue 1225 (LEU) of chain 1>: ['OXT'], <Residue 1838 (LEU) of chain 2>: ['OXT'], <Residue 2451 (LEU) of chain 3>: ['OXT']}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0809 13:04:59.123191 140627017672512 run_docker.py:255] I0809 13:04:59.122625 140420112467776 amber_minimize.py:408] Minimizing protein, attempt 1 of 100.
I0809 13:05:03.980844 140627017672512 run_docker.py:255] I0809 13:05:03.980273 140420112467776 amber_minimize.py:69] Restraining 18960 / 37808 particles.
I0809 13:10:01.181899 140627017672512 run_docker.py:255] I0809 13:10:01.180551 140420112467776 amber_minimize.py:177] alterations info: {'nonstandard_residues': [], 'removed_heterogens': set(), 'missing_residues': {}, 'missing_heavy_atoms': {}, 'missing_terminals': {}, 'Se_in_MET': [], 'removed_chains': {0: []}}
I0809 13:10:35.730615 140627017672512 run_docker.py:255] /app/run_alphafold.sh: line 3: 8 Killed python /app/alphafold/run_alphafold.py "$@"
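To check whether the relaxation step actually reported any selenomethionine or non-standard residues, the "alterations info" entries can be pulled out of a log like the one above. This is a rough sketch only: `summarize_alterations` is a hypothetical helper, and it assumes the fields are printed as Python-style lists exactly as in the quoted log. In that log, both `nonstandard_residues` and `Se_in_MET` are empty in each entry.

```python
import re

# Fields of interest, spelled as they appear in the quoted log.
FIELDS = ("nonstandard_residues", "Se_in_MET")


def summarize_alterations(log_text):
    """Return one dict per 'alterations info' entry in the log.

    Each dict maps a field name to True if its list is non-empty,
    and False if the list is empty or the field is absent.
    """
    summaries = []
    # Everything after each marker, up to the next marker, is one entry.
    for chunk in log_text.split("alterations info:")[1:]:
        summary = {}
        for field in FIELDS:
            match = re.search(rf"'{field}': \[(.*?)\]", chunk)
            summary[field] = bool(match and match.group(1).strip())
        summaries.append(summary)
    return summaries


# Usage (the log file name is a placeholder):
# with open("alphafold_run.log") as handle:
#     for entry in summarize_alterations(handle.read()):
#         print(entry)
```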

ziqiaos2 commented 2 years ago

I have the same issue. In this case, would the prediction be accurate if the selenocysteine is replaced by cysteine? Does the AlphaFold sequence input recognize the amino acid U as selenocysteine? I did not get an error when I input the sequence with U.

abhinavb22 commented 2 years ago

ziqiaos2 wrote: "I have the same issue. In this case, would the prediction be accurate if the selenocysteine is replaced by cysteine? Does the AlphaFold sequence input recognize the amino acid U as selenocysteine? I did not get an error when I input the sequence with U."

I think it should be fine to predict the structure with selenocysteines replaced by cysteines, as that should not change the 3D prediction. Also, I don't think there is a one-hot representation that covers selenocysteine: the 20 standard amino acids are encoded as 0-19 and anything else is encoded as 20 (unknown). The unrelaxed step works, and the models have a missing amino acid at the position corresponding to the selenocysteine, but relaxation crashes.
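The encoding described above can be sketched in a few lines. This is an illustration, not AlphaFold's own code: the residue ordering in `RESTYPES` and the helper names `encode` and `substitute_selenocysteine` are assumptions made for this example. It shows why U falls into the unknown class (20) and how the suggested U-to-C substitution changes that.

```python
# Assumed ordering of the 20 standard residues, for illustration only.
RESTYPES = "ARNDCQEGHILKMFPSTWYV"
RESTYPE_ORDER = {aa: index for index, aa in enumerate(RESTYPES)}
UNKNOWN_INDEX = len(RESTYPES)  # 20: anything that is not a standard residue


def encode(sequence):
    """Map each one-letter code to 0-19, or to 20 if it is not standard."""
    return [RESTYPE_ORDER.get(aa, UNKNOWN_INDEX) for aa in sequence]


def substitute_selenocysteine(sequence):
    """Replace selenocysteine (U) with cysteine (C) before prediction."""
    return sequence.replace("U", "C")


print(encode("ACU"))                             # [0, 4, 20]
print(encode(substitute_selenocysteine("ACU")))  # [0, 4, 4]
```

With the substitution applied, every position maps to a standard residue, so the model and the relaxation step have a full set of atoms to place. Whether a cysteine is an acceptable stand-in depends on what the prediction is used for.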

katemichie commented 2 years ago

Is anyone else having problems with AlphaFold stopping after the first model is output, apparently because there is a selenomethionine in a TEMPLATE (not my query)? I'm coming to the idea that it's a problem with Amber, but I'm not completely sure. It's happening a lot. Any time a run stops midway, it's always at the end of the first model: the unrelaxed model is output and the machine shuts down. The log file always has Se-Met and non-standard amino acids in it, but the query I submitted contained only native amino acids. I'm concluding the heterogens are coming from the template search; it seems they get removed, and then maybe there is a mismatch in the number of atoms? Maybe I'm barking up the wrong tree... Any pointers would be helpful, as it's happening in about 20% of my runs. I'm really not a coder, so I'm muddling around here.