Closed konstin closed 2 years ago
Bug description
The inverse folding model crashes when the basic example is fed the 3BTA PDB file.
Reproduction steps
Download 3BTA.pdb
Run
import esm
structure = esm.inverse_folding.util.load_structure("3bta.pdb", "A")
coords, seq = esm.inverse_folding.util.extract_coords_from_structure(structure)
model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
rep = esm.inverse_folding.util.get_encoder_output(model, alphabet, coords)
Expected behavior
It should work even with a ZN in the structure, so that I get some embeddings in rep.
Logs
Found 1 chains: ['A']
Loaded chain A
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [11], in <cell line: 4>()
      1 import esm
      3 structure = esm.inverse_folding.util.load_structure("3bta.pdb", "A")
----> 4 coords, seq = esm.inverse_folding.util.extract_coords_from_structure(structure)
      5 rep = esm.inverse_folding.util.get_encoder_output(model, alphabet, coords)

File /mnt/project/seqvec-search/pp1cb_ss22_structural_embeddings/.venv/lib/python3.8/site-packages/esm/inverse_folding/util.py:69, in extract_coords_from_structure(structure)
     67 coords = get_atom_coords_residuewise(["N", "CA", "C"], structure)
     68 residue_identities = get_residues(structure)[1]
---> 69 seq = ''.join([ProteinSequence.convert_letter_3to1(r) for r in residue_identities])
     70 return coords, seq

File /mnt/project/seqvec-search/pp1cb_ss22_structural_embeddings/.venv/lib/python3.8/site-packages/esm/inverse_folding/util.py:69, in <listcomp>(.0)
     67 coords = get_atom_coords_residuewise(["N", "CA", "C"], structure)
     68 residue_identities = get_residues(structure)[1]
---> 69 seq = ''.join([ProteinSequence.convert_letter_3to1(r) for r in residue_identities])
     70 return coords, seq

File /mnt/project/seqvec-search/pp1cb_ss22_structural_embeddings/.venv/lib/python3.8/site-packages/biotite/sequence/seqtypes.py:512, in ProteinSequence.convert_letter_3to1(symbol)
    497 @staticmethod
    498 def convert_letter_3to1(symbol):
    499 """
    500 Convert a 3-letter to a 1-letter amino acid representation.
    501
   (...)
    510 1-letter amino acid representation.
    511 """
--> 512 return ProteinSequence._dict_3to1[symbol.upper()]

KeyError: 'ZN'
Additional context
I think the code is choking on the additional zinc ion that's in the structure.
biotite==0.32.0 (latest version) is installed
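For anyone hitting the same KeyError, here is a minimal, library-free sketch of why the conversion fails and one possible way around it. It is an illustration only: it is not the change made in the esm repository, and the names THREE_TO_ONE and residues_to_sequence are hypothetical helpers that exist in neither esm nor biotite. The idea is to skip any residue name that is not one of the 20 standard amino acids (such as the ZN ion) and to return the kept positions so the coordinate array can be filtered the same way.

```python
# Hypothetical helper for illustration; not part of esm or biotite.
# Maps the 20 standard amino acids from 3-letter to 1-letter codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_to_sequence(residue_names):
    """Convert residue names to a 1-letter sequence, skipping unknown names.

    Returns the sequence and the indices of the residues that were kept,
    so per-residue coordinates can be subset with the same indices.
    """
    letters = []
    kept_indices = []
    for index, name in enumerate(residue_names):
        letter = THREE_TO_ONE.get(name.upper())
        if letter is None:
            # Ions, ligands and waters (for example "ZN") land here
            # instead of raising KeyError.
            continue
        letters.append(letter)
        kept_indices.append(index)
    return "".join(letters), kept_indices


seq, kept = residues_to_sequence(["MET", "ALA", "ZN", "GLY"])
```

Here the ZN entry is dropped and kept records which positions survived. Whether silently skipping non-standard residues is acceptable depends on the use case; modified amino acids would also be dropped by this sketch.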
This should be resolved now with #205 - please reopen if you still see this issue!