ljarosch closed this 3 months ago.
I can confirm this bug; thanks for reporting it. The problem is not the insertion code itself but the fact that the residues with insertion codes share the same residue ID. I will create a fix for this.
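To illustrate why a shared residue ID matters, here is a generic sketch of detecting residue boundaries from per-atom annotations. It is not Biotite's actual implementation; the toy arrays and the `residue_starts()` helper are hypothetical. If boundaries are derived from the residue ID alone, consecutive residues that differ only in their insertion code collapse into a single residue.

```python
# Hypothetical per-atom annotations for one chain: three residues share
# residue ID 100 and differ only in insertion code, then residue 101 follows.
res_ids = [100, 100, 100, 100, 100, 100, 101, 101]
ins_codes = ["H", "H", "G", "G", "F", "F", "", ""]


def residue_starts(*annotations):
    """Return the atom indices where a new residue begins.

    A new residue starts whenever any of the given per-atom annotation
    lists changes value relative to the previous atom.
    """
    starts = [0]
    for i in range(1, len(annotations[0])):
        if any(a[i] != a[i - 1] for a in annotations):
            starts.append(i)
    return starts


# Using the residue ID alone merges 100H, 100G and 100F into one residue.
print(residue_starts(res_ids))             # [0, 6]
# Including the insertion code recovers every residue boundary.
print(residue_starts(res_ids, ins_codes))  # [0, 2, 4, 6]
```

Any downstream step that works per residue (intra-residue bonds, peptide bonds between neighbors) inherits the merged boundaries from the first call.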
For now, you can set `use_author_fields=False` in `get_structure()` to get the expected behavior, since in this case the residue IDs are unique for each residue in a chain.
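The workaround relies on mmCIF storing two numbering schemes in the `_atom_site` category: the author numbering (`auth_seq_id` plus `pdbx_PDB_ins_code`) and the sequential `label_seq_id`. The sketch below uses made-up, heavily simplified rows (the tuple layout and the `residue_keys()` helper are hypothetical) to show that grouping consecutive atoms by `auth_seq_id` merges residues that differ only in insertion code, while `label_seq_id` keeps them apart.

```python
from itertools import groupby

# Hypothetical, simplified _atom_site rows:
# (label_seq_id, auth_seq_id, pdbx_PDB_ins_code, atom_name)
atom_site = [
    ("1", "100", "H", "N"),
    ("1", "100", "H", "C"),
    ("2", "100", "G", "N"),
    ("2", "100", "G", "C"),
    ("3", "101", "?", "N"),
    ("3", "101", "?", "C"),
]
LABEL_SEQ_ID, AUTH_SEQ_ID = 0, 1


def residue_keys(rows, column):
    """Group consecutive rows by one column and return one key per group."""
    return [key for key, _ in groupby(rows, key=lambda row: row[column])]


# Author numbering without the insertion code: 100H and 100G merge.
print(residue_keys(atom_site, AUTH_SEQ_ID))   # ['100', '101']
# label_seq_id is unique per residue, so all three residues survive.
print(residue_keys(atom_site, LABEL_SEQ_ID))  # ['1', '2', '3']
```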
Hi @padix-key, thanks for the quick fix! Biotite seems like a great package, and we look forward to giving it a try within the OpenFold project.
Always happy to help 😃
I tried parsing the .cif file for structure 6evv in Biotite, but the molecule iterator and bond perception seem to be broken. To reproduce this behavior:

Output:
The molecule iterator returns the first residues as individual molecules, even though they are all part of the same chain (the same problem occurs for many other residues not shown here). Also, checking the bond list for the C atom of the first residue shows no peptide bond to the N atom of the next residue:
A potential reason could be the unusual insertion code format of this structure, which seems to count backwards from H to A (?). This is a relatively serious bug, as it results in far too many disconnected molecules being parsed for this entry.
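Splitting a chain into many molecules follows directly from the missing bonds: molecule iteration amounts to finding connected components of the bond graph, so a single missing inter-residue bond splits a chain in two. A minimal union-find sketch on a hypothetical two-residue backbone (atom indices only, no Biotite involved; `connected_components()` is an illustrative helper) shows the effect:

```python
def connected_components(n_atoms, bonds):
    """Group atom indices into molecules (connected components of the bond graph)."""
    parent = list(range(n_atoms))

    def find(i):
        # Walk up to the root, halving the path to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    components = {}
    for i in range(n_atoms):
        components.setdefault(find(i), []).append(i)
    return list(components.values())


# Hypothetical backbone: residue 1 = N(0), CA(1), C(2); residue 2 = N(3), CA(4), C(5).
intra_residue = [(0, 1), (1, 2), (3, 4), (4, 5)]
print(connected_components(6, intra_residue))             # [[0, 1, 2], [3, 4, 5]]
# Adding the C(2)-N(3) peptide bond joins both residues into one molecule.
print(connected_components(6, intra_residue + [(2, 3)]))  # [[0, 1, 2, 3, 4, 5]]
```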
I'm still relatively new to Biotite, so I would appreciate any advice if there is a better way to infer bonds from PDB entries.
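For backbone connectivity specifically, one generic approach that depends neither on residue numbering nor on the direction of the insertion codes is to pair consecutive residues in file order and accept a peptide bond only if the C-N distance is plausible. The sketch below is plain Python, not a Biotite API; the residue dictionaries, the coordinates and the 1.5 Å cutoff are assumptions for illustration (a peptide C-N bond is roughly 1.33 Å). Within Biotite itself, `biotite.structure.connect_via_residue_names()` and `biotite.structure.connect_via_distances()` are the built-in options for bond inference.

```python
import math

# Assumed tolerance; peptide C-N bonds are about 1.33 Angstrom long.
PEPTIDE_BOND_CUTOFF = 1.5


def peptide_bonds(residues, cutoff=PEPTIDE_BOND_CUTOFF):
    """Return (i, i + 1) for consecutive residues, in file order,
    whose C atom lies within `cutoff` of the next residue's N atom."""
    bonds = []
    for i in range(len(residues) - 1):
        c_atom = residues[i]["C"]
        n_atom = residues[i + 1]["N"]
        if math.dist(c_atom, n_atom) <= cutoff:
            bonds.append((i, i + 1))
    return bonds


# Hypothetical coordinates: 100H and 100G are bonded, then a chain break before 101.
residues = [
    {"id": "100H", "N": (0.0, 0.0, 0.0), "C": (2.4, 0.0, 0.0)},
    {"id": "100G", "N": (3.73, 0.0, 0.0), "C": (6.1, 0.0, 0.0)},
    {"id": "101", "N": (12.0, 0.0, 0.0), "C": (14.4, 0.0, 0.0)},
]
print(peptide_bonds(residues))  # [(0, 1)]
```

Using file order instead of `res_id + 1` means a run such as 100H, 100G, ... is still connected, while the distance check keeps genuine chain breaks unbonded.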