Open Tillsten opened 4 years ago
Do you have a proposal which would fix this? Also a specific example, ideally from the PDB, would be very useful - ideally a small one if we were to use it as a test case.
The easy solution would be just to modifiy def _translate_id(self, id)
to just loop through the whole chain if the inital try fails. Not tested modication:
def _translate_id(self, id):
if isinstance(id, int):
trans_id = (" ", id, " ")
if trans_id in self:
return trans_id
else:
for r in self:
if r.id[1] == id:
return r.id
return id
Modified aminoacids are usually defined as HETATM
(e.g. PCA in 5uiw), so the code is doing the right thing: you'd have to write chain[('H', 100, '')]
for residue 1. The chain[100]
shortcut is only there because most residues are canonical amino acids and both the hetatm and insertion code of the id tuple are blank, so it would be super annoying to type chain[('', 100, '')]
.
Further, the proposed solution raises a problem when two residues share the same number.
Do you have an example where a modified amino acid is defined as ATOM
?
My point is that the shortcut works for ATOMs but not for HETATM. Supplying the full id works but is quite unpractical, as it is for normal amino acids. IMO If they have as resseq one should also be able to select them by that. This is also the way all molecular viewers I know work.
I don't what happens when two normal residues have the same ID but copying that behavior for HETATM should not be difficult.
If you still think the current behavior is as intended, feel free to close the issue.
Revisiting this. We could add a shortcut to try and find a HETATM if the ATOM is not found, so something like:
def _translate_id(self, id):
if isinstance(id, int):
# Try ATOM first
trans_id = (" ", id, " ")
if trans_id in self:
return trans_id
# Try HETATM
trans_id = ("H", id, " ")
if trans_id in self:
return trans_id
return id
This is not perfect (you cannot use it to select water molecules for instance) and in case there is a conflict (HETATM and ATOM with the same resid) it defaults to the ATOM, so it might be confusing sometimes. But it would be more practical in many situations. I checked a few PDB files manually and there wasn't overlap in numbering.
Setup
I am reporting a problem with Biopython version, Python version, and operating system as follows:
3.7.5 (default, Oct 31 2019, 15:18:51) [MSC v.1916 64 bit (AMD64)] CPython Windows-10-10.0.14393-SP0 1.75
Expected behaviour
Assuming I have PDB chain with a non-amino acid at resseq=100. If I call
chain[100]
I would like to get that residue. Instead I get aKeyError
, since the internally useddef _translate_id(self, id)
generates the wrong ID.