sobolevnrm closed this issue 2 years ago.
In the first error, is that because the lines in 5mlu.pdb have pairs like:
ATOM   6172  OP1  DT I -71     -53.961 -54.284  76.914  1.00239.30           O
ANISOU 6172  OP1  DT I -71    28185  33880  28859  -5452   1734   6887       O
Do the atom names have to be unique across ATOM, ANISOU, and other classes?
Also, can the residue sequence number be negative?
In general, I guess I am asking: what are the options for fixing this PDB file?
The PDB file isn't broken; our code is. An ANISOU record carries additional information (anisotropic temperature factors) that accompanies the ATOM entry. PDB2PQR doesn't use it, but it looks like it is processing it anyway. I'm not sure why.
The sign on the residue number likely represents the "sense" of the DNA strand. If a file is in the Protein Data Bank and PDB2PQR fails to parse it, we can be ~99% sure it is a problem with our code rather than the PDB entry.
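To make the ATOM/ANISOU pairing concrete, here is a minimal Python sketch that slices the two records quoted above by their fixed columns (positions as given in the wwPDB coordinate format guide, v3.3). It shows that the ANISOU record repeats the identifying fields of the ATOM record it follows (serial, atom name, residue name, chain, residue number) and carries six anisotropic U values instead of coordinates, and that the residue sequence number is just a signed integer in columns 23-26. The helper names `atom_identity` and `anisou_u` are hypothetical, not PDB2PQR functions.

```python
# Sketch only: fixed-column slicing of PDB ATOM/ANISOU records.
# Column positions follow the wwPDB format guide (v3.3); helper names
# are hypothetical and not part of PDB2PQR.

# The two records quoted above, assembled field by field so the
# column boundaries are explicit.
ATOM_LINE = "".join([
    "ATOM  ",                    # cols 1-6   record name
    " 6172",                     # cols 7-11  serial number
    "  OP1",                     # col 12 blank, cols 13-16 atom name
    "  DT",                      # col 17 altLoc (blank), cols 18-20 residue name
    " I",                        # col 21 blank, col 22 chain ID
    " -71",                      # cols 23-26 residue sequence number (signed)
    "    ",                      # col 27 insertion code (blank), cols 28-30 blank
    " -53.961 -54.284  76.914",  # cols 31-54 x, y, z (Angstroms)
    "  1.00239.30",              # cols 55-60 occupancy, cols 61-66 B-factor
    " " * 10 + " O",             # cols 67-76 blank, cols 77-78 element
])

ANISOU_LINE = "".join([
    "ANISOU",                                     # cols 1-6 record name
    " 6172", "  OP1", "  DT", " I", " -71",       # same identity fields as ATOM
    "  ",                                         # col 27 iCode (blank), col 28 blank
    "  28185  33880  28859  -5452   1734   6887",  # cols 29-70 U values x 10^4
    " " * 6 + " O",                               # cols 71-76 blank, cols 77-78 element
])


def atom_identity(line: str) -> tuple:
    """(serial, atom name, altLoc, residue name, chain, resSeq, iCode)."""
    return (
        int(line[6:11]),       # cols 7-11
        line[12:16].strip(),   # cols 13-16
        line[16].strip(),      # col 17
        line[17:20].strip(),   # cols 18-20
        line[21].strip(),      # col 22
        int(line[22:26]),      # cols 23-26; int() accepts the leading minus
        line[26].strip(),      # col 27
    )


def anisou_u(line: str) -> list:
    """U11, U22, U33, U12, U13, U23 in Angstrom^2 (the file stores them x 10^4)."""
    return [int(line[28 + 7 * i: 35 + 7 * i]) / 10_000 for i in range(6)]


print(atom_identity(ATOM_LINE))  # (6172, 'OP1', '', 'DT', 'I', -71, '')
print(atom_identity(ATOM_LINE) == atom_identity(ANISOU_LINE))  # True
print(anisou_u(ANISOU_LINE))
```

Because every identity field matches, a parser that turned each ANISOU line into its own atom object would see OP1 twice in residue DT I -71.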
Sorry, I asked the question incorrectly. I should have asked: how should the code handle the two lines where the ATOM and ANISOU records have the same atom name and residue sequence number?
Since ATOM and ANISOU are both classes that are parsed through the @register_line_parser decorator, there does not seem to be a code path for an ANISOU instance to find its matching ATOM instance.
Should the ANISOU class inherit from the ATOM class?
Should we simply ignore/skip ANISOU records?
I think we should just ignore/skip the ANISOU records; however, I can't figure out why the code is using them at all. Can you tell where in the code that record is being used (rather than just parsed)? I looked quickly and was unable to find it.
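The "skip it" option can be illustrated with a toy version of the registry pattern described above: each record class registers itself through a decorator, and a later step groups atoms into residues. In this sketch `LINE_PARSERS`, `read_records`, and `build_residues` are hypothetical stand-ins, and `register_line_parser` here only borrows the name from the thread; none of it is PDB2PQR's actual code. It shows that if ANISOU instances reach the residue-building step they collide with their ATOM by name, and that filtering them out by type removes the collision without making ANISOU inherit from ATOM.

```python
# Toy sketch of a decorator-based record registry; all names are
# hypothetical stand-ins, not PDB2PQR's implementation.
LINE_PARSERS = {}


def register_line_parser(cls):
    """Register a record class under its PDB record name (its class name)."""
    LINE_PARSERS[cls.__name__] = cls
    return cls


@register_line_parser
class ATOM:
    def __init__(self, line):
        self.name = line[12:16].strip()    # cols 13-16
        self.chain_id = line[21]           # col 22
        self.res_seq = int(line[22:26])    # cols 23-26


@register_line_parser
class ANISOU:
    """Anisotropic temperature factors for the preceding ATOM; not an atom."""

    def __init__(self, line):
        self.name = line[12:16].strip()
        self.chain_id = line[21]
        self.res_seq = int(line[22:26])


def read_records(lines):
    """Instantiate the registered class for each line's record name."""
    return [LINE_PARSERS[line[:6].strip()](line)
            for line in lines if line[:6].strip() in LINE_PARSERS]


def build_residues(records, skip_anisou=True):
    """Group atoms by (chain, resSeq); also report duplicate atom names."""
    residues, duplicates = {}, []
    for rec in records:
        if skip_anisou and isinstance(rec, ANISOU):
            continue  # annotation record: never turn it into an atom
        atoms = residues.setdefault((rec.chain_id, rec.res_seq), {})
        if rec.name in atoms:
            duplicates.append((rec.chain_id, rec.res_seq, rec.name))
        else:
            atoms[rec.name] = rec
    return residues, duplicates


# Identity columns (1-26) of the two records quoted earlier in the thread.
lines = ["ATOM   6172  OP1  DT I -71", "ANISOU 6172  OP1  DT I -71"]
records = read_records(lines)
print(build_residues(records, skip_anisou=False)[1])  # [('I', -71, 'OP1')]
print(build_residues(records, skip_anisou=True)[1])   # []
```

Filtering by type keeps ANISOU parseable (so it could still be matched to its ATOM through the shared identity fields if that information were ever needed) while ensuring it is never counted as an atom.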
Commenting out the ANISOU record parsing did not change anything. I modified line 1001 of pdb2pqr/biomolecule.py to also print the ATOM record:
_LOGGER.warning(f"Extra atom {atomname} in {residue}! - ({residue.get_atom(atomname)})")
This produced the output showing the ATOM records in question:
WARNING:Extra atom OP1 in DA I -72! - (ATOM 6151 OP1 DA -72 -45.846 -52.479 76.652 0.0000 0.0000)
WARNING:Deleted this atom.
WARNING:Extra atom OP2 in DA I -72! - (ATOM 6152 OP2 DA -72 -46.305 -50.401 75.222 0.0000 0.0000)
WARNING:Deleted this atom.
WARNING:Extra atom OP1 in DA J -72! - (ATOM 9142 OP1 DA -72 -3.967 -0.901 92.640 0.0000 0.0000)
WARNING:Deleted this atom.
WARNING:Extra atom OP2 in DA J -72! - (ATOM 9143 OP2 DA -72 -2.926 -1.671 90.426 0.0000 0.0000)
WARNING:Deleted this atom.
I am no closer to finding the problem but I am hoping this new information may help one of you see something obvious that I am missing.
The new version of PDB2PQR (currently in master, release coming soon) fixes the problem with the nucleic acid.
A user alerted me to this issue via email. PDB2PQR is unable to parse 5MLU despite the fact that this appears to be a high-resolution structure with most atoms in place.
PDB2PQR may be encountering two problems. The first (non-fatal) issue seems to be related to incorrect parsing of the ANISOU records in the PDB file, resulting in PDB2PQR detecting multiple atoms:
The second (fatal) issue is related to missing backbone atoms in the structure:
However, this behavior is expected, and the program should fail, because of a gap in the backbone that is documented in the PDB file:
I've documented this so the user can review and let me know if I've captured the problem correctly.
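A backbone gap like the one described above can often be spotted from the residue numbering alone. The sketch below is a simple heuristic, not PDB2PQR's check: within each chain it reports places where consecutive ATOM records jump by more than one residue number. It assumes numbering increases along each chain and ignores insertion codes; a more robust check would use inter-residue bond distances, and PDB files conventionally list missing residues in REMARK 465 (and missing atoms in REMARK 470). The names `find_numbering_gaps` and `atom_line` and the example records are hypothetical.

```python
def find_numbering_gaps(lines):
    """Return (chain, last_resSeq, next_resSeq) wherever numbering skips ahead."""
    gaps = []
    last_seen = {}  # chain ID -> residue number of the previous ATOM record
    for line in lines:
        if not line.startswith("ATOM  "):
            continue  # only polymer ATOM records; skip HETATM, ANISOU, etc.
        chain = line[21]             # col 22
        res_seq = int(line[22:26])   # cols 23-26, may be negative
        prev = last_seen.get(chain)
        if prev is not None and res_seq > prev + 1:
            gaps.append((chain, prev, res_seq))
        last_seen[chain] = res_seq
    return gaps


def atom_line(serial, name, res_name, chain, res_seq):
    """Build the first 26 columns of a (hypothetical) ATOM record."""
    return f"ATOM  {serial:5d}  {name:<3s} {res_name:>3s} {chain}{res_seq:4d}"


example = [
    atom_line(1, "CA", "LYS", "A", 10),
    atom_line(2, "CA", "GLY", "A", 11),
    atom_line(3, "CA", "ALA", "A", 15),  # residues 12-14 absent
    atom_line(4, "P", "DA", "I", -72),
    atom_line(5, "P", "DT", "I", -71),
]
print(find_numbering_gaps(example))  # [('A', 11, 15)]
```

Because it tracks the last residue number per chain, several atoms in the same residue do not trigger a report, and negative DNA numbering such as -72 followed by -71 is treated like any other consecutive pair.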