Electrostatics / pdb2pqr

PDB2PQR - determining titration states, adding missing atoms, and assigning charges/radii to biomolecules.
http://www.poissonboltzmann.org/

Problem with DefinitionHandler #293

Closed stefdoerr closed 2 years ago

stefdoerr commented 2 years ago

I have a large XML file with many residues. At some point, adding one more residue breaks the parsing, as can be seen here for the H1 atom of water:

    <atom>
      <name>H1</name>
      <altname>HW</altname>
      <altname>HH1</altname>
      <altname>1H</altname>
      <x>2.865</x>
      <y>56.756</y>
      <z>19.243</z>
      <bond>O</bond>
    </atom>

    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <name>: H1
    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <altname>: HW
    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <altname>: HH1
    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <altname>: 1H
    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <x>: 2.865
    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <y>: 56.
    2022-01-18 11:51:35,867 - pdb2pqr.definitions - INFO - Got text for <y>: 756

The parser delivers the text of <y> in two chunks, "56." and "756". When that happens, the current DefinitionHandler drops the "56." and keeps only "756" as the y coordinate of the H1 atom. Then all hell breaks loose, with water atoms flying off into space.
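To make the failure mode concrete, here is a minimal sketch of a handler that overwrites its stored value on every characters() call. OverwritingHandler is a hypothetical stand-in, not the actual pdb2pqr DefinitionHandler code, and the two chunks are fed by hand so the split does not depend on the parser's buffering:

```python
import xml.sax.handler


class OverwritingHandler(xml.sax.handler.ContentHandler):
    """Hypothetical handler (not pdb2pqr's code) that keeps only the last chunk."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self.current = None

    def startElement(self, name, attrs):
        self.current = name

    def characters(self, content):
        # Bug: each call replaces whatever an earlier call stored.
        if self.current is not None and content.strip():
            self.values[self.current] = content

    def endElement(self, name):
        self.current = None


handler = OverwritingHandler()
handler.startElement("y", {})
# Simulate the parser delivering "56.756" in two chunks, as in the log above.
handler.characters("56.")
handler.characters("756")
handler.endElement("y")
print(handler.values["y"])  # 756
```

Any handler that treats a single characters() call as the complete element text would show the same symptom whenever the parser happens to split the data.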

The SAX parser docs for ContentHandler.characters contain the following relevant snippet: https://docs.python.org/3.8/library/xml.sax.handler.html#xml.sax.handler.ContentHandler.characters

The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information.

In a similar question on SO, https://stackoverflow.com/a/19793186/1198173, the answer suggests accumulating the character data and parsing it only at the end of the element, which makes sense to me.
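A rough sketch of that accumulate-then-parse approach, using only the standard xml.sax API. AccumulatingHandler and its values dict are illustrative names, not the eventual pdb2pqr fix, and the sketch assumes leaf elements with no mixed content:

```python
import xml.sax
import xml.sax.handler


class AccumulatingHandler(xml.sax.handler.ContentHandler):
    """Illustrative handler: buffer chunks, interpret them in endElement."""

    def __init__(self):
        super().__init__()
        self.values = {}   # element name -> list of complete text values
        self._chunks = []  # pieces received since the last start/end tag

    def startElement(self, name, attrs):
        self._chunks = []

    def characters(self, content):
        # May be called several times per element; just collect the pieces.
        self._chunks.append(content)

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:
            self.values.setdefault(name, []).append(text)
        self._chunks = []


# Feed the split chunks by hand to mimic the situation in the log.
split = AccumulatingHandler()
split.startElement("y", {})
split.characters("56.")
split.characters("756")
split.endElement("y")
print(float(split.values["y"][0]))  # 56.756

# The same handler driven by the real parser.
parsed = AccumulatingHandler()
xml.sax.parseString(b"<atom><name>H1</name><y>56.756</y></atom>", parsed)
print(parsed.values)
```

Joining the chunks in endElement makes the result independent of how the parser chooses to split the character data, which is the guarantee the docs say a handler cannot otherwise rely on.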

I'll try to make a fix for this ASAP. Let's delay the release (https://github.com/Electrostatics/pdb2pqr/issues/292) until this is resolved.