Electrostatics / pdb2pqr

PDB2PQR - determining titration states, adding missing atoms, and assigning charges/radii to biomolecules.
http://www.poissonboltzmann.org/

Fixing rng seed for pdb2pqr tests #249

Closed stefdoerr closed 2 years ago

stefdoerr commented 2 years ago

Hi, I'm having some issues with my tests not passing on CI. The issue seems to be water hydrogens being placed in a non-deterministic manner. If I run them on one machine I get the first image, on another machine the second. The conda env is identical on both machines.

I tried setting both np.random.seed and Python's random.seed, but neither seems to affect the results. Is there some way to make the water hydrogen placement more deterministic?

[Screenshot from 2021-11-15 11-28-12] [Screenshot from 2021-11-15 11-28-01]

intendo commented 2 years ago

@stefdoerr sorry for the problem and thanks for letting us know about it. Could this be an issue with PyMol (or the visualization software) interpreting the data? Can you isolate the problem by doing a diff on the output PQR files from the two systems?

sobolevnrm commented 2 years ago

I suspect the non-deterministic behavior comes from https://github.com/Electrostatics/pdb2pqr/blob/master/pdb2pqr/debump.py or related code. I'm unaware of any stochastic functions in these routines but the processes are very nonlinear. Do you have some specific examples we can use to debug?

stefdoerr commented 2 years ago

Yes sure. Here is the input file and the outputs I get on two different machines: 3PTB.zip

This is the command I ran:

pdb2pqr30 /tmp/3PTB.pdb /tmp/3PTB.pqr --pdb-output /tmp/3PTB_computer1.pdb

In this example the change is not as big as in the example above, but you can still see water resid 583, and specifically its H2 hydrogen, move between the two structures. [image]

I'm not sure how best to reproduce it. Maybe try running the command on a few different machines and do a diff on the output PDB files.

Actually, now that I think of it, the results are consistent on a single machine; they only differ between machines. The conda env is identical, though, so the issue might be numerical?
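One way to check whether the differences are at the level of numerical noise is to compare coordinates with a tolerance instead of a plain text diff. The following is a minimal sketch, assuming the outputs are standard fixed-column PDB records; `read_atoms`, `moved_atoms`, and the 0.01 Å default tolerance are illustrative choices and not part of pdb2pqr.

```python
import math


def read_atoms(pdb_text):
    """Map (chain, resseq, resname, atom name) -> (x, y, z).

    Assumes standard fixed-column ATOM/HETATM records.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            key = (
                line[21],                # chain ID
                int(line[22:26]),        # residue number
                line[17:20].strip(),     # residue name
                line[12:16].strip(),     # atom name
            )
            atoms[key] = (
                float(line[30:38]),
                float(line[38:46]),
                float(line[46:54]),
            )
    return atoms


def moved_atoms(text_a, text_b, tol=0.01):
    """Return {atom key: displacement} for atoms present in both
    structures whose positions differ by more than tol (hypothetical default)."""
    a, b = read_atoms(text_a), read_atoms(text_b)
    moved = {}
    for key in sorted(a.keys() & b.keys()):
        d = math.dist(a[key], b[key])
        if d > tol:
            moved[key] = d
    return moved
```

Reading the two output PDB files and passing their contents to `moved_atoms` would list only the atoms that actually shifted, for example the H2 of water 583, along with how far each one moved.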

sobolevnrm commented 2 years ago

Numerical errors/differences make sense given how nonlinear the method is. I don't see this as a fixable issue -- do you (dis)agree?

stefdoerr commented 2 years ago

I'm just a bit curious as to why it only seems to happen to waters. I have not seen similar issues on hydrogens added on protein residues. It's probably a different code path though. Beyond that I guess I should just remove the water hydrogens in my tests.
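Dropping the water hydrogens before comparing test output could look roughly like this. It is only a sketch, assuming fixed-column PDB records and that waters are named HOH or WAT; `strip_water_hydrogens` and `WATER_NAMES` are hypothetical names, not something pdb2pqr provides.

```python
# Assumed water residue names; adjust to whatever the files actually use.
WATER_NAMES = {"HOH", "WAT"}


def strip_water_hydrogens(pdb_text):
    """Return the PDB text without hydrogen atoms that belong to waters.

    All other lines, including protein hydrogens and water oxygens,
    are kept unchanged.
    """
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20].strip()
            # Handle both "H1" and "1H" style hydrogen names.
            atom = line[12:16].strip().lstrip("0123456789")
            if resname in WATER_NAMES and atom.startswith("H"):
                continue
        kept.append(line)
    return "\n".join(kept)
```

The filtered text from each machine can then be compared directly, so the test still covers the protein hydrogens and the water oxygen positions.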

sobolevnrm commented 2 years ago

The water hydrogen positions are significantly less constrained than for protein hydrogens. That is likely part/all of the reason.

stefdoerr commented 2 years ago

ok thanks!