OpenBioSim / biosimspace

An interoperable Python framework for biomolecular simulation.
https://biosimspace.openbiosim.org
GNU General Public License v3.0
74 stars 11 forks source link

[BUG] Coordinate not being updated after em #16

Closed xiki-tempula closed 1 year ago

xiki-tempula commented 1 year ago

Describe the bug I think the new caching might be a bit broken in that if you do an energy minimisation, the coordiante is not being updated when you write it down in the same format.

To Reproduce

import BioSimSpace.Sandpit.Exscientia as BSS
mol = pickle.load(open('mol.p', 'rb'))
vac = mol.toSystem()
vac.setBox(3*[4*BSS.Units.Length.nanometer])

protocol = BSS.Protocol.Minimisation()
process = BSS.Process.Amber(vac, protocol, exe=amber_exe, work_dir="em")
process.start()
process.wait()
assert not process.isError()
em = process.getSystem()

process = BSS.Process.Amber(em, protocol, exe=amber_exe, work_dir="equil")

BSS.IO.saveMolecules('em', em, 'rst7')
BSS.IO.saveMolecules('em', em, 'gro87')

Expected behavior I would imagine that the coordinate after em equil/amber.rst7 will be different from the coordinate before the em em/amber.rst7. The same system writing down as amber RST7 em.rst7 will be the same as the Gro file em.gro

But the equil/amber.rst7

BioSimSpace_System
   12
  20.0424958  20.0210173  18.6057628  21.0783599  19.4264583  19.3263467
  21.0356408  19.4053314  20.7203907  19.9570549  19.9786839  21.3938791
  18.9211843  20.5731855  20.6732578  18.9639111  20.5943132  19.2792095
  20.0747780  20.0370579  17.5204133  21.9186800  18.9804402  18.8023066
  21.8422027  18.9423722  21.2813245  19.9238789  19.9618695  22.4795867
  18.0813200  21.0196591  21.1972860  18.1577341  21.0576278  18.7178728
  40.0000000  40.0000000  40.0000000  90.0000000  90.0000000  90.0000000

Is the same as the coordinate before em.

BioSimSpace_System
   12
  20.0424958  20.0210173  18.6057628  21.0783599  19.4264583  19.3263467
  21.0356408  19.4053314  20.7203907  19.9570549  19.9786839  21.3938791
  18.9211843  20.5731855  20.6732578  18.9639111  20.5943132  19.2792095
  20.0747780  20.0370579  17.5204133  21.9186800  18.9804402  18.8023066
  21.8422027  18.9423722  21.2813245  19.9238789  19.9618695  22.4795867
  18.0813200  21.0196591  21.1972860  18.1577341  21.0576278  18.7178728
  40.0000000  40.0000000  40.0000000  90.0000000  90.0000000  90.0000000

And is the same as the em.rst7

BioSimSpace_System
   12
  20.0424958  20.0210173  18.6057628  21.0783599  19.4264583  19.3263467
  21.0356408  19.4053314  20.7203907  19.9570549  19.9786839  21.3938791
  18.9211843  20.5731855  20.6732578  18.9639111  20.5943132  19.2792095
  20.0747780  20.0370579  17.5204133  21.9186800  18.9804402  18.8023066
  21.8422027  18.9423722  21.2813245  19.9238789  19.9618695  22.4795867
  18.0813200  21.0196591  21.1972860  18.1577341  21.0576278  18.7178728
  40.0000000  40.0000000  40.0000000  90.0000000  90.0000000  90.0000000

But this is different from em.gro

BioSimSpace_System
   12
    1LIG    C1x    1   1.984   2.012   1.899
    1LIG    C2x    2   2.099   1.947   1.934
    1LIG    C3x    3   2.104   1.940   2.074
    1LIG    C4x    4   1.992   2.000   2.130
    1LIG    C5x    5   1.918   2.044   2.024
    1LIG    C6x    6   1.946   2.031   1.962
    1LIG    H1x    7   1.949   2.035   1.799
    1LIG    H2x    8   2.173   1.908   1.865
    1LIG    H3x    9   2.179   1.896   2.127
    1LIG    H4x   10   1.968   2.010   2.236
    1LIG    H5x   11   1.824   2.096   2.037
    1LIG    H6x   12   1.861   2.080   1.915
   4.00000   4.00000   4.00000

Which would be the correct energy minimised structure.

Input files mol.p.zip

(please complete the following information):

Additional context Add any other context about the problem here.

lohedges commented 1 year ago

Thanks for this, I'll see what's gone wrong. It should be checking the property version when comparing the cache, which presumably should have changed for the coordinates.

lohedges commented 1 year ago

Very strange, the isSame method is stating that em and the original vac system are the same, even though their coordinates differ. Will see what's gone wrong.

print(vac.isSame(em))

c0 = vac[0]._sire_object.property("coordinates")
c1 = em[0]._sire_object.property("coordinates")
print(c0, c1, c0==c1)
True
AtomCoords( size=12
0: ( 20.0425 Å, 20.021 Å, 18.6058 Å )
1: ( 21.0784 Å, 19.4265 Å, 19.3263 Å )
2: ( 21.0356 Å, 19.4053 Å, 20.7204 Å )
3: ( 19.9571 Å, 19.9787 Å, 21.3939 Å )
4: ( 18.9212 Å, 20.5732 Å, 20.6733 Å )
...
7: ( 21.9187 Å, 18.9804 Å, 18.8023 Å )
8: ( 21.8422 Å, 18.9424 Å, 21.2813 Å )
9: ( 19.9239 Å, 19.9619 Å, 22.4796 Å )
10: ( 18.0813 Å, 21.0197 Å, 21.1973 Å )
11: ( 18.1577 Å, 21.0576 Å, 18.7179 Å )
) AtomCoords( size=12
0: ( 19.8388 Å, 20.121 Å, 18.991 Å )
1: ( 20.9963 Å, 19.4713 Å, 19.3411 Å )
2: ( 21.0444 Å, 19.4 Å, 20.7361 Å )
3: ( 19.9218 Å, 20.0011 Å, 21.2999 Å )
4: ( 19.1853 Å, 20.4414 Å, 20.2389 Å )
...
7: ( 21.7403 Å, 19.0832 Å, 18.6506 Å )
8: ( 21.7989 Å, 18.9668 Å, 21.2687 Å )
9: ( 19.6797 Å, 20.1008 Å, 22.3547 Å )
10: ( 18.2391 Å, 20.959 Å, 20.3712 Å )
11: ( 18.608 Å, 20.7949 Å, 19.143 Å )
)
False

Will figure out why the comparison is failing in this case, since it's working in the unit tests.

lohedges commented 1 year ago

Something really weird is going on with the properties of the pickled molecule. If I look at the properties that are compared in the isSame function I see the following. This is system properties, followed by all molecule level properties:

import BioSImSpace.Sandpit.Exscientia as BSS
mol = pickle.load(open('mol.p', 'rb'))
vac = mol.toSystem()

vac.isSame(vac)
['space']
['space']
[]
[]
True

The system properties are present, but the molecular ones aren't. Looking at the molecule directly, they do exist:

vac[0]._sire_object.propertyKeys()
Out[17]:
['LJ',
 'gb_screening',
 'atomtype',
 'forcefield',
 'was_perturbable',
 'bond',
 'name',
 'angle',
 'treechain',
 'molecule',
 'gb_radii',
 'connectivity',
 'improper',
 'intrascale',
 'element',
 'ambertype',
 'dihedral',
 'charge',
 'coordinates',
 'mass']

I see the same thing if I save the vac system to file and re-load. But, if I try another system from file:

s = BSS.IO.readMolecules("amber/ala/*")[0].toSystem()
s.isSame(s)
['fileformat']
['fileformat']
['gb_screening', 'LJ', 'atomtype', 'forcefield', 'parameters', 'bond', 'angle', 'treechain', 'gb_radii', 'improper', 'connectivity', 'intrascale', 'element', 'ambertype', 'space', 'gb_radius_set', 'dihedral', 'fileformat', 'charge', 'coordinates', 'mass']
['gb_screening', 'LJ', 'atomtype', 'forcefield', 'parameters', 'bond', 'angle', 'treechain', 'gb_radii', 'improper', 'connectivity', 'intrascale', 'element', 'ambertype', 'space', 'gb_radius_set', 'dihedral', 'fileformat', 'charge', 'coordinates', 'mass']
True

The logic is the same in both cases, so I'm not sure why the properties are disappearing.

My initial assumption was that you had a pure solvent system, since by default isSame skips comparing water molecules for speed.

Will keep debugging.

lohedges commented 1 year ago

I've found the issue. For some reason, when comparing properties molecule-by-molecule, they are being extracted as Residue objects, not Molecule, so they don't have any properties. I'm not yet sure why, since we're calling getMolecules on a system, which should return Molecules.

lohedges commented 1 year ago

Okay, it's because the search logic is broken for some reason:

print(vac.search("not water").molecules()[0])
<BioSimSpace.Residue: name='LIG', molecule=4363, index=0, nAtoms=12>

This should return a molecule, since I've explicitly called .molecules() on the search result.