OpenBioSim / sire

Sire Molecular Simulations Framework
https://sire.openbiosim.org
GNU General Public License v3.0
42 stars 11 forks source link

feature_somd_freenrg_positional_restraints #59

Closed jmichel80 closed 1 year ago

jmichel80 commented 1 year ago

This feature branch contains code that generalised the positional restraints and constraints functionality available in somd-freenrg.

Is your feature request related to a problem? Please describe.

The current code offers limited options to control which atoms in a system are subject to positional restraints. Currently it is possible to appy positional restraints to every heavy atoms present a system, excluding atoms contained in a list of excluded residue names, as described here.

It is only possible to freeze atoms that are contained in a list of allowed residue names, as described here

Describe the solution you'd like

It should be possible to restrain or freeze some or all atoms present in a list of residues. Such list of atoms would be contained in a csv file autogenerated by a preprocessing script run on the prm7/rst7 files used as input to somd-freenrg. The format would look like

#ResidueNumber, #AtomNumber
1,all
2,45-54
8,all
10,185-199

The residue specific positional restraints would be activated by specifying a config file keyword option

positional restraints = path_to_restraint_file

lohedges commented 1 year ago

Thanks, @jmichel80. Just a few thoughts / comments...

The format would look like

ResidueNumber, #AtomNumber

1,all 2,45-54 8,all 10,185-199

Perhaps this is convenient for the intended users interfacing with SOMD from the command-line, but I was wondering if there is a reason why the restraints couldn't be provided as a list of atom indices? I only ask because this is how the current position restraint system in BioSimSpace works, and is how we currently interface with every other engine. It is obviously possible for us to convert atom indices to the format that you propose, but it is a bit more work. Using atom indices would also allow the restraints to be easily entered in the configuration file directly, e.g. as a comma-separated list of indices. That way the configuration option could be an explicit list of restraints, or a path to a restraint file. When just using atom indices you also don't need to worry about parsing both strings and integers, and index regions in different formats, e.g. 45-54, 46,47,49-52,56, etc. (Although we could still support compact range formatting.) I guess we could support both approaches (atom indices and residue number plus atom number), or provide a utility script/function within BioSimSpace to convert formats. (There is already code in BioSimSpace that would easily convert your suggested format to absolute atom indices, but not the reverse.)

What kind of simulations are you planning on using position restraints with? Is this strictly for SOMD FEP in the NVT ensemble? Having worked with OpenMM within BioSimSpace, I found that I needed to apply a non-standard position restraint method to cope with NPT simulations. This uses zero-mass dummy atoms that are bonded to the atoms that we want to restrain. This method makes sure that things don't blow up when the box coordinates are rescaled during NPT moves. If this is useful, then I can point to code in BioSimSpace that shows how to set this up. (Perhaps the approach in SOMD is very different, or for more specific use cases.)

Cheers.

chryswoods commented 1 year ago

Another possibility could be allowing sire search strings to be used. You could then easily have things like "protein and not (residues within 5 angstrom of ligand)" or "atomname CA" etc.

See here for a full description of the sire search syntax.

jmichel80 commented 1 year ago

Good points. It is simpler and more flexible to just enter atom numbers. We used positional restraints in SOMD NPT simulations for Nautilus analyses ages ago, I can see this could be problematic if the initial box density is off. It may be possible to switch to NVT for this use case if problematic. Could you show me where the BioSimSpace OpenMM zero mass dumy atoms code is ?

It may be easier to integrate this feature update in Cresset's stack by assuming a list of restraints has already been generated by the time somd is invoked, but we will write anyway a sire script using the search syntax to generate those restraints for debugging purposes.

lohedges commented 1 year ago

The relevant section of the code is here. This is very easy to do and is essentially what was recommended by the developers in this issue thread. The only other consideration is that the dummy atom positions will also be written to the DCD trajectory file. This isn't a problem in general, but might not be what you want. For example, in BioSImSpace we are strict about output being consistent with the input (I need to copy the coordinates/velocities back into the original system) so I just monkey-patched the DCD.writeModel method so that the additional dummy atoms are excluded. Code for this is here.

jmichel80 commented 1 year ago

I had a go at loading the provided Cresset's input in somd-freenrg and ran into what sounds like the issue reported here originally https://github.com/michellab/Sire/issues/345

Using sire compiled on the latest devel branch

(sireDEV) matthew@yoko:~/jm/test-posrestraints$ somd-freenrg --version
Starting somd-freenrg: number of threads equals 20
OPENMM_PLUGIN_DIR = /home/matthew/mambaforge/envs/sireDEV/lib/plugins

somd-freenrg -- from Sire release version <2023.3.0.dev>
This particular release can be downloaded here: https://github.com/openbiosim/sire/releases/tag/v2023.3.0.dev

and with the attached inputs

input.zip

the command

somd-freenrg -C somd.cfg -t 1a~1b.prm7 -c 1a~1b.rst7 -m 1a~1b.pert -p CUDA -l 0.0

gives

Traceback (most recent call last):
  File "/home/matthew/mambaforge/envs/sireDEV/share/Sire/scripts/somd-freenrg.py", line 161, in <module>
    OpenMMMD.runFreeNrg(params)
  File "/home/matthew/mambaforge/envs/sireDEV/lib/python3.10/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner
    retval = func()
  File "/home/matthew/mambaforge/envs/sireDEV/lib/python3.10/site-packages/sire/legacy/Tools/OpenMMMD.py", line 2683, in runFreeNrg
    print(f'Initial energy: {integrator.getPotentialEnergy(system)}')
UserWarning: Periodic box vectors must be in reduced form.

Changing the box shape to cubic

(sireDEV) matthew@yoko:~/jm/test-posrestraints$ tail -1 1a~1b.rst7
  64.1150000  64.1150000  64.1150000  70.5287794 109.4712206  70.5287794
(sireDEV) matthew@yoko:~/jm/test-posrestraints$ tail -1 1a~1b.rst7.reduced 
  64.1150000  64.1150000  64.1150000  90.0000000  90.0000000  90.0000000

allows the run to complete without runtime error.

Interestingly, using a slightly older sire release

(abfe2) julien@yoko:~/restraints/test-posrestraints$ somd-freenrg --version
Starting somd-freenrg: number of threads equals 20

somd-freenrg -- from Sire release version <2022.3.0>
This particular release can be downloaded here: https://github.com/michellab/Sire/releases/tag/v2022.3.0

The same command

somd-freenrg -C somd.cfg -t 1a~1b.prm7 -c 1a~1b.rst7 -m 1a~1b.pert -p CUDA -l 0.0

Completes the run without raising an exception.

@lohedges can you think of a code change in sire that would cause this difference in behaviour?

I note that the Sire 2022.3.0 version was linked with

/home/julien/miniforge3/envs/abfe2/lib/libOpenMM.so.7.7*

and the Sire 2023.3.0.dev version was linked with

/home/matthew/mambaforge/envs/sireDEV/lib/libOpenMM.so.8.0*

lohedges commented 1 year ago

I'm guessing this is a result of the triclinic rounding fix in #49. I did test against the input from the old Michellab issue thread that you link to and didn't have issues. I think Cresset use LEaP to solvate, rather than gmx solvate. I did generate some LEaP triclinic boxes manually too and could load and run those without issue.

I'll look at the input you provide next week to see if I can figure out what's going on. It might be worth getting someone to prepare the input using the latest Sire (if that wasn't the case) since it's possible that the new rounding scheme wasn't applied when creating the input files. If everything is consistent the whole way through then it is hopefully okay.

lohedges commented 1 year ago

If it is the triclinic change, then I think it might be best to revert and provide a way for people to set the same space for all systems in an AMBER repex stimulation, since that's the only case where this is a problem. (AMBER can read the files, but doesn't realise that the space is equivalent, presumably just checking all angles are identical.) I have already posted a simple workaround for this which works.

lohedges commented 1 year ago

I'm also confused by the 2023.3.0 version, since it's not released yet! I wonder if something had gone wrong with the versioning for one of the patch releases?

chryswoods commented 1 year ago

Julien and Matthew are working on the latest dev version, which is 2023.3.0

lohedges commented 1 year ago

Ah, okay. I just see 2023.3.0 and 2023.3.0.dev mentioned in the version output.

lohedges commented 1 year ago

I've just checked my development branch and the latest conda package with the dev tag and for both I see:

In [1]: import sire as sr

In [2]: sr.__version__
Out[2]: '2023.3.0.dev'

and

Starting somd-freenrg: number of threads equals 20
OPENMM_PLUGIN_DIR = /home/lester/.conda/envs/test2/lib/plugins

somd-freenrg -- from Sire release version <2023.3.0.dev>
This particular release can be downloaded here: https://github.com/openbiosim/sire/releases/tag/v2023.3.0.dev

I'm not sure how it's possible that Julien is seeing:

(abfe2) julien@yoko:~/restraints/test-posrestraints$ somd-freenrg --version
Starting somd-freenrg: number of threads equals 20

somd-freenrg -- from Sire release version <2022.3.0>
This particular release can be downloaded here: https://github.com/michellab/Sire/releases/tag/v2022.3.0

Unless he's working on a branch where the version.txt file has been updated to 2023.3.0. For me it's:

cat version.txt
2023.3.0.dev

Checking the TriclinicBox for the input files. For Sire 2023.2.3 on main, or the latest development version, i.e. with the rounding "fix", I see:

In [1]: import sire as sr

In [2]: s = sr.load("1a~1b.*7")

In [3]: s.property("space")
Out[3]: TriclinicBox( ( 64.115, 0, 0 ), ( 21.3717, 60.4482, 0 ), ( 21.3717, -30.2241, 52.3497 ) )

Before this I see:

In [1]: import sire as sr

In [2]: s = sr.load("1a~1b.*7")

In [3]: s.property("space")
Out[3]: TriclinicBox( ( 64.115, 0, 0 ), ( 21.3717, 60.4482, 0 ), ( -21.3717, 30.2241, 52.3497 ) )

The only thing that is different is that the x and y components of the last box vector have flipped sign. Without the rounding fix OpenMM is happy and the code runs, with it if fails with the error that @jmichel80 is seeing, i.e.:

Traceback (most recent call last):
  File "/home/lester/.conda/envs/openbiosim/share/Sire/scripts/somd-freenrg.py", line 161, in <module>
    OpenMMMD.runFreeNrg(params)
  File "/home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/sire/legacy/Tools/__init__.py", line 175, in inner
    retval = func()
  File "/home/lester/.conda/envs/openbiosim/lib/python3.10/site-packages/sire/legacy/Tools/OpenMMMD.py", line 2683, in runFreeNrg
    print(f'Initial energy: {integrator.getPotentialEnergy(system)}')

Looking at the actual OpenMM code where the lattice reduction is checked, i.e. here and here, the conditional that is failing is:

b[1] < 2*fabs(c[1]))

i.e.

60.448201713899216 < 60.448201868282005

This is an errorr of roungly 1.54e-07, which is comparable to what was causing the issue in https://github.com/michellab/Sire/issues/345.

EDIT

(Oh, and I guess it's not the sign of the terms of in the last box vector that are the issue, since there's an abs in the conditional. It's actually the precision (the sign flip is a consequence of this). The TriclinicBox is only printing to 4dp so we don't see this. Still, same thing, i.e. a ~1e7 rounding error is causing the issue.)

Prior to the rounding fix we hadn't had any reported issues with triclinic spaces in the 2 years since Mark's problem.

I propose the following:

Let me know what you think. This is a tricky issue and I'm worried that we're just going to be bouncing around with quick fixes and workarounds that will keep tripping us up in future. I totally understand that it is confusing that the angles oscillate back and forth, but I'm not sure what we're meant to do when the vectors and angles are only stored to 5 to 7 dp in the input files.

I'll have another read of the reduction algorithm to see if there's anything else we could try.

lohedges commented 1 year ago

@jmichel80: If you want to to fix this on your working branch, i.e. until we figure out a better solution, then you can just apply this patch:

diff --git a/corelib/src/libs/SireVol/triclinicbox.cpp b/corelib/src/libs/SireVol/triclinicbox.cpp
index c4a06622..b9c5dd13 100644
--- a/corelib/src/libs/SireVol/triclinicbox.cpp
+++ b/corelib/src/libs/SireVol/triclinicbox.cpp
@@ -226,14 +226,9 @@ void TriclinicBox::construct(const Vector &v0, const Vector &v1, const Vector &v
        These constraints can be solved using simultaneous equations.
      */

-    // Perform the reduction. Add a small bias so that we always round in a consistent
-    // direction. This mitigates numerical errors due to box dimensions and angles, or
-    // lattice cell vectors being specified in fixed precision in the file formats that
-    // we support. Without these, repeated read/write conversion to different formats
-    // can cause the box angles to rotate back-and-forth.
-    this->v2 = this->v2 - this->v1*std::round(1e-6 + this->v2.y() / this->v1.y());
-    this->v2 = this->v2 - this->v0*std::round(1e-6 + this->v2.x() / this->v0.x());
-    this->v1 = this->v1 - this->v0*std::round(1e-6 + this->v1.x() / this->v0.x());
+    this->v2 = this->v2 - this->v1*std::round(this->v2.y() / this->v1.y());
+    this->v2 = this->v2 - this->v0*std::round(this->v2.x() / this->v0.x());
+    this->v1 = this->v1 - this->v0*std::round(this->v1.x() / this->v0.x());

     // Store the cell matrix and its inverse.
     this->cell_matrix = Matrix(this->v0, this->v1, this->v2).transpose();
lohedges commented 1 year ago

Note that it's not possible to adjust the bias so that the angles stop oscillating and OpenMM is happy with the reduced form, which would be the obvious fix.

jmichel80 commented 1 year ago

Generating the prm7 inputs with sire would require mods in Cresset's toolchain upstream of somd-freenrg, I don't think that's a good way forward.

I’m scratching my head a bit to figure out how to implement the massless dummy positional restraint scheme in SOMD described here https://github.com/openmm/openmm/issues/2262

My biggest headache is that SOMD’s design makes it difficult to have a mismatch between the number of atoms in the Sire system and the openmm layer.

@chryswoods , @lohedges
I was wondering if there is an easy way in sire to:

chryswoods commented 1 year ago

Yes - sire has full molecular editing capabilities, so you can load molecules, edit them (including adding/deleting atoms and residues) and then saving them back out. I'll put together a script that shows you how to do this later this morning. I guess you want the old API so this can drop straight into somd?

Also, just to say that this TriclinicBox issue is super subtle. Could there be a way that we add a flag that forces one behaviour or the other, e.g. a static function sire.vol.TriclinicBox.set_round_on_load(False) to globally disable the rounding fix? It's not perfect.

Or, we add an extra argument to the TriclinicBox constructor that turns on rounding, and then only set this to true when we know that we are loading the box from an input file that has reduced precision?

jmichel80 commented 1 year ago

I'm thinking about the following solution

pair of equilibrated prm7/rst7 files coming from Cresset's equiibration pipeline --> prepareFEP.py (version currently used) somd-freenrg input files --> addPositionalRestraints.py updated somd-freenrg input files --> somd-freenrg

addPositionalRestraints can be written using the new API, and will use the search functionality to find protein atoms within X angstrom of a ligand atom. Different variations will be tested (such picking only backbone atoms, picking every atoms in a residue if one atom is within cutoff etc...). It will also be possible to pass a precalculated list of atomic numbers to use to select atoms to apply positional restraints to.

For every restrained atom, a massless dummy atom (null nonbonded parameters) will be created using the same coordinates of the restrained atom. The dummy atoms will be appended to the system and an updated prm7/rst7 file will be written. A restraints.restr file will be written to save each pair of Atomic Numbers (restrained atom, matching dummy atom) and the harmonic force constant to use for the positional restraint.

With this info in place it should be reasonably straigthforward to update the current positional restraints code in somd-freenrg.

chryswoods commented 1 year ago

I'll write and upload a script now using the new API (just making coffee ;-)).

Also, @lohedges - the version numbers are right. Julien does see 2023.3.0.dev. The confusion is because he was showing the old code which was 2022.3.0.

chryswoods commented 1 year ago

It's taking a little longer than I thought as I hit a variety of small bugs that needed fixing. I am also taking the opportunity to remove some rough edges. It shouldn't be too much longer ;-)

chryswoods commented 1 year ago

I've create pull request #60 that has the small fixes needed to allow the below script to work.

import sire as sr

# load some molecules
mols = sr.load(sr.expand(sr.tutorial_url, "ala.top", "ala.crd"))

# get the molecule to edit
mol = mols[0]

# We will do this in two stages
#
# First, we will add the atoms themselves

editor = mol.edit()

n_to_add = 10
natoms = mol.num_atoms()

for i in range(0, n_to_add):
    # Add the atom and put it into the first
    # residue (update as needed if you want
    # this in a different residue)
    editor = editor.add(
        sr.mol.AtomName("Re")
             ).renumber(
               sr.mol.AtomNum(natoms + i + 1)
         ).reparent(
                sr.mol.ResIdx(0)
             ).molecule()

mol = editor.commit()

# Second, we will set the properties of the 
# atoms that we have added. We will do this using
# the new cursor syntax
cursor = mol.cursor()["atomname Re"]

# This cursor can be used to edit all of the "Re" atoms
# at once, or can be indexed to edit them individually

# We will iterate over them and set some default properties
for atom in cursor.atoms():
    atom["coordinates"] = sr.maths.Vector(0)
    atom["charge"] = 0 * sr.units.mod_electron
    atom["element"] = sr.mol.Element(0)
    atom["mass"] = 0 * sr.units.g_per_mol
    atom["atomtype"] = "DM"
    atom["LJ"] = sr.mm.LJParameter(1*sr.units.angstrom, 0*sr.units.kcal_per_mol)

    # etc...

# Adding atoms has invalidated the "parameters" property, so
# this should be removed
cursor = cursor.molecule()

if "parameters" in cursor:
    del cursor["parameters"]

mol = cursor.commit()

# print out all of the atoms and their properties
# just to validate they are correct
for atom in mol.atoms():
    print(atom, atom.property("charge"), atom.property("element"), atom.property("atomtype"), atom.property("LJ"))

# You can also add bonds to these atoms, e.g.
connectivity = mol.property("connectivity").edit()

# here we connect the first Re atom to the first atom in the molecule...
connectivity.connect(mol["atomname Re"][0].index(), 
                     mol.atoms()[0].index())

# also should set a bond potential (even if this is zero)
bonds = mol.property("bond")
bonds.set(mol["atomname Re"][0].index(),
          mol.atoms()[0].index(),
          0.0)

mol = mol.edit(
          ).set_property("connectivity", connectivity.commit()
          ).set_property("bond", bonds
          ).commit()

# Note that the above setting of bonds will become easier
# with the modernised editing API

print("\nBonds involving Re atoms")
print(mol.bonds("atomname Re"))

# Finally(!) we update the system with the new version of this molecule...
mols.update(mol)

# ...then we write this out to a new PRMTOP/RST file
f = sr.save(mols, "output", format=["PRMTOP", "RST"])

print(f"Output written to {f}")

# Reload the file to check everything is ok
mols = sr.load(f)

mol = mols[0]

print("\nLoad and double-check")
for atom in mol.atoms():
    print(atom, atom.property("charge"), atom.property("element"), atom.property("atomtype"), atom.property("LJ"))

print("\nBonds involving Re atoms")
print(mol.bonds("atomname Re"))

Note that a PRMTOP file stores LJ parameters as A and B values, so sigma will be set equal to zero on a load.

jmichel80 commented 1 year ago

Brilliant ! Hoping to try this out tonight. I think it might be easier to add the dummy atoms in a blank new molecule to avoid visualisation artefacts (as otherwise trajectories of protein residues will look weird unless the dummy atoms are deleted). It shouldn't be necessary to define a topology and bond these atoms to anything else.

chryswoods commented 1 year ago

Ok - building a molecule of only dummy atoms was a good edge case to expose some other bugs. I've fixed those and have pushed them to the same pull request (#60). Here's a script that can build and add a molecule full of dummies.

import sire as sr

mols = sr.load(sr.expand(sr.tutorial_url, "ala.top", "ala.crd"))

n_existing_atoms = mols.num_atoms()
n_existing_residues = mols.num_residues()

newmol = sr.mol.Molecule("dummies")

n_to_add = 10

editor = newmol.edit()

# Create a residue
editor = editor.add(
             sr.mol.ResName("Re")
         ).renumber(
             sr.mol.ResNum(n_existing_residues + 1)
         ).molecule()

for i in range(0, n_to_add):
    editor = editor.add(
        sr.mol.AtomName("Re")
             ).renumber(
               sr.mol.AtomNum(n_existing_residues + i + 1)
         ).reparent(
                sr.mol.ResIdx(0)
             ).molecule()

mol = editor.commit()

cursor = mol.cursor()["atomname Re"]

# need to set the properties to the correct type...
cursor[0]["charge"] = 1 * sr.units.mod_electron
cursor[0]["mass"] = 1 * sr.units.g_per_mol

for atom in cursor.atoms():
    atom["coordinates"] = sr.maths.Vector(0)
    atom["charge"] = 0 * sr.units.mod_electron
    atom["element"] = sr.mol.Element(0)
    atom["mass"] = 0 * sr.units.g_per_mol
    atom["atomtype"] = "DM"
    atom["LJ"] = sr.mm.LJParameter(1*sr.units.angstrom, 0*sr.units.kcal_per_mol)

mol = cursor.molecule().commit()

# This line is a bit janky as there isn't yet a "modern API" 
# way to do this
mols = mols.molecules()
mols.append(mol)

f = sr.save(mols, "output", format=["PRM7", "RST7"])

# load to check
mols = sr.load(f)

for atom in mols[-1].atoms():
    print(atom, atom.property("charge"), 
          atom.property("LJ"), atom.residue())

Part of the work to update to the modern editing API will be to clean up some of the jankiness above, e.g. fixing property type detection when zero values are used to initialise the whole property, and adding code to make it easier to add molecules to the sr.system.System object.

lohedges commented 1 year ago

Yes, adding a rounding option could work. That said, I'd really like to know if this is an issue rounding the original input that Cresset are using, or the input that they have generated with whatever version of prepareFep they are using, which would have had the previous (exact) reduction applied. This should be very easy to test and would provide very useful information, i.e. if everything is okay with the original files then we don't need to worry as much, since everything would be fine if their workflow was run with the new version of Sire. (We'd still have the problem of not being able to run existing input, though.)

We could also try adding a reduced representation check (like OpenMM) and only apply the reduction if it fails. However, this would still be prone to rounding errors and oscillation.

Thanks for the fixes @chryswoods. As you say, applying the correct unit for zero-valued properties would be really helpful. This tripped us up in BioSimSpace with the merging code and I imagine it could be a problem elsewhere too.

lohedges commented 1 year ago

Ah yes, well done me for not being able to read the year correctly 🤦‍♂️

jmichel80 commented 1 year ago

Great stuff @chryswoods . Seems to be doing what's needed. I found a small glitch.


(...)
for i in range(0,len(cursor)):
    atom = cursor.atom(i)
    restrained_at = mols["atomnum %i" % restrained_atoms[i] ]
    atom["coordinates"] = restrained_at.property("coordinates")
    atom["charge"] = 0 * sr.units.mod_electron
    atom["element"] = sr.mol.Element(0)
    atom["mass"] = 0 * sr.units.g_per_mol
    atom["atomtype"] = "DM"
    atom["LJ"] = sr.mm.LJParameter(1*sr.units.angstrom, 0*sr.units.kcal_per_mol)
    #paired_atoms.append( ( atom.number().value(), restrained_at.number().value() ) ) 

mol = cursor.molecule().commit()

mols.append(mol)

for atom in mols[-1].atoms():
    print(atom, atom.property("charge"),
          atom.property("LJ"), atom.residue())

f = sr.save(mols, "output", format=["PDB"])
f = sr.save(mols, "output", format=["PRM7", "RST7"])

# load to check
mols = sr.load(f)

for atom in mols[-1].atoms():
    print(atom, atom.property("charge"),
          atom.property("LJ"), atom.residue())

will produce the following output in my tests

Atom( Re:5511 [  36.35,   22.62,   25.42] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:5512 [  36.15,   23.92,   24.74] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:5513 [  35.48,   22.09,   26.31] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:5514 [  34.48,   22.70,   26.69] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:5515 [  34.75,   24.11,   24.12] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:5516 [  34.25,   23.22,   23.43] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:5517 [  37.25,   23.97,   23.67] ) 0  LJ( sigma = 1 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18664 [  36.35,   22.62,   25.42] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18665 [  36.15,   23.92,   24.74] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18666 [  35.48,   22.09,   26.31] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18667 [  34.48,   22.70,   26.69] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18668 [  34.75,   24.11,   24.12] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18669 [  34.25,   23.22,   23.43] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )
Atom( Re:18670 [  37.25,   23.97,   23.67] ) 0  LJ( sigma = 0 Å, epsilon = 0 kcal mol-1 ) Residue( Re:5511 num_atoms=7 )

So it looks like the atom numbers haven't been updated after the dummy molecule has been added to mols. But everything is fine if mols are saved to a prm7/rst7 and loaded again.

jmichel80 commented 1 year ago

Another glitch for the night

mols = mols.molecules()
mols.append(mol)

f = sr.save(mols, "output", format=["PRM7", "RST7"])

# load to check
mols = sr.load(f)

the code above in the example script doesn't preserve the space group of the loaded system.

for the saved prm7 files to work with the somd legacy amber parser we need a workaround to write a prm7 file with the IFBOX flag activated, and a %FLAG SOLVENT_POINTERS entry.

chryswoods commented 1 year ago

Thanks - sorry, I forgot that taking the molecules out of the system would lose the space info. I'll add the code to let you add the molecule directly to the system, so that the space info is preserved. I'll add this to the open PR.

chryswoods commented 1 year ago

I've added the code to the PR that adds the functionality to add and remove molecules directly from sire.system.System objects. This means that you can now add your dummy molecule by replacing

# This line is a bit janky as there isn't yet a "modern API" 
# way to do this
mols = mols.molecules()
mols.append(mol)

with

mols.add(mol)

I've also added functions to get and set the system space and time values, so you can check these via

print(mols.space())
print(mols.time())

or set them (not that you need to) using

mols.set_space(sr.vol.PeriodicBox(...))
mols.set_time(5 * sr.units.picosecond)

You can add or remove molecules from a sire.system.System using

mols.add( whatever molecules you want )
mols.remove( whatever molecules you want )

I've added aliases via the addition and subtraction operators, so you can also type

mols += mol
mols -= mol

mols = mols + mol
mols = mol + mols
mols = mols - mol
etc.
chryswoods commented 1 year ago

Also, the atom numbering issue was because I made a mistake in the above script.

for i in range(0, n_to_add):
    editor = editor.add(
        sr.mol.AtomName("Re")
             ).renumber(
               sr.mol.AtomNum(n_existing_residues + i + 1)
         ).reparent(
                sr.mol.ResIdx(0)
             ).molecule()

Renumbering should use sr.mol.AtomNum(n_existing_atoms + i + 1), not sr.mol.AtomNum(n_existing_residues + i + 1).

The reason the numbers changed when you saved and loaded the file was because the Amber PRM format doesn't actually save the atom numbers. This means they are lost on a save / load, and replaced with the atom index. Only a few file formats preserve atom numbers.

jmichel80 commented 1 year ago

brilliant ! This seems to work now. @mb2055 will push code shortly

jmichel80 commented 1 year ago

Is this ready for a PR ?

chryswoods commented 1 year ago

Is this now fixed? I think all of the code has been merged? We are nearly ready to release 2023.3.0, so I just want to make sure that there's nothing else that needs adding related to this issue?

jmichel80 commented 1 year ago

yes @mb2055 and @lohedges have tested the scripts here .