Automatic box translation in MDAnalysis 2.4 breaks re-centering operations during simulation

amirhs1 commented 1 year ago

Expected behavior

This data file attached below is a topology file (in MDAnalysis lingo) for a single ring polymer in cylindrical simulation box. The polymer in the input data file (test_already_recentered_inLAMMPS.data.txt) is already centered at the origin (the center of geometry (COG) of the polymer is at the center of box at (0, 0, 0)), with x, y, and z values ranging from $-r{cyl}$ to $+r{cyl}$ for x and y axes, and from $-l{cyl}/2$ to $+l{cyl}/2$ for the z axis where $r{cyl}$ and $l_{cyl}/2$ are the radius and length of the cylindrical simulation box. When reading the data file with MDAnalysis, the positions of the atoms should remain unchanged.

Background

The fix recenter command in LAMMPS and similar commands in other MD packages constrain the center-of-mass position of a group of atoms (an AtomGroup in MDAnalysis lingo) by adjusting the coordinates of the atoms every timestep, which helps the user later conveniently apply some measurement on a trajectory file; moreover, this option "can be used to ensure the entire collection of atoms (or a portion of them) do not drift during the simulation due to random perturbations (e.g. fix langevin thermostatting)." (from LAMMPS documentation).

What is the problem in MDAnalysis?

In version 2.3, MDAnalysis reads the coordinates as they were and the user could apply transformation if needed. However, in version 2.4, I have noticed that MDAnalysis automatically translates the box to the origin, which destroy all the re-centering operations done during a simulation. Please see the implementation of DumpReader in version 2.3 here (the last lines of the DumpReader class at end of the html page) and compare it with this line in version 2.4.

Actual behavior

When running the code with MDAnalysis version 2.4.3, the resulting test_written_by_mda.data file shows that the polymer is shifted to the center of the box at $(r{cyl}, r{cyl}, l{cyl}/2)$ not $(0, 0, 0)$. The x, y, and z values range from $0$ to $+2r{cyl}$ for x and y axes, and from $0$ to $l_{cyl}$ for the z axis. The re-centering and re-defining ranges of coordinates cause extra costly and slow transformations in Python, especially for large files (~ 5GB); for example, with this new approach, you cannot easily measure the distribution of atoms around the COG of the polymer; you must first perform some costly transformations.

Code to reproduce the behavior

Download the following files:
- test_already_recentered_in_LAMMPS.data.txt (topology file)
- test_already_recentered_in_LAMMPS.lammpstrj.txt (associated trajectory file)
Use the following Python code to reproduce the behavior:

import MDAnalysis as mda

# drop the ".txt" from the enda of attached files before running this code
topo = "./test_already_recentered_in_LAMMPS.data"
trj = "./test_already_recentered_in_LAMMPS.lammpstrj"
u = mda.Universe(topo, trj, topology_format='DATA', format='LAMMPSDUMP', lammps_coordinate_convention='unscaled',
        unwrap_images=False, atom_style="id resid type x y z")

for ts in u:
        with mda.Writer("test_written_by_mda.data") as W:
            W.write(u.atoms)

Current version of MDAnalysis

Which version are you using? 2.4.3
Which version of Python ? 3.11.3
Which operating system? macOS Big Sur Version 11.7.4

hmacdope commented 1 year ago

Perhaps we should make the recentering an optional kwarg for the DumpReader with default=False, @MDAnalysis/coredevs agree? I am happy to do it.

orbeckst commented 1 year ago

These changes were part of PR #3844 and were meant to fix #3741 IIRC. How would default=False interact with the concern of #3741 ?

hmacdope commented 1 year ago

Good point, we could do default=True then but if you have done something special like OP then perhaps you can optionally move the coordinates. Does that sound more reasonble?

orbeckst commented 1 year ago

I think so. At least we don't break the behavior again. Do our MDA PBC things work with negative coordinates? In any case, if it's opt-in and we add a warning to the docs then I am in favor of giving power-users more options and allowing reading of data as stored.

Parenthetically, this seems to be part of a wider discussion about the limitations of our box representation (a, b, c, alpha, beta, gamma). Arbitrary box vectors would contain more information, including the orientation of the whole system for spatial superposition. However, arbitrary box vectors would not solve this problem as we would need to also store the coordinates of the origin.

MDAnalysis / mdanalysis