MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.26k stars 641 forks source link

XYZWriter odd behaviour #3756

Open rafaelpap opened 1 year ago

rafaelpap commented 1 year ago

Expected behavior

If I reorder some atoms I expect to get the right name along the right coordinates.

Original coordinates: O 0.00000 0.00000 0.00000 N 2.00000 0.00000 0.00000 H 0.00000 1.50000 0.00000

Expected coordinates after reordering atoms 1 and 2: O 0.00000 0.00000 0.00000 H 0.00000 1.50000 0.00000 N 2.00000 0.00000 0.00000

Actual behavior

I obtain: O 0.00000 0.00000 0.00000 H 2.00000 0.00000 0.00000 N 0.00000 1.50000 0.00000

The symbols are reordered but the coordinates remain unchanged.

Code to reproduce the behavior

import MDAnalysis as mda
u = mda.Universe("uno.xyz")
firstframe = 0
lastframe = 1
increment = 1
frames = slice(firstframe,lastframe,increment)
for ts in u.trajectory[frames]:
   idx_lst_sort = [0, 2, 1]
   ordenados = u.atoms[idx_lst_sort]
   print(u.atoms)
   print(u.atoms.positions)
   print(ordenados.atoms)
   print(ordenados.atoms.positions)
   with mda.Writer("dos.xyz", ordenados.n_atoms) as W:
       W.write(ordenados)
   with mda.Writer("dos.pdb", ordenados.n_atoms) as W:
       W.write(ordenados)

The uno.xyz file is:

3
frame 0 
       O     0.00000    0.00000    0.00000
       N     2.00000    0.00000    0.00000
       H     0.00000    1.50000    0.00000

The dos.xyz file:

3
frame 0 | Written by MDAnalysis XYZWriter (release 2.2.0)
       O     0.00000    0.00000    0.00000
       H     2.00000    0.00000    0.00000
       N     0.00000    1.50000    0.00000

The new order is not respected by XYZWriter while PDBWriter does.

The dos.pdb file is:

TITLE     MDANALYSIS FRAMES FROM 0, STEP 1: Created by PDBWriter
CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
REMARK     285 UNITARY VALUES FOR THE UNIT CELL AUTOMATICALLY SET
REMARK     285 BY MDANALYSIS PDBWRITER BECAUSE UNIT CELL INFORMATION
REMARK     285 WAS MISSING.
REMARK     285 PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK     285 CRYST1 RECORD IS INCLUDED, BUT THE VALUES ON
REMARK     285 THIS RECORD ARE MEANINGLESS.
MODEL        1
ATOM      1  O   UNK X   1       0.000   0.000   0.000  1.00  0.00      SYST O
ATOM      2  H   UNK X   1       0.000   1.500   0.000  1.00  0.00      SYST H
ATOM      3  N   UNK X   1       2.000   0.000   0.000  1.00  0.00      SYST N
ENDMDL
END

Current version of MDAnalysis

IAlibay commented 1 year ago

Looks like the issue here is that we use a Timestep object slice to write out coordinates instead of an atomgroup (whilst we do use the atomgroup to get the elements). So we lose re-ordering information when writing out positions.

I guess our best approach here is just to switch over to plainly using atomgroups for everything rather than relying on Timestep.

rafaelpap commented 1 year ago

If the change from Timestep to atomgroups is not trivial, maybe something like: MDAnalysis/coordinates> diff XYZ.py.ORIG XYZ.py

241c241
<             # update atom names
---
>             # update atom names & positions
242a243
>             ts.positions = atoms.atoms.positions

could work for the time been?

orbeckst commented 1 year ago

Is this behavior inconsistent between writers?

I only had time to look at XYZ. To me it looks as if we should do https://github.com/MDAnalysis/mdanalysis/blob/2c32ed11c1e6b800397f147e4242f7325900b669/package/MDAnalysis/coordinates/XYZ.py#L238 if we detect that the Atomgroup indices are not the same as the indices of the original universe.atoms. At the moment, the copy_slice() is only performed if we have fewer atoms.

I am not 100% sure if, for XYZ, there was any advantage to using a slice on a Timestep instead of using atoms.positions, which performs the same slicing operation https://github.com/MDAnalysis/mdanalysis/blob/2c32ed11c1e6b800397f147e4242f7325900b669/package/MDAnalysis/core/groups.py#L2775 but Timestep.copy_slice() should make sure that everything (forces, velocities, ...) is properly copied — this might not matter for XYZ but it matters for others. For consistency, we should use copy_slice() everywhere when necessary but use the original Timestep when possible (for performance).

rafaelpap commented 1 year ago

The behaviour is different between the XYZwriter and the PDB and GRO writers. I haven't tested the writers which produces binary output. PDB and GRO get the names and coordinates changed.

orbeckst commented 1 year ago

Thanks for checking @rafaelpap . I suppose we'll need to have a careful look around the code of the other writers to try to make it consistent.