MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.31k stars 648 forks source link

LAMMPS.DATAWriter and bond types etc. #763

Closed khuston closed 12 months ago

khuston commented 8 years ago

I would like to write a LAMMPS.DATAWriter. Here is a use case:

u = mda.Universe('polymer.data', 'coil.dcd')
for ts in u.trajectory:
    ... analyze the trajectory, decide which frame I'd like to write out as a data file ...
    if save_this_frame:
        with mda.Writer('configuration_xx.data') as W:
            W.write(u.atoms)

There are a few concerns I can think of:

  1. I would need to parse bond, angle, dihedral, and improper types with DATAParser.
  2. Would these types be stored as Bondtypes, Angletypes, Dihedraltypes, Impropertypes?
  3. Would the current bond.type etc. that returns an array of the constituent atom types be overridden? Or would these be some separate attribute of bonds, angles, dihedrals, and impropers?
  4. LAMMPS data files can also have numerous coefficients sections. In principle someone might want to read that information and write it again to the DATA file later. Personally, I don't use those sections and would happily ignore them.
  5. If someone tries to write a LAMMPS data file based on a topology which doesn't have Bondtypes, Angletypes, Dihedraltypes, Impropertypes, should an error result?

I've tried to catch up with the issue-363 discussion, but I've probably missed a lot, so I apologize if this is misguided.

richardjgowers commented 8 years ago

Hey Kyle

This would be awesome.

I think you're right that you'll need to override the current .type behaviour somehow.

The current 363 behaviour for atoms is something like sticking whatever we find in a topology file to atoms. It would be nice to extend this philosophy to bonds too.. but I'll have to think about what this would mean in total.

In the short term, you should be able to read the bond types into a dictionary of {(atom indices): bond type}, and then write a Writer which accepts this. I think once we've got a prototype Writer it'll be much clearer what we need to make it work seamlessly.

So tl;dr 1 - yes, but you can do a hack job in the short term 2 - ideally in the long run stuck to the bond objects 3 - same as above 4 - eventually it'd be nice to squeeze everything, but lets just get this working 5 - should error if the resulting file doesn't work (NoDataError: No bond types specified, specify them by ...)

Maybe hack together a Writer and we'll see what needs doing

khuston commented 8 years ago

In order to write a LAMMPS.DATAWriter, I first need explicit typing of bonds, angles, dihedrals, and impropers beyond the tupling of atoms types done now. To hack bond types into the topology, I branched issue-363 and tried the following:

  1. Add optional types keyword to topologyattrs.Bonds. This is stored in the class instance a self._bondtypes. A topology parser can pass types to the Bonds topologyattr.
  2. When topologyattrs.Bonds is called upon to generate a TopologyGroup, these values are passed on via a new keyword in TopologyGroup.__init__(). They are stored in the TopologyGroup instance as self._bondtypes.
  3. I modified TopologyGroup.__getitem__ to yield a Bond with attribute _bondtype. If _bondtype is not None then it overrides the default behavior of bond.type.
  4. I modified TopologyGroup() calls wherever I could find them to pass on _bondtype in the instantiation.

I found that the modified DATAParser in issue-363 is slightly broken, because the atom types are saved as integers which breaks select_atoms (I think it expects strings, and typing the types as strings on parsing fixes this issue). It seems there is currently no test for atom selection from LAMMPS DATA files, so this might have gone undetected.

An error is raised for ag.bonds for an atom group that doesn't have bonds, but this seems to also be the case for branch issue-363 right now, so I didn't try to fix this.

My changes seem to work, but I'm unable to run the testsuite, so there are likely problems I'm not seeing.

When in the directory mdanalysis/testsuite/MDAnalysisTests/data/lammps, I get the following sample usage, which is all correct:

In [1]: u = mda.Universe('datatest.data')

In [2]: u.atoms.bonds.types()
/usr/local/lib/python2.7/site-packages/numpy/lib/shape_base.py:431: FutureWarning: in the future np.array_split will retain the shape of arrays with a zero size, instead of replacing them by `array([])`, which always has a shape of (0,).
  FutureWarning)
Out[2]: ['1', '3', '2', '5', '4', '7', '6']

In [3]: u.atoms[10].bonds.types()
Out[3]: ['3']

In [4]: u.atoms[10].bonds[0].type
Out[4]: '3'

In [5]: u.atoms.angles.types()
Out[5]: ['1', '3', '2', '5', '4', '7', '6', '9', '8']

In [6]: u.atoms.dihedrals.types()
Out[6]: ['1', '3', '2', '5', '4']

In [7]: u.atoms.impropers.types()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-85a37d7e9743> in <module>()
----> 1 u.atoms.impropers.types()

AttributeError: 'AtomGroup' object has no attribute 'impropers'

In [8]: u.atoms[10].angles
Out[8]: <TopologyGroup containing 3 angles>

In [9]: u.atoms[10].angles.types()
Out[9]: ['9', '2', '7']

Unfortunately I seem to have broken bonds for everything other than lammps data files, because I now get AttributeError: 'AtomGroup' object has no attribute 'bonds', so I'll have to fix that before going any further.

richardjgowers commented 8 years ago

Hey Kyle,

That all sounds sane.

ag.bonds will fail if there's no bonds, so that's expected.

If you push your repo and link me I'll take a look through

Richard

khuston commented 8 years ago

I just put it in a pull request #772