djay0529 / mdanalysis

Automatically exported from code.google.com/p/mdanalysis
0 stars 0 forks source link

DCD Unit cell angles different between MDAnalysis and VMD #187

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,

As discussed in the mailling list 
(https://groups.google.com/forum/#!msg/mdnalysis-discussion/C5TbkZLB2Eo/vyzcafrF
1AkJ), alpha and gamma angles are swapped between VMD and MDAnalysis when 
opening a DCD file (attached).

MDAnalysis
import MDAnalysis
u = MDAnalysis.Universe("sample.psf", "sample.dcd")
u.trajectory.ts.dimensions
  array([ 38.42659378,  38.39310074,  44.75979996,  60.02891541,  90. ,  90. ], dtype=float32)

VMD
vmd sample.psf sample.dcd
molinfo 0 get {a b c alpha beta gamma}  (Using TkConsole)
  38.426594 38.393101 44.759800 90.000000 90.000000 60.028915

The differences are caused by the indexing of the unitcell array when reading 
DCD files. In lines 732, 734, 738 and 740 of dcd.c 
(https://code.google.com/p/mdanalysis/source/browse/package/src/dcd/dcd.c#732), 
MDAnalysis associates alpha to index 1, while gamma is set to index 4. When 
looking at lines 01013, 01015, 01019 and 01021 of VMD's dcdplugin.c 
(http://www.ks.uiuc.edu/Research/vmd/plugins/doxygen/dcdplugin_8c-source.html#l0
0947), the same indexes are swapped (4 for alpha and 1 for gamma).
This problem became apparent when dealing with a non-orthorhombic box, since 
these exchanges make no difference for orthorhombic boxes.

Thanks

Original issue reported on code.google.com by caiobio...@gmail.com on 7 Jul 2014 at 4:44

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks Caio, I saw the same lines of code after I sent my email. To summarize 
the current findings and for future reference:

* DCD reader code 
https://code.google.com/p/mdanalysis/source/browse/package/src/dcd/dcd.c#725 
(which is based on the 2004 catdcd code which is, to my understanding, also 
(almost) the DCDplugin code):

    alpha = 90.0 - asin(unitcell[1]) * 90.0 / M_PI_2;
    beta  = 90.0 - asin(unitcell[3]) * 90.0 / M_PI_2;
    gamma = 90.0 - asin(unitcell[4]) * 90.0 / M_PI_2;

* DCD plugin of VMD (2013)

   http://www.ks.uiuc.edu/Research/vmd/plugins/doxygen/dcdplugin_8c-source.html#l00947

     ts->alpha = 90.0 - asin(unitcell[4]) * 90.0 / M_PI_2; /* cosBC */
     ts->beta  = 90.0 - asin(unitcell[3]) * 90.0 / M_PI_2; /* cosAC */
     ts->gamma = 90.0 - asin(unitcell[1]) * 90.0 / M_PI_2; /* cosAB */

Thus, the newer DCDplugin reverses angles in unitcell.

Unitcell array layout:

   unitcell           0  1     2    3     4      5
   MDAnalysis         A  alpha B    beta  gamma  C
   DCDplugin          A  gamma B    beta  alpha  C

TODO: Check how CHARMM itself does it.

Original comment by orbeckst on 7 Jul 2014 at 5:11

GoogleCodeExporter commented 9 years ago

Original comment by orbeckst on 7 Jul 2014 at 5:11

GoogleCodeExporter commented 9 years ago
Caio,

It seems quite likely that we need to fix MDAnalysis in the way you suggest. 
The difficulty is that the "DCD" format is not officially explicitly defined 
anywhere, at least not as far as I could find.

If at all possible I'd like to understand better which code saves unitcell 
information in DCDs and how. If you (or anyone else) has other examples or 
better insight into the question of how unitcell information is stored in "DCD" 
files then please add to this issue report.

To make matters worse (or "more interesting"), I just played around with the 
fairly recent (2012) CHARMM c36b2 and it turns out that the unitcell 
information that this version of CHARMM writes to a trajectory  are the unit 
cell vectors in a reduced representation (the box matrix is symmetric so only 6 
reals are saved in what used to be box lengths and angles) (see below).

It seems that this part of the MDAnalysis code needs a closer look. I won't be 
able to do more on this over the next couple of days, though, so everyone is 
more than welcome to chime in.

Oliver

UNITCELL INFORMATION IN CHARMM c36b2
------------------------------------

For the record, I'm attaching the c36b2 output and the CHARMM script to 
generate it:

   charmm -input tricdyn.inp

I defined a unitcell

  CRYSTAL DEFINE TRIC  35 35 35  90.0 60.0 45.0

with three different angles. The NPT dynamics runs fine for 10 ps (10,000 
steps). I save every 1 ps.

- CHARMM reports the angles as above during 10 ps dynamics (including box 
deformations)::

    DYNA>        0      0.00000   -658.17764    235.26610   -893.44374    316.97801
    Crystal Parameters : Crystal Type = TRIC
      DYNA A     =   35.00000 B    =   35.00000 C     =   35.00000
      DYNA Alpha =   90.00000 Beta =   60.00000 Gamma =   45.00000

    DYNA>     1000      1.00000   -867.42408    225.53867  -1092.96275    303.87208
    Crystal Parameters : Crystal Type = TRIC
      DYNA A     =   35.44604 B    =   35.06156 C     =   34.15850
      DYNA Alpha =   91.32802 Beta =   61.73521 Gamma =   44.40703

    DYNA>    10000     10.00000   -941.98187    226.24620  -1168.22807    304.82535
    Crystal Parameters : Crystal Type = TRIC
      DYNA A     =   31.99748 B    =   30.21518 C     =   35.24292
      DYNA Alpha =   95.85821 Beta =   71.08429 Gamma =   31.85939

- VMD 1.9.1 cannot read the box information ::

    vmd > pbc box
    -error Suspicious pbc angle (alpha=-3.768419 beta=7.020089 gamma=16.413187).
    domain error: argument not in valid range

- MDAnalysis::

    In[3]:  u.trajectory[0].dimensions
    Out[3]: array([ 30.84183502,  31.78008842,  32.67008972,  14.57863426, 9.62632275,  -2.60815024], dtype=float32)

  We see the swapping of the alpha and gamma values between VMD 1.9.1 and MDAnalysis 0.8.2-dev but this does not look like a sensible unit cell.

It seems that this is a new version of the CHARMM traj format that stores the 
symmetric matrix instead of the unit cell information because I can get 
CHARMM's unit cell information back by picking the right numbers:

  u = MDAnalysis.Universe("./output/tip125.psf",    "./output/tric.dcd")
  u.trajectory[0]   # at t = 1 ps
  H = u.trajectory.ts._unitcell
  e1, e2, e3 = H[[0,1,3]],  H[[1,2,4]], H[[3,4,5]]
  print(MDAnalysis.coordinates.core.triclinic_box(e1,e2,e3))
  # array([ 35.44603729,  35.06156158,  34.15850067,  
  #         91.32801819,  61.73519516,  44.4070282 ], dtype=float32)

  u.trajectory[-1] # at t = 10 ps
  e1, e2, e3 = H[[0,1,3]],  H[[1,2,4]], H[[3,4,5]]
  print(MDAnalysis.coordinates.core.triclinic_box(e1,e2,e3))
  # array([ 31.9974823 ,  30.21518326,  35.24291992,  
  #          95.8582077 ,  71.08429718,  31.85938454], dtype=float32)

This agrees with the CHARMM output in the log file (note that the trajectory 
does not contain a frame at t=0, the first frame is at t=1 ps).

Original comment by orbeckst on 7 Jul 2014 at 9:23

Attachments:

GoogleCodeExporter commented 9 years ago
PS to comment #3: forgot to attach the water coordinate file that one needs to 
run the CHARMM script.

Original comment by orbeckst on 7 Jul 2014 at 9:26

Attachments:

GoogleCodeExporter commented 9 years ago
Hi, Oliver
I've made an exhaustive search about how the cell information is stored and 
found nothing new. My conclusion is that, like Axel Kohlmeyer said in the NAMD 
mailing list, there is no standard for this. Each software saves cell data in a 
different manner, as shown by your example using CHARMM 36b2.
Also, Axel claims that VMD's dcdplugin.c reads more "types" of DCDs than any 
other package.

Original comment by caiobio...@gmail.com on 8 Jul 2014 at 5:47

GoogleCodeExporter commented 9 years ago
Hi Caio,

I think we should update the dcd code to the latest one from the DCD molplugin; 
I can't remember how to get the sources, though (even though the UIUC licence 
is GPL compatible) – it would be easier if we had a nice Python interface to 
molfile (Issue 55)....

It would also be interesting to know if one can distinguish different DCDs from 
the header information.

Anyway, I won't be able to work on this over the next week or so. In the 
meantime, as a quick fix for you can change in MDAnalysis/coordinates/base.py 
and DCD.py [0,2,5,1,3,4] to [0,2,5,4,3,1].

Oliver  

Original comment by orbeckst on 9 Jul 2014 at 2:09

GoogleCodeExporter commented 9 years ago

Original comment by orbeckst on 17 Oct 2014 at 11:26

GoogleCodeExporter commented 9 years ago

Original comment by orbeckst on 3 Feb 2015 at 11:27

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 2ebcae2369a4.

Original comment by orbeckst on 18 Feb 2015 at 7:39

GoogleCodeExporter commented 9 years ago
I think I properly fixed Issue 187 but please take a good look at it (revision 
2ebcae2369a4).

For the future we have to think how to properly deal with the various "DCD" 
formats, namely CHARMM/NAMD/LAMMPS DCD trajectories. 

The following is roughly what I understand at the moment (but might not be 
entirely correct).  h1, h2, h3 are the box vectors.

======== ========== =========================== ========= =========
code     version    unitcell                    time      length
======== ========== =========================== ========= =========
CHARMM     <22 ?    A,B,C,cos(α),cos(β),cos(γ)  AKMA       Å
           ≥22 ?    h1, h2, h3 (symmetric)      AKMA       Å
NAMD       < 2.5    A,B,C,α,β,γ (in degrees)    ps         Å
           ≥ 2.5    A,B,C,cos(α),cos(β),cos(γ)  ps         Å
LAMMPS              ?A,B,C,cos(α),cos(β),cos(γ) ps         Å
                    ?                           user-def  user-def
======== ========== =========================== ========= =========

The LAMMPS.DCDReader can be used for NAMD trajectories with time_unit="ps" (the 
default). The main problem, though, is that I don't know how to reliably 
autodetect any of these format variations. The DCD C code is able to 
distinguish angle-cosines for angles (in degrees) and the DCD 
Timestep.dimensions will be able to guess in most cases if the unitcell 
represents box matrix vectors (Btw, this could probably be moved into DCD.c). 
However, I know of no reliable way to detect that a DCD was written by CHARMM 
or NAMD so that we could automatically get the time units right.

Original comment by orbeckst on 18 Feb 2015 at 7:57