dlpoly topology reader (c++ part) does not set residue names correctly

GoogleCodeExporter commented 8 years ago

See  issue #139 , comment 8.

Basically, DLPOLYTopologyReader::ReadTopology doesn't transfer the residue from 
the Fortran part to C++.

todo in the code:
// TODO: fix residue naming / assignment
Residue *res = top.CreateResidue("no");

I am not sure where in the C structures this is stored. A bit of documentation 
in dlpoly/dlp_io_layer.h would be helpful!

Original issue reported on code.google.com by christop...@gmail.com on 7 Nov 2013 at 10:18

GoogleCodeExporter commented 8 years ago

Hands-on clarification, while I am still reading this:

There is no ResId or Residue name in DL_POLY FIELD (or in any other file). 
Previously, DL_POLY-Classic (up to v-2.0) had a so-called "neutral group" id 
(integer number), but it has been dropped in later versions and is now 
considered obsolete.

To support both, I implemented FieldSiteT->idmol and FieldSiteT->idgrp, which 
provide the MoleculeID (integer) and GroupID (integer) for a site within a 
molecular type. As the structure name FieldSiteT hints, these are the site 
specs read from FIELD.

I have to check whether idgrp=1 or idgrp=idmol when there is no distinction 
between groups within the same molecule.

Original comment by abruk...@googlemail.com on 7 Nov 2013 at 10:42

GoogleCodeExporter commented 8 years ago

Adding here what was originally contained in my small test code (a bit of 
documentation on the structures used):
<<<
  // AB: the parameter meanings in the data structures are self-evident, below only the least evident ones are described

  // AB: FrameSiteT.id - index of site in the entire system (following the order of appearance in a frame)
  // AB: FrameSiteT.im - index of site in a molecule              (same as FieldSiteT.idmol)
  // AB: FrameSiteT.ig - index of site in a neutral/charged group (same as FieldSiteT.idgrp)

  // AB: FieldSiteT.ifrzn = 0/1 - determines if the site must be "frozen" (must not move at all)
  // AB: FieldSiteT.nrept - number of subsequent repetitions of the site in a molecule

  // AB: FieldSpecsT.ineut = 0/1 - determines if neutral/charged groups are present (0 => molecule=1group)

  // AB: FrameSpecT.keytrj - determines the level of trajectory data:
  // AB: FrameSpecT.keytrj = 0 => only coordinates are present
  // AB: FrameSpecT.keytrj = 1 => only coordinates & velocities are present
  // AB: FrameSpecT.keytrj = 2 => coordinates, velocities & forces are present

  // AB: FrameSpecT.imcon - determines the type of PBC image convention:
  // AB: FrameSpecT.imcon = 0  => no PBC (no boundaries)
  // AB: FrameSpecT.imcon = 1  => cubic
  // AB: FrameSpecT.imcon = 2  => orthorhombic
  // AB: FrameSpecT.imcon = .. => etc. (see DL_POLY manual)
>>>
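
To make the keytrj and imcon codes listed above easier to use from the C++ 
side, here is a minimal sketch. All names in it (TrajectoryLevel, 
toTrajectoryLevel, hasVelocities, hasForces, describeImcon) are hypothetical 
and are not part of dlp_io_layer.h; only the integer codes are taken from the 
comments above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers; only the integer codes come from the notes above.
enum class TrajectoryLevel { Coordinates = 0, Velocities = 1, Forces = 2 };

// Convert a raw keytrj value (0, 1 or 2) into the enum.
inline TrajectoryLevel toTrajectoryLevel(int keytrj) {
  assert(keytrj >= 0 && keytrj <= 2 && "keytrj outside the documented range");
  return static_cast<TrajectoryLevel>(keytrj);
}

// keytrj = 1 or 2 => velocities are present in the frame.
inline bool hasVelocities(int keytrj) {
  return toTrajectoryLevel(keytrj) >= TrajectoryLevel::Velocities;
}

// keytrj = 2 => forces are present in the frame.
inline bool hasForces(int keytrj) {
  return toTrajectoryLevel(keytrj) == TrajectoryLevel::Forces;
}

// Human-readable label for the PBC image convention (imcon).
// Codes above 2 exist but are not listed here; see the DL_POLY manual.
inline std::string describeImcon(int imcon) {
  switch (imcon) {
    case 0: return "no PBC";
    case 1: return "cubic";
    case 2: return "orthorhombic";
    default: return "other (see DL_POLY manual)";
  }
}
```

A reader could then write something like hasForces(spec.keytrj) before 
touching force arrays, where spec stands for whatever FrameSpecT instance is 
at hand.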

In general, the names of structures reveal their source, whereas the variable 
names reveal their meaning and usage. "Frame" stands for CONFIG and/or 
HISTORY, as their data are essentially the same: the entire system specs that 
can be found in those files. "Field" stands for the data from the FIELD file, 
which can be expanded into the force-field specs for the entire system by 
looping over <Name>.nrept, the number of repetitions of the corresponding 
entity (molecule or site). 
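
The nrept expansion mentioned above can be sketched as follows. SiteSpec is a 
hypothetical stand-in that carries only a name and nrept; the real FieldSiteT 
layout in dlp_io_layer.h may differ, so treat this as an illustration of the 
loop rather than the actual reader code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a FIELD site record (not the real FieldSiteT).
struct SiteSpec {
  std::string name;  // atom name as read from FIELD
  int nrept;         // number of subsequent repetitions of this site
};

// Expand the compact FIELD listing into one entry per site by repeating
// each record nrept times, preserving the order of appearance.
inline std::vector<std::string> expandSites(const std::vector<SiteSpec>& specs) {
  std::vector<std::string> names;
  for (const SiteSpec& s : specs) {
    assert(s.nrept >= 0 && "nrept must not be negative");
    for (int i = 0; i < s.nrept; ++i) {
      names.push_back(s.name);
    }
  }
  return names;
}
```

The same loop would apply one level up, repeating each molecular type by its 
own nrept to obtain the full system.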

Please, tell me (or comment here) if anything is not clear yet.

Original comment by abruk...@googlemail.com on 7 Nov 2013 at 11:04

GoogleCodeExporter commented 8 years ago

The naming scheme with the residue 1:resname:atomname is currently defined in 
the topology reader and not globally enforced; see e.g. the 
pdbtopologyreader, which directly uses the atom name from the pdb since 
residue information is not available (at least with the reader we use, so this 
might be a separate issue to fix).

However, pdb topologies rely on the fact that the atom names within the 
molecule are unique. What would be the most consistent bead naming scheme, 
unique within a molecule, for dlpoly users? In general I don't like using only 
atom indices, since these are very error prone. One has to be aware that these 
names have to be specified in the mapping file, so they should be identical 
for atoms in different molecules of the same molecule type.

The line in the dlpoly reader for that was just a quick hack until we sort out 
the best naming scheme:
 nm << bead->getResnr() + 1 << ":" <<  top.getResidue(bead->getResnr())->getName() << ":" << bead->getName();
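
For reference, the stream expression above can be reproduced in isolation as 
below. makeBeadName is a hypothetical free function, and the sketch assumes 
that getResnr() returns a zero-based index (which the "+ 1" suggests); it does 
not use the real Bead or Topology classes.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the reader's stream expression.
// resnr is assumed to be zero-based, so the printed number is resnr + 1.
inline std::string makeBeadName(int resnr, const std::string& resname,
                                const std::string& atomname) {
  assert(resnr >= 0 && "residue index must not be negative");
  std::ostringstream nm;
  nm << resnr + 1 << ":" << resname << ":" << atomname;
  return nm.str();
}
```

With the placeholder residue name "no" from the TODO, a bead named OW in the 
first residue would come out as 1:no:OW, and that is the string a mapping 
file would have to match.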

Original comment by victor.r...@gmail.com on 7 Nov 2013 at 11:36

GoogleCodeExporter commented 8 years ago

I did not aim (and am not going to aim) at reading the PDB format, as it is 
very ambiguous: over the years people from different backgrounds started using 
it in different ways, and it has now become essentially "use it as you want" 
(de facto)...

As for naming used by DL_POLY, it is in fact different from Gromacs, and I 
mentioned it before. The atom names in DL_POLY formats are not unique, and the 
proper distinction between unique atoms is only done by their indices. This is 
not my invention and I won't defend it, but it has its advantages too. So, 
DL_POLY's atom names are actually equivalent to Gromacs' atom types. 

There is an ad-hoc way to have unique names for atoms within a molecular type 
(a molecule instance in FIELD; these names would be the last column within the 
"atoms" specs in FIELD), but I am not sure if it is any sort of standard, most 
likely not (it's not imposed by the internal readers, nor by the manual).

Original comment by abruk...@googlemail.com on 8 Nov 2013 at 12:01

GoogleCodeExporter commented 8 years ago

OK, let's leave it as ResidueName="no" for now. We can always reopen this 
issue if a user complains.

Original comment by christop...@gmail.com on 8 Nov 2013 at 12:18

Changed title: dlpoly topology reader (c++ part) does not set residue names correctly
Changed state: WontFix

dlpoly topology reader (c++ part) does not set residue names correctly #142