MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.28k stars 642 forks source link

Gromacs TPR file `u.atoms.ids` starts from 0 #4450

Open HankewieDanke opened 6 months ago

HankewieDanke commented 6 months ago

Expected & actual behavior

Within GROMACS TPR files the atom ids are continuous and starting from 1 (as far as I am aware -- that is the way it works in gromacs index files at least). When reading in a mda.Universe('*.tpr'), I would expect atom ids starting from 1 as well. Instead they start from 0 -- reflecting the u.atom.indices.

While not necessarily a breaking bug, this is something that would improve user experience/ prevent errors when forgetting adding 1 to the atom ids when reading in tpr files.

Code to reproduce the behavior


import MDAnalysis as mda
u = mda.Universe(TPR)
print(u.atoms.ids[0])
u = mda.Universe(GRO)
print(u.atoms.ids[0])

Current version of MDAnalysis

IAlibay commented 6 months ago

Interesting, I thought this was already fixed with https://github.com/MDAnalysis/mdanalysis/issues/2364

@lilyminium as the person who implemented the original fix, could you weigh in here please?

orbeckst commented 5 months ago

@IAlibay I don't think that atomids were addressed #2364, which was about resids.

@HankewieDanke GROMACS TPR files internally number from zero, as you can see with gmx dump -s TPR

   moltype (0):
      name="AKeco"
      atoms:
         atom (3341):
            atom[     0]={type=  0, typeB=  0, ptype=    Atom, m= 1.40067e+01, q=-3.00000e-01, mB= 1.40067e+01, qB=-3.00000e-01, resind=    0, atomnumber=  7}
            atom[     1]={type=  1, typeB=  1, ptype=    Atom, m= 1.00800e+00, q= 3.30000e-01, mB= 1.00800e+00, qB= 3.30000e-01, resind=    0, atomnumber=  1}
            atom[     2]={type=  1, typeB=  1, ptype=    Atom, m= 1.00800e+00, q= 3.30000e-01, mB= 1.00800e+00, qB= 3.30000e-01, resind=    0, atomnumber=  1}

Therefore, we make these atom indices available "as is". In #2364 we had a long discussion that for resids we make the 1-based indices available. I think you're suggesting to do the same for atom ids.

Notes

  1. In https://github.com/MDAnalysis/UserGuide/wiki/Topology-attributes#default-values-for-attributes @lilyminium notes that ''by default'' the ''ids'' attribute is a ''continuous sequence from 1 to n_atoms''. However, for TPR we do not adhere to the default but provide what's in the file.
  2. When we write a gromacs index NDX selection file u.atoms.write("index.ndx") we do add +1 to the atom indices (atom.index + 1)... so we are aware that GROMACS tools want 1-based atoms.

@lilyminium @jbarnoud @IAlibay (and anybody else) do we want to also make atoms.ids 1-based for TPR files? We could add it to 3.0 as a breaking change.

HankewieDanke commented 5 months ago

Dear @IAlibay and @orbeckst,

Thank you for the responses. Then it was a mistake on my part, assuming that the TPR file starts from 1, apologies.

Still to me this behavior seems misleading, as human readable GMX files start from one. atomid from one would also be consistent with the resid behaviour for TPR files (after #2364). For a consistent experience in usage with other tools (and within MDA) I personally think it may be sensible to use the same for the atom ids with TPR files. From the documentation I always believed that id is indexed from one and index from zero.

I came upon this, while preparing plumed files. Which also start atom ids from one (just to add an other example).