SSAGESproject / SSAGES

Software Suite for Advanced General Ensemble Simulations
GNU General Public License v3.0
81 stars 28 forks source link

Issue with GetMasses and TotalMass? #23

Closed AMPsUSC closed 3 years ago

AMPsUSC commented 3 years ago

While I was validating the implementation of a new CV I printed the value of the atom masses. What I found is that the total mass seems to be calculated with the atom indices starting from 1 (thus matching the atom indices of the pdb file) while the individual mass of the atoms seems to be read with a -1 shift in the atom index with respect to the input pdb file. This is what I saw for my group 1 consisting of 2 N atoms:

(gdb) p masses[196]
$20 = (const __gnu_cxx::__alloc_traits<std::allocator<double> >::value_type &) @0xab0dd0: 14.006699562072754
(gdb) p masses[198]
$21 = (const __gnu_cxx::__alloc_traits<std::allocator<double> >::value_type &) @0xab0de0: 14.006699562072754
(gdb) p mtot1
$22 = 16.042999863624573

the value of mtot1 is exactly what I get by summing the atoms 196 and 198 from my pdb file (with the index of the first atom=1) while the individual masses of the atoms correspond to those taken with the index of the first atom in the pdb file =0).

mquevill commented 3 years ago

This is due to native indexing in C++ starting at 0. When you pass IDs into SSAGES, they start at 1. However masses[196] will actually contain the 197th element in the masses vector. Internally, GROMACS indexes from 0, which is corrected for in the GROMACS Hook that we have (see code below). It think it would probably be more proper to call masses[GetLocalIndex(196)]. On a single-processor job, this would just become masses[195], which is the 196th element. Hopefully this helps explain some of the inner workings of the code better.

https://github.com/SSAGESproject/SSAGES/blob/b2bf8edd536f2e79e942be69976119c5b30ce9ca/hooks/gromacs/GromacsHook.cpp#L91-L97

[Due to different indexing schemes in the various engines, sometimes there is a disconnect between indices in SSAGES and the engine, but the engine's Hook should account for these differences.]

mquevill commented 3 years ago

Furthermore, if you are running something via mpirun -np 4 ./ssages input.json, the atoms are going to be split among the various jobs, so masses[196] might be meaningless, depending on how many atoms each job is in charge of. For instance, masses[GetLocalIndex(196)] could be any number less than 196, depending on how all of the atoms are split up.

AMPsUSC commented 3 years ago

Thank you for the answer... I think I do not understand exactly what you mean... If I need to do calculations involving the individual masses of the atoms and the total mass of a group in the Evaluation function of my CV, which numbering should I use in the json file? in my function the right numbering for the individual masses do not supply the right total mass (obtained using the TotalMass function). In the lines I copied in my previous post the result of GetMasses and TotalMass do not seem to be compatible with each other...

mquevill commented 3 years ago

In .gro files, the index starts at 1. When you define "atom_ids": [196, 198], these should be the same atom numbers as in your .gro file. When you access these elements within SSAGES, they may have different internal indices, depending on your run conditions (e.g. -np in MPI). (If you were referencing them from within GROMACS, they also might have different indices!)

In your original post, you are accessing raw elements within the vectors of masses, which is not the best way to extract the masses. Instead of accessing a specific index, you should use GetLocalIndex(196) to get the index within SSAGES. The point of this is so that the developer (and user) will not have to figure out how the atoms were split up by MPI.

For example, if you call SSAGES with mpirun -np 2 ./ssages input.json with 200 atoms, GROMACS and MPI will probably split them into ~100 atoms each. But the numbering may or may not be contiguous. If they are split down the middle, GetLocalIndex(196) might be 95, but if they are split even/odd, it could have an index of 97. This is not always predictable, which is why we have the getter function to find it wherever it might be.

AMPsUSC commented 3 years ago

OK, thank you very much for the answer. Then I am not doing it correctly... I will check my code.