sathishdasari opened 5 years ago
When you call SSAGES without `mpirun`/`mpiexec`, you are only spawning one process. GROMACS sets OpenMP threads internally: there is code within GROMACS that chooses the number of threads if unspecified, and by default it will try to use all available threads (8 in your case). However, each thread may not use 100% of its core, depending on GROMACS's optimizations. For example, on my workstation each core is only at ~82%, which is why the percentage from `top` may make it appear that only 3 CPUs are in use. If you call `top -H`, each process shows its threads separately, so you should see 8 lines of `ssages`.
GROMACS attempts to optimize the threads and ranks for your simulation; this error comes from those attempts to optimize the running parameters. One rank with 24 threads is often less efficient in GROMACS, so I would suggest using multiple ranks. To specify this, change `ssages 1walker.json` to `mpirun -np 4 ssages 1walker.json` to use 4 ranks, for example. This uses the MPI capabilities of GROMACS natively. (If, however, you would actually like to use 24 OpenMP threads, you can specify `"-ntomp", "24"` within the `"args"` member of the .json file.)
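For example, combined with the input file used elsewhere in this thread, the `"args"` member might look like this (a sketch; keep your own input file names):

```json
"args": ["-s", "-deffnm", "adp", "-ntomp", "24"]
```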
This is good to hear. In this case, you are specifying 24 MPI ranks, so GROMACS only assigns 1 OpenMP thread per rank.
Currently, there is no criterion or indicator of convergence built into SSAGES. The development team has discussed various ways to do this, and that work is currently in progress. To extend the simulation, add a JSON member to the method: `"restart": true`, which will read the files from the last run and continue from there. (If `"restart"` is false or unspecified, the old files will be backed up when the new files are written.)
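A minimal sketch of where this member goes (the surrounding `"type"` field is illustrative; your method block will contain its own parameters):

```json
"methods": [{
    "type": "ABF",
    "restart": true
}]
```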
You can set up a Logger that will print the CVs as the simulation proceeds. (Manual > Input Files > Simulation Properties > Logger) This can be helpful to track other CVs, while only sampling over a few. See below for the syntax:
"logger": {
"frequency": 100,
"output_file": "cvs.dat",
"cvs": [0, 3]
}
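Once the logger is writing CV values, a short post-processing script can help with the related question of locating frames near a free-energy minimum. A minimal sketch, assuming a whitespace-separated file whose first column is the timestep and remaining columns are the logged CVs (check the header of your own `cvs.dat`); the target values and tolerance below are hypothetical:

```python
# Sketch: find timesteps whose logged CV values all fall within a
# tolerance of a chosen free-energy minimum. Assumes whitespace-separated
# lines: "<timestep> <cv1> <cv2> ..." (verify against your cvs.dat).

def frames_near_minimum(lines, target, tol):
    """Return timesteps whose CVs are all within tol of target."""
    hits = []
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comment headers
        step = int(float(parts[0]))
        cvs = [float(x) for x in parts[1:]]
        if all(abs(c - t) <= tol for c, t in zip(cvs, target)):
            hits.append(step)
    return hits

# Example with made-up data: two CVs logged every 100 steps.
data = [
    "0    -1.20  0.85",
    "100  -1.05  0.90",
    "200   0.40  0.10",
]
print(frames_near_minimum(data, target=[-1.1, 0.9], tol=0.15))  # [0, 100]
```

The returned timesteps can then be matched against trajectory frames to extract the corresponding structures.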
If you have any further questions, please let us know!
Thank you very much for your suggestions.
Dear Sir,
```
[mm3:06753] *** Process received signal ***
[mm3:06753] Signal: Segmentation fault (11)
[mm3:06753] Signal code: Address not mapped (1)
[mm3:06753] Failing at address: 0x428
```
The JSON member `"output_file"` can take an array of strings. For two walkers, for example, you can use this:

```json
"output_file": ["cvs_w0.dat", "cvs_w1.dat"]
```
Do you get this error right at the beginning of the simulation? Or does the simulation start and the error occurs somewhere in the middle of the simulation? I have been able to restart the included 2 walker ADP example without a segmentation fault. Make sure that you are restarting a simulation with the same details (method parameters, number of walkers, etc.). If you have changed something about the method, then the software might have incorrect data when trying to read the files in.
Thank you.
I was trying a 2 walker simulation of ADP in solvent. After some time the job was killed, displaying the following error.
```
*** Error in `ssages': free(): invalid pointer: 0x00000000012dc3e0 ***
```
Dear Sir, when I was trying to extend a 2 walker simulation, it displayed the following error:

```
*** error in `ssages': corrupted size vs. prev_size: 0x00000000025c24d0 ***
```
I'm afraid that these error messages aren't enough to help diagnose your problem. If there is more output surrounding these error messages, please copy as much as is relevant.
Or if your issue is reproducible, you can attach the files needed to run your simulation so that the development team can try to reproduce your issue. This way, we can try to debug whatever is happening in this system.
Dear Sir, I could not share the files, as the file size is more than 10 MB. I just changed the args in the 2walker.json file from

```json
"args" : ["-s","-deffnm","adp"],
```

to

```json
"args" : ["-s","-deffnm","adp","-cpi", "adp", "-append"],
```

and added

```json
"restart" : true,
```

to the .json file. I used the following command to run the simulation on a system consisting of 2 sockets, 6 cores per socket, and 2 threads per core (2x6x2 = 24 CPUs):

```shell
mpirun -np 24 ssages 2walker.json &
```
I am getting the following error:

```
*** Error in `ssages': corrupted size vs. prev_size: 0x0000000001e824a0 ***
[ccl2:22785] *** Process received signal ***
[ccl2:22785] Signal: Aborted (6)
[ccl2:22785] Signal code: (-6)
```
Dear Sir,

1) I am trying to run a one-walker ABF on the ADP system. My system consists of 1 socket, 4 cores per socket, and 2 threads per core (1x4x2 = 8 CPUs). But when I run the job using `ssages 1walker.json`, it uses only 3 CPUs (from the `top` command's %CPU). How can I use all CPUs to get good performance?

2) When I run the same job on a system consisting of 2 sockets, 6 cores per socket, and 2 threads per core (2x6x2 = 24 CPUs), it gives the following error:

3) A 2 walker job runs perfectly fine with full efficiency on this system with the command

```shell
mpirun -np 24 2walker.json
```

4) How can I know the convergence of the ABF method using this software? Does the simulation terminate automatically after it converges? If not, how can I extend the simulation using this software?

5) How can I extract the structures of the free energy minima from the trajectory? We do not have any file that prints the collective variable values along the simulation time, which would help to find the frame numbers for extracting the structures corresponding to a particular minimum, like the COLVAR file in the PLUMED software.