SSAGESproject / SSAGES

Software Suite for Advanced General Ensemble Simulations
GNU General Public License v3.0

Cannot restart FFS example when "computeInitialFlux" is set to "false" #14

Open yacexi opened 5 years ago

yacexi commented 5 years ago

When I set "computeInitialFlux" to "false" and run the "/ForwardFlux/LAMMPS/Langevin" example, the simulation fails with a segmentation fault. The example ran successfully with that parameter set to "true", and the l0*.dat files are present in the "FFSoutput" folder. The error messages are as follows:

Step Temp KinEng PotEng TotEng c_xx c_yy c_ffx c_ffy c_ee Press 
[c5:03482] [ 0]        0            0            0            0            0        -1.01            0     0.081204            0   0.00040401            0 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f1969670340]
[c5:03482] [ 1] [c5:03483] *** Process received signal ***
/home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES17DirectForwardFlux15InitializeQueueEPNS_8SnapshotERKSt6vectorIPNS_18CollectiveVariableESaIS5_EE+0x231)[0x51c8c1]
[c5:03482] [ 2] [c5:03483] Signal: Segmentation fault (11)
[c5:03483] Signal code: Address not mapped (1)
[c5:03483] Failing at address: 0x195954180
/home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES17DirectForwardFlux15PostIntegrationEPNS_8SnapshotERKNS_9CVManagerE+0x10e)[0x51b47e]
[c5:03482] [ 3] /home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES4Hook19PostIntegrationHookEv+0xe9)[0x4c0839]
[c5:03482] [ 4] [c5:03483] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f86c1f44340]
[c5:03483] [ 1] /home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES17DirectForwardFlux15InitializeQueueEPNS_8SnapshotERKSt6vectorIPNS_18CollectiveVariableESaIS5_EE+0x231)[0x51c8c1]
[c5:03483] [ 2] /home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES17DirectForwardFlux15PostIntegrationEPNS_8SnapshotERKNS_9CVManagerE+0x10e)[0x51b47e]
[c5:03483] [ 3] /home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES4Hook19PostIntegrationHookEv+0xe9)[0x4c0839]
[c5:03483] [ 4] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS6Modify10post_forceEi+0x46)[0x7f196a2d3806]
[c5:03482] [ 5] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS6Modify10post_forceEi+0x46)[0x7f86c2ba7806]
[c5:03483] [ 5] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS6Verlet3runEi+0x3e2)[0x7f196a683262]
[c5:03482] [ 6] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS6Verlet3runEi+0x3e2)[0x7f86c2f57262]
[c5:03483] [ 6] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS3Run7commandEiPPc+0x33b)[0x7f196a6488ab]
[c5:03482] [ 7] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS3Run7commandEiPPc+0x33b)[0x7f86c2f1c8ab]
[c5:03483] [ 7] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x2b)[0x7f196a2ae23b]
[c5:03482] [ 8] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x2b)[0x7f86c2b8223b]
[c5:03483] [ 8] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x875)[0x7f196a2ac9c5]
[c5:03482] [ 9] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x875)[0x7f86c2b809c5]
[c5:03483] [ 9] /home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS5Input3oneEPKc+0x8c)[0x7f196a2ad34c]
[c5:03482] [10] /home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES6Driver3RunEv+0x42b)[0x4606cb]
[c5:03482] [11] /home/yacexi/software/SSAGES-public/build/ssages(main+0x3ca)[0x4416ea]
[c5:03482] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f19692bcec5]
[c5:03482] [13] /home/yacexi/software/SSAGES-public/build/ssages[0x44768c]
[c5:03482] *** End of error message ***
/home/yacexi/software/SSAGES-public/lammps-12Dec18/src/liblammps_mpi.so(_ZN9LAMMPS_NS5Input3oneEPKc+0x8c)[0x7f86c2b8134c]
[c5:03483] [10] /home/yacexi/software/SSAGES-public/build/ssages(_ZN6SSAGES6Driver3RunEv+0x42b)[0x4606cb]
[c5:03483] [11] /home/yacexi/software/SSAGES-public/build/ssages(main+0x3ca)[0x4416ea]
[c5:03483] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f86c1b90ec5]
[c5:03483] [13] /home/yacexi/software/SSAGES-public/build/ssages[0x44768c]
[c5:03483] *** End of error message ***

yacexi commented 5 years ago

I ran the simulation under gdb. It shows the following error message:

Program received signal SIGSEGV, Segmentation fault.
0x000000000051c8c1 in SSAGES::DirectForwardFlux::InitializeQueue(SSAGES::Snapshot*, std::vector<SSAGES::CollectiveVariable*, std::allocator<SSAGES::CollectiveVariable*> > const&) ()
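
The gdb line above names the same function that appears in mangled form in the MPI backtrace (the "_ZN6SSAGES17DirectForwardFlux15InitializeQueue..." frames). For readers who want to decode such frames themselves, here is a minimal sketch using abi::__cxa_demangle from <cxxabi.h>, which is available with GCC and Clang. The helper name Demangle is hypothetical and is not part of SSAGES.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper (not part of SSAGES): returns the demangled form of
// an Itanium-ABI symbol name, or the input unchanged if demangling fails.
std::string Demangle(const std::string& mangled) {
    int status = 0;
    char* demangled =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || demangled == nullptr) {
        return mangled;  // not a valid mangled name, or allocation failed
    }
    std::string result(demangled);
    std::free(demangled);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Passing the symbol from the first ssages frame of the backtrace to Demangle should give a name beginning with SSAGES::DirectForwardFlux::InitializeQueue(SSAGES::Snapshot*, matching what gdb reports.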

mquevill commented 5 years ago

If you were trying to run more than 1 processor per walker, we have recently identified an issue with the read/write that happens when "computeInitialFlux" is false. As a workaround for the current implementation, please limit simulations to 1 processor per walker. It should still proceed as expected with any number of walkers.

yacexi commented 5 years ago

Thank you for your reply. I ran the simulation with 1 processor per walker and it still failed. I then compiled SSAGES in debug mode. The error message shows:

 Program received signal SIGSEGV, Segmentation fault.
0x0000000000833ac6 in SSAGES::DirectForwardFlux::InitializeQueue (this=0xad2610, snapshot=0xad5410, 
    cvs=std::vector of length 1, capacity 1 = {...})
    at /home/yacexi/software/SSAGES-public/src/Methods/DirectForwardFlux.cpp:309
309             lprev = myconfig->l;

If I type "p myconfig->l" in gdb, it shows: Cannot access memory at address 0x194c2b0b0
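
The failing address is non-null but unmapped, which is consistent with "myconfig" being a dangling or uninitialized pointer at line 309 rather than a null one. The thread does not establish why. As a rough illustration only, the sketch below assumes the pointer is taken from a container that can be empty (for example, if no stored configurations were loaded) and shows a guard that refuses to dereference in that case. The Config struct and ReadFrontInterface function are hypothetical stand-ins, not the SSAGES source.

```cpp
#include <deque>

// Hypothetical stand-in for a stored configuration; only the member named
// in the gdb output ("l") is modeled. This is not the SSAGES type.
struct Config {
    unsigned int l;
};

// Copies the "l" value of the front entry into lprev and returns true.
// Returns false and leaves lprev untouched when the queue is empty, so the
// caller never dereferences a pointer to a nonexistent element.
bool ReadFrontInterface(const std::deque<Config>& queue, unsigned int& lprev) {
    if (queue.empty()) {
        return false;
    }
    const Config* myconfig = &queue.front();
    lprev = myconfig->l;
    return true;
}
```

With a check like this, an empty queue turns into a reportable error at the call site instead of a segmentation fault. Whether an empty queue is the actual cause here is an assumption that would need to be confirmed against DirectForwardFlux.cpp.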