SW4 (Seismic Waves, 4th order) implements substantial capabilities for 3-D seismic modeling, with a free surface condition on the top boundary, absorbing super-grid conditions on the far-field boundaries, and an arbitrary number of point force and/or point moment tensor source terms.
But when I run ./test_sw4.py -u 0 -d debug_mp/ -v, it fails at test 10 as follows:
Test # 9 Input file: tw-att-2.in PASSED
Starting test # 10 in directory: attenuation with input file: tw-topo-att-1.in
Running sw4 from directory: /home/batkillerz/sw4_base/src/sw4-3.0/pytest/attenuation
run_cmd= ['mpirun', '-np', '1', '/home/batkillerz/sw4_base/src/sw4-3.0/debug_mp//sw4', '/home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in']
ERROR: Test tw-topo-att-1.in : sw4 returned non-zero exit status= 1 aborting test
run_cmd= ['mpirun', '-np', '1', '/home/batkillerz/sw4_base/src/sw4-3.0/debug_mp//sw4', '/home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in']
DID YOU USE THE CORRECT SW4 EXECUTABLE? (SPECIFY DIRECTORY WITH -d OPTION)
test_sw4 was unsuccessful
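The failing step can also be rerun outside test_sw4.py by rebuilding the command printed in run_cmd. Below is a minimal Python sketch of that idea. The helper names build_run_cmd and run_sw4 are hypothetical (they are not part of test_sw4.py), and the sketch assumes mpirun is on the PATH and that the executable is named sw4 inside the build directory, as in the log above.

```python
import os
import subprocess


def build_run_cmd(sw4_dir, input_file, num_procs=1, mpirun="mpirun"):
    """Return the argument list for one SW4 run (hypothetical helper).

    Mirrors the shape of the run_cmd list printed by the test script:
    [mpirun, '-np', N, <dir>/sw4, <input file>].
    """
    sw4_exe = os.path.join(sw4_dir, "sw4")  # avoids the doubled slash in 'debug_mp//sw4'
    return [mpirun, "-np", str(num_procs), sw4_exe, input_file]


def run_sw4(cmd):
    """Run the command and return (exit status, combined stdout/stderr text)."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep MPI and SLATEC messages in one stream
        text=True,
    )
    return result.returncode, result.stdout
```

For example, build_run_cmd("debug_mp/", "pytest/reference/attenuation/tw-topo-att-1.in", num_procs=4) gives the same kind of list as run_cmd above, and run_sw4 returns the non-zero exit status together with the full output so it can be saved to a file.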
When I run the last command manually, "mpirun -np 4 /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4 /home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in", I get the following error:
sw4 version 3.0
This program comes with ABSOLUTELY NO WARRANTY; released under GPL.
This is free software, and you are welcome to redistribute
it under certain conditions, see LICENSE.txt for more details
Compiled on: Fri Apr 19 04:49:41 PM +08 2024
By user: batkillerz
Machine: homelab
Compiler: /storage/software/openmpi/5.0.1/bin/mpicxx
3rd party include dir: /include, and library dir: /lib
Input file: /home/batkillerz/sw4_base/src/sw4-3.0/pytest/reference/attenuation/tw-topo-att-1.in
Default Supergrid thickness has been tuned; # grid points = 1 grid sizes
Default Supergrid damping coefficient has been tuned; damping coefficient = 0.00000000e+00
Processing the grid command...
Setting h to 1.25600000e-01 from x/(nx-1) (x=6.28000000e+00, nx=51)
Setting ny to 51 to be consistent with h=1.25600000e-01
Setting nz to 51 to be consistent with h=1.25600000e-01
cleanupRefinementLevels: topo_zmax = 3.00000000e+00
Cartesian refinement levels (z=):
3.00000000e+00
Curvilinear refinement levels (z=):
0.00000000e+00
Grid distributed on 4 processors
Finest grid size 55 x 55
Processor array 2 x 2
Number of curvilinear grids = 1
Number of Cartesian grids = 1
Total number of grids = 2
Extent of the computational domain xmax=6.28000000e+00 ymax=6.28000000e+00 zmax=6.26560000e+00
Cartesian refinement levels after correction:
Grid=0 z-min=3.00000000e+00
Corrected global_zmax = 6.26560000e+00
***Topography grid: min z = -5.967331e-01, max z = -2.932192e-58, top Cartesian z = 3.000000e+00
Global grid sizes (without ghost points)
Grid h Nx Ny Nz Points Type
0 0.1256 51 51 27 70227 Cartesian
1 0.1256 51 51 27 70227 Curvilinear
Total number of grid points (without ghost points): 140454
Default Supergrid damping coefficient has been tuned; damping coefficient = 0.00000000e+00
Default Supergrid thickness has been tuned; # grid points = 1 grid sizes
Execution time, reading input file 6.77432830e-02 seconds
Assuming a SERIAL file system.
Detected at least one boundary with supergrid conditions
Making Directory: tw-topo-att-1/
... Done!
Geographic and Cartesian coordinates of the corners of the computational grid:
0: Lon= -1.180000e+02, Lat=3.700000e+01, x=0.000000e+00, y=0.000000e+00
1: Lon= -1.180000e+02, Lat=3.700006e+01, x=6.280000e+00, y=0.000000e+00
2: Lon= -1.179999e+02, Lat=3.700006e+01, x=6.280000e+00, y=6.280000e+00
3: Lon= -1.179999e+02, Lat=3.700000e+01, x=0.000000e+00, y=6.280000e+00
ASSIGNING TWILIGHT MATERIALS
----------- Material properties ranges ---------------
1.00118341e+00 kg/m^3 <= Density <= 2.99885859e+00 kg/m^3
1.63353903e+00 m/s <= Vp <= 2.82632270e+00 m/s
1.00033876e+00 m/s <= Vs <= 1.73075388e+00 m/s
1.52767088e+00 <= Vp/Vs <= 1.73199227e+00
2.00118341e+00 Pa <= mu <= 3.99885859e+00 Pa
1.00157479e+00 Pa <= lambda <= 2.99848185e+00 Pa
------------------------------------------------------
*** Attenuation parameters calculated for 1 mechanisms,
max freq=2.000000e+00 [Hz], min_freq=2.000000e-02 [Hz], velo_freq=1.000000e+00 [Hz]
Assigned material properties
computing the time step
[homelab:576512] Process received signal
[homelab:576512] Signal: Segmentation fault (11)
[homelab:576512] Signal code: Address not mapped (1)
[homelab:576512] Failing at address: 0x7ffc1189c000
[homelab:576511] Process received signal
[homelab:576511] Signal: Segmentation fault (11)
[homelab:576511] Signal code: Address not mapped (1)
[homelab:576511] Failing at address: 0x7ffedd464000
Message from routine DSPEV in library SLATEC.
Potentially recoverable error, Prog aborted, Traceback requested
On entry to DSPEV parameter number 3 had an illegal value.
Error number = 3
***End of message
***Job abort due to unrecovered error.
Message from routine DSPEV in library SLATEC.
Potentially recoverable error, Prog aborted, Traceback requested
On entry to DSPEV parameter number 3 had an illegal value.
Message from routine DSPEV in library SLATEC.
Potentially recoverable error, Prog aborted, Traceback requested
On entry to DSPEV parameter number 3 had an illegal value.
Error number = 3
***End of message
Message from routine DSPEV in library SLATEC.
Potentially recoverable error, Prog aborted, Traceback requested
On entry to DSPEV parameter number 3 had an illegal value.
Error number = 3
***End of message
***Job abort due to unrecovered error.
Error message summary
Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1
Message from routine DSPEV in library SLATEC.
Potentially recoverable error, Prog aborted, Traceback requested
On entry to DSPEV parameter number 3 had an illegal value.
Error number = 3
***End of message
***Job abort due to unrecovered error.
Error message summary
Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1
Error message summary
Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1
***Job abort due to unrecovered error.
Error number = 3
***End of message
Error message summary
Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1
prterun has exited due to process rank 3 with PID 576513 on node homelab exiting
improperly. There are three reasons this could occur:
1. this process did not call "init" before exiting, but others in the
job did. This can cause a job to hang indefinitely while it waits for
all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.
2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"
3. this process called "MPI_Abort" or "prte_abort" and the mca
parameter prte_create_session_dirs is set to false. In this case, the
run-time cannot detect that the abort call was an abnormal
termination. Hence, the only error message you will receive is this
one.
This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
You can avoid this message by specifying -quiet on the prterun command
line.
Hi
I managed to install sw4 using make successfully
Rank=0, Grid #1 (curvilinear), iInterior=[1,26], jInterior=[1,26]
Rank=0, Grid #0 (Cartesian), iInterior=[1,26], jInterior=[1,26], kInterior=[1,27]
inside allocateCurvilinearArrays
* PPW = minVs/h/maxFrequency ****
g=0, h=1.256000e-01, minVs/h=7.96448 (Cartesian)
g=1, h=1.256000e-01, minVs/h=7.96466 (curvilinear)
***Job abort due to unrecovered error.
Library Subroutine Message start NERR Level Count
SLATEC DSPEV On entry to DSPEV p 3 1 1
[homelab:576512] [ 0] /lib64/libc.so.6(+0x54db0)[0x7f108f454db0]
[homelab:576512] [ 1] /lib64/liblapack.so.3(dlansp_+0x2d5)[0x7f10905c1ab5]
[homelab:576512] [ 2] /lib64/liblapack.so.3(dspev_+0x15b)[0x7f109061140b]
[homelab:576511] [ 0] /lib64/libc.so.6(+0x54db0)[0x7f99d5454db0]
[homelab:576511] [ 1] /lib64/liblapack.so.3(dlansp_+0x2d5)[0x7f99d65c1ab5]
[homelab:576511] [ 2] /lib64/liblapack.so.3(dspev_+0x15b)[0x7f99d661140b]
[homelab:576511] [ 3] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4f627c]
[homelab:576511] [ 4] /lib64/libgomp.so.1(GOMP_parallel+0x46)[0x7f99d62f2576]
[homelab:576511] [ 5] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4ee103]
[homelab:576511] [ 6] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4e89d6]
[homelab:576511] [ 7] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x407532]
[homelab:576512] [ 3] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4f627c]
[homelab:576512] [ 4] /lib64/libgomp.so.1(GOMP_parallel+0x46)[0x7f1090b72576]
[homelab:576512] [ 5] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4ee103]
[homelab:576512] [ 6] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x4e89d6]
[homelab:576512] [ 7] /home/batkillerz/sw4_base/src/sw4-3.0/debug_mp/sw4[0x407532]
[homelab:576512]
[homelab:576511] [ 8] /lib64/libc.so.6(+0x3feb0)[0x7f99d543feb0]
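Because several ranks write to the same stream, the backtrace frames of different processes end up mixed together. In case it helps with reading the trace, here is a small Python sketch that groups the frames by PID. It assumes the "[host:pid] [ n] location" frame format shown above. The names FRAME_RE and group_frames_by_pid are my own. A frame whose location was cut off by another rank's output is skipped.

```python
import re
from collections import defaultdict

# Matches one backtrace frame such as:
#   [homelab:576511] [ 2] /lib64/liblapack.so.3(dspev_+0x15b)[0x7f99d661140b]
# The location must not start with '[', so a frame interrupted by another
# rank's "[host:pid]" prefix is not mistaken for a complete one.
FRAME_RE = re.compile(
    r"\[(?P<host>[^:\]\s]+):(?P<pid>\d+)\]\s+"
    r"\[\s*(?P<idx>\d+)\]\s+"
    r"(?P<loc>[^\[\s]\S*)"
)


def group_frames_by_pid(log_text):
    """Return {pid: [(frame index, location), ...]} sorted by frame index."""
    frames = defaultdict(list)
    for match in FRAME_RE.finditer(log_text):
        pid = int(match.group("pid"))
        frames[pid].append((int(match.group("idx")), match.group("loc")))
    return {pid: sorted(entries) for pid, entries in frames.items()}
```

Run on the output above, each PID gets its own ordered list. In both traces the path goes from sw4 through GOMP_parallel into dspev_ and then dlansp_ in /lib64/liblapack.so.3, where the segmentation fault is reported.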
Any idea how I can fix this? Thanks!