rmodrak opened this issue 5 years ago (status: Open)
CHINOOK.ALASKA.EDU
Currently Loaded Modulefiles:
1) /home/ctape/intel-modules/intel-2016
2) openmpi/intel/3.0.2
3) slurm
EXAMPLES/homogeneous_halfspace
running example: Thu May 30 17:28:42 AKDT 2019
setting up example...
decomposing mesh...
**********************
Serial mesh decomposer
**********************
total number of nodes:
nnodes = 23273
total number of spectral elements:
nspec = 20736
materials:
num_mat = 1
defined = 1 undefined = 0
no poroelastic material file found
absorbing boundaries:
nspec2D_xmin = 576
nspec2D_xmax = 576
nspec2D_ymin = 576
nspec2D_ymax = 576
nspec2D_bottom = 1296
nspec2D_top = 1296
no absorbing_cpml_file file found
no moho_surface_file file found
Par_file_faults not found: assuming that there are no faults
node valence: min = 1 max = 8
nsize = 8 sup_neighbor = 38
mesh2dual:
max_neighbor = 26
partitions:
num = 4
Databases files in directory: ./OUTPUT_FILES/DATABASES_MPI
finished successfully
running database generation on 4 processors...
--------------------------------------------------------------------------
As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.
Efficient, high-speed same-node shared memory communication support in
Open MPI is available in the "vader" BTL. To use the vader BTL, you
can re-run your job with:
mpirun --mca btl vader,self,... your_mpi_application
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.
Host: n0
Framework: btl
Component: sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
mca_bml_base_open() failed
--> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[n0:55158] *** An error occurred in MPI_Init
[n0:55158] *** reported by process [4230283265,0]
[n0:55158] *** on a NULL communicator
[n0:55158] *** Unknown error
[n0:55158] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[n0:55158] *** and potentially your MPI job)
[n0:55141] 1 more process has sent help message help-mpi-btl-sm.txt / btl sm is dead
[n0:55141] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n0:55141] 1 more process has sent help message help-mca-base.txt / find-available:not-valid
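On Chinook, the Open MPI help text above already points at a possible workaround: the "sm" BTL removed in Open MPI 3.0.0 is still being requested somewhere (for example through a default MCA parameter file or environment variable), so explicitly selecting the "vader" BTL may avoid the MPI_Init failure. A sketch of the re-run, assuming the 4 processes used by this example (untested here):

```shell
# Workaround sketch: select the vader BTL, which replaced the removed
# "sm" BTL for same-node shared memory in Open MPI >= 3.0, plus "self"
# for loopback communication.
mpirun --mca btl vader,self -np 4 ./bin/xgenerate_databases

# Equivalently, as an environment variable picked up by Open MPI:
# export OMPI_MCA_btl=vader,self
```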
TIGERCPU.PRINCETON.EDU
Currently Loaded Modulefiles:
1) openmpi/gcc/3.0.0/64
EXAMPLES/homogeneous_halfspace
running example: Thu May 30 20:59:02 EDT 2019
setting up example...
decomposing mesh...
**********************
Serial mesh decomposer
**********************
total number of nodes:
nnodes = 23273
total number of spectral elements:
nspec = 20736
materials:
num_mat = 1
defined = 1 undefined = 0
no poroelastic material file found
absorbing boundaries:
nspec2D_xmin = 576
nspec2D_xmax = 576
nspec2D_ymin = 576
nspec2D_ymax = 576
nspec2D_bottom = 1296
nspec2D_top = 1296
no absorbing_cpml_file file found
no moho_surface_file file found
Par_file_faults not found: assuming that there are no faults
node valence: min = 1 max = 8
nsize = 8 sup_neighbor = 38
mesh2dual:
max_neighbor = 26
partitions:
num = 4
Databases files in directory: OUTPUT_FILES/DATABASES_MPI
finished successfully
running database generation on 4 processors...
[tiger-i26c2n10:24196] *** Process received signal ***
[tiger-i26c2n10:24196] Signal: Segmentation fault (11)
[tiger-i26c2n10:24196] Signal code: Address not mapped (1)
[tiger-i26c2n10:24196] Failing at address: 0x30
./run_this_example.sh: line 56: 24196 Segmentation fault mpirun -np $NPROC ./bin/xgenerate_databases
Recent devel branch versions of SPECFEM3D work as expected when compiled with OpenMPI 2, but fail at the xgenerate_databases stage when compiled with OpenMPI 3. Has anyone else experienced something similar?
In the comments below, I will try to collect error messages from different clusters.
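To narrow down where the OpenMPI 2 and OpenMPI 3 builds diverge, a few diagnostic commands may help: checking which Open MPI library the failing binary actually links against, which BTL components the installed Open MPI provides, and whether a stale "sm" BTL setting survives in the MCA configuration. These are hedged suggestions; paths and module setups differ per cluster:

```shell
# Which libmpi does the failing binary link against?
ldd ./bin/xgenerate_databases | grep -i mpi

# Which BTL components does the installed Open MPI actually provide?
# (In 3.x, "vader" should be listed and "sm" should be gone.)
ompi_info | grep -i btl

# Does a user-level MCA parameter file still request the removed "sm" BTL?
grep -i btl "$HOME/.openmpi/mca-params.conf" 2>/dev/null
```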