SPECFEM / specfem3d

SPECFEM3D_Cartesian simulates acoustic (fluid), elastic (solid), coupled acoustic/elastic, poroelastic or seismic wave propagation in any type of conforming mesh of hexahedra (structured or not).
GNU General Public License v3.0

openmpi3 conflicts #1337

rmodrak opened this issue 5 years ago (status: Open)

rmodrak commented 5 years ago

Recent devel-branch versions of SPECFEM3D work as expected when compiled against openmpi2, but fail at the xgenerate_databases stage when openmpi3 is used instead.

Has anyone else experienced something similar?

In the comments below, I will try to collect error messages from different clusters.
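For context, the failures reported below all come from the bundled homogeneous_halfspace example. A minimal sketch of the steps used to trigger them (the exact invocation may differ slightly from cluster to cluster) is:

    cd EXAMPLES/homogeneous_halfspace
    ./run_this_example.sh   # decomposes the mesh, then runs xgenerate_databases under mpirun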

rmodrak commented 5 years ago

CHINOOK.ALASKA.EDU

Currently Loaded Modulefiles:
  1) /home/ctape/intel-modules/intel-2016
  2) openmpi/intel/3.0.2
  3) slurm

EXAMPLES/homogeneous_halfspace

running example: Thu May 30 17:28:42 AKDT 2019

   setting up example...

  decomposing mesh...

 **********************
 Serial mesh decomposer
 **********************

 total number of nodes: 
   nnodes =        23273
 total number of spectral elements:
   nspec =        20736
 materials:
   num_mat =            1
   defined =            1 undefined =            0
   no poroelastic material file found
 absorbing boundaries:
   nspec2D_xmin =          576
   nspec2D_xmax =          576
   nspec2D_ymin =          576
   nspec2D_ymax =          576
   nspec2D_bottom =         1296
   nspec2D_top =         1296
   no absorbing_cpml_file file found
   no moho_surface_file file found
 Par_file_faults not found: assuming that there are no faults
 node valence:  min =            1  max =            8
   nsize =            8 sup_neighbor =           38
 mesh2dual:
   max_neighbor =           26
 partitions: 
   num =            4

 Databases files in directory: ./OUTPUT_FILES/DATABASES_MPI
 finished successfully

  running database generation on  4 processors...

--------------------------------------------------------------------------
As of version 3.0.0, the "sm" BTL is no longer available in Open MPI.

Efficient, high-speed same-node shared memory communication support in
Open MPI is available in the "vader" BTL.  To use the vader BTL, you
can re-run your job with:

    mpirun --mca btl vader,self,... your_mpi_application
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      n0
Framework: btl
Component: sm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  mca_bml_base_open() failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
[n0:55158] *** An error occurred in MPI_Init
[n0:55158] *** reported by process [4230283265,0]
[n0:55158] *** on a NULL communicator
[n0:55158] *** Unknown error
[n0:55158] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[n0:55158] ***    and potentially your MPI job)
[n0:55141] 1 more process has sent help message help-mpi-btl-sm.txt / btl sm is dead
[n0:55141] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n0:55141] 1 more process has sent help message help-mca-base.txt / find-available:not-valid
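The help text above suggests that something in the environment still requests the "sm" BTL, which was removed in Open MPI 3. A possible workaround, sketched directly from that message and not yet verified on Chinook, is to request the vader BTL explicitly, either on the mpirun command line or through the equivalent environment variable, and to check site or user MCA config files (e.g. $HOME/.openmpi/mca-params.conf) for a stale "btl = sm,..." entry:

    # override the BTL selection for a single run
    mpirun --mca btl vader,self -np 4 ./bin/xgenerate_databases

    # or export it so run_this_example.sh picks it up
    export OMPI_MCA_btl=vader,self
    ./run_this_example.sh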
rmodrak commented 5 years ago

TIGERCPU.PRINCETON.EDU

Currently Loaded Modulefiles:
  1) openmpi/gcc/3.0.0/64

EXAMPLES/homogeneous_halfspace

running example: Thu May 30 20:59:02 EDT 2019

   setting up example...

  decomposing mesh...

 **********************
 Serial mesh decomposer
 **********************

 total number of nodes:
   nnodes =        23273
 total number of spectral elements:
   nspec =        20736
 materials:
   num_mat =            1
   defined =            1 undefined =            0
   no poroelastic material file found
 absorbing boundaries:
   nspec2D_xmin =          576
   nspec2D_xmax =          576
   nspec2D_ymin =          576
   nspec2D_ymax =          576
   nspec2D_bottom =         1296
   nspec2D_top =         1296
   no absorbing_cpml_file file found
   no moho_surface_file file found
 Par_file_faults not found: assuming that there are no faults
 node valence:  min =            1  max =            8
   nsize =            8 sup_neighbor =           38
 mesh2dual:
   max_neighbor =           26
 partitions:
   num =            4

 Databases files in directory: OUTPUT_FILES/DATABASES_MPI
 finished successfully

  running database generation on  4 processors...

[tiger-i26c2n10:24196] *** Process received signal ***
[tiger-i26c2n10:24196] Signal: Segmentation fault (11)
[tiger-i26c2n10:24196] Signal code: Address not mapped (1)
[tiger-i26c2n10:24196] Failing at address: 0x30
./run_this_example.sh: line 56: 24196 Segmentation fault      mpirun -np $NPROC ./bin/xgenerate_databases
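On TigerCPU the segmentation fault happens right at startup of xgenerate_databases, which can indicate a mismatch between the Open MPI used at build time and the openmpi/gcc/3.0.0/64 module loaded at run time. A rough triage sketch (my assumption, not yet run on this cluster) would be to confirm which MPI the binaries actually link against and then rebuild cleanly against the loaded module:

    # check which MPI the binaries resolve to at run time
    which mpirun mpif90
    ldd ./bin/xgenerate_databases | grep -i mpi

    # rebuild SPECFEM3D from scratch against the loaded module
    make clean
    ./configure FC=gfortran CC=gcc MPIFC=mpif90
    make all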