PrincetonUniversity / athena-public-version

(MOVED) Athena++ GRMHD code and adaptive mesh refinement (AMR) framework. Active repository --->
https://github.com/PrincetonUniversity/athena
BSD 3-Clause "New" or "Revised" License

Segmentation fault appears after failing to launch with MPI #53

Closed: Laamkh closed this issue 4 years ago

Laamkh commented 4 years ago

Sorry, this may be a stupid question to ask here. I don't know whether someone here can help me or whether I should consult someone who is familiar with my computer's system.

I was trying to run the Kelvin-Helmholtz problem with Open MPI, using a modified problem generator function in kh.cpp. This was my first time using Open MPI, and I guess I did something wrong: the segmentation fault now appears even when I configure Athena++ without the -mpi flag, but there was no such problem before my first attempt at using MPI.

Below is what I have done and the output

(base) laam@Lees-MacBook-Air athena % a++config --prob kh --nscalar 1 -hdf5 -mpi
...
(base) laam@Lees-MacBook-Air 18 % a++ -i athinput.kh-shear-lecoanet -m 4

Root grid = 2 x 2 x 1 MeshBlocks
Total number of MeshBlocks = 4
Number of physical refinement levels = 0
Number of logical  refinement levels = 1
  Physical level = 0 (logical level = 1): 4 MeshBlocks, cost = 4
Number of parallel ranks = 4
  Rank = 0: 1 MeshBlocks, cost = 1
  Rank = 1: 1 MeshBlocks, cost = 1
  Rank = 2: 1 MeshBlocks, cost = 1
  Rank = 3: 1 MeshBlocks, cost = 1
Load Balancing:
  Minimum cost = 1, Maximum cost = 1, Average cost = 1

See the 'mesh_structure.dat' file for a complete list of MeshBlocks.
Use 'python ../vis/python/plot_mesh.py' or gnuplot to visualize mesh structure.
...
...
(base) laam@Lees-MacBook-Air 18 % mpirun -n 2 ~/athena/bin/athena\ -i\ ~/work/kh/18/athinput.kh-shear-lecoanet
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: /Users/laam/athena/bin/athena -i ~/work/kh/18/athinput.kh-shear-lecoanet
Node: Lees-MacBook-Air

while attempting to start process rank 0.
--------------------------------------------------------------------------
2 total processes failed to start

As I failed to run with MPI, I tried

(base) laam@Lees-MacBook-Air 18 % a++ -i athinput.kh-shear-lecoanet -m 1

Root grid = 2 x 2 x 1 MeshBlocks
Total number of MeshBlocks = 4
Number of physical refinement levels = 0
Number of logical  refinement levels = 1
  Physical level = 0 (logical level = 1): 4 MeshBlocks, cost = 4
Number of parallel ranks = 1
  Rank = 0: 4 MeshBlocks, cost = 4
Load Balancing:
  Minimum cost = 1, Maximum cost = 1, Average cost = 1

See the 'mesh_structure.dat' file for a complete list of MeshBlocks.
Use 'python ../vis/python/plot_mesh.py' or gnuplot to visualize mesh structure.

(base) laam@Lees-MacBook-Air 18 % a++ -i athinput.kh-shear-lecoanet     
[Lees-MacBook-Air:30732] *** Process received signal ***
[Lees-MacBook-Air:30732] Signal: Segmentation fault: 11 (11)
[Lees-MacBook-Air:30732] Signal code:  (0)
[Lees-MacBook-Air:30732] Failing at address: 0x0
[Lees-MacBook-Air:30732] [ 0] 0   libsystem_platform.dylib            0x00007fff6bf5f5fd _sigtramp + 29
[Lees-MacBook-Air:30732] [ 1] 0   athena                              0x0000000108c29478 _ZTVNSt3__118basic_stringstreamIcNS_11char_traitsIcEENS_9allocatorIcEEEE + 24
[Lees-MacBook-Air:30732] [ 2] 0   athena                              0x0000000108adc4cf _ZN14ParameterInput10GetIntegerENSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEES6_ + 271
[Lees-MacBook-Air:30732] [ 3] 0   athena                              0x0000000108be0891 _ZN12_GLOBAL__N_12sdEP9MeshBlocki + 161
[Lees-MacBook-Air:30732] [ 4] 0   athena                              0x0000000108bd3259 _ZN13HistoryOutput15WriteOutputFileEP4MeshP14ParameterInputb + 1289
[Lees-MacBook-Air:30732] [ 5] 0   athena                              0x0000000108bdadca _ZN7Outputs11MakeOutputsEP4MeshP14ParameterInputb + 522
[Lees-MacBook-Air:30732] [ 6] 0   athena                              0x0000000108ad735a main + 2154
[Lees-MacBook-Air:30732] [ 7] 0   libdyld.dylib                       0x00007fff6bd66cc9 start + 1
[Lees-MacBook-Air:30732] [ 8] 0   ???                                 0x0000000000000003 0x0 + 3
[Lees-MacBook-Air:30732] *** End of error message ***
zsh: segmentation fault  ~/athena/bin/athena -i athinput.kh-shear-lecoanet

Ever since, the segfault appears even when I configure the code without the -mpi flag. Have I altered my computer's environment?

Thank you for your effort.

Regards, Laam

tomidakn commented 4 years ago

This may sound trivial, but are you sure that the path to the executable ~/athena/bin/athena is correct, and that it is properly configured with MPI? It looks like you used an executable in a directory named "18".
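The mpirun message in the original post says it could not access or execute an executable, so one quick sanity check before launching is to confirm that the path really points to an executable file. The following is a minimal sketch; `check_exe` is a hypothetical helper name, not part of Athena++ or Open MPI, and the path in the usage comment is only an example.

```shell
# Hypothetical helper: succeed only if the argument is an executable regular file.
check_exe() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    echo "ok: $1"
  else
    echo "not an executable file: $1" >&2
    return 1
  fi
}

# Usage (example path): launch only if the check passes.
# check_exe "$HOME/athena/bin/athena" && \
#   mpirun -n 2 "$HOME/athena/bin/athena" -i athinput.kh-shear-lecoanet
```

A string such as "/Users/laam/athena/bin/athena -i ..." would fail this check, because no file with that full name exists.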

This may also sound trivial, but did you run "make clean" after reconfiguring the code? The code can fail if object files from a previous configuration still exist.
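One rough way to spot leftovers from an earlier configuration is to list object files that are older than the current Makefile. This is only a heuristic sketch: it assumes that configuring rewrites the Makefile in the build directory and that object files live in an obj/ subdirectory, and `stale_objects` is a hypothetical helper name.

```shell
# Hypothetical helper: print object files that are not newer than the Makefile.
# Assumes <build_dir>/Makefile is regenerated on configure and objects are in <build_dir>/obj.
stale_objects() {
  build_dir=$1
  # -newer matches files modified after the Makefile; negating it keeps the older ones.
  find "$build_dir/obj" -name '*.o' ! -newer "$build_dir/Makefile"
}

# Usage (example path): any output suggests running "make clean" before rebuilding.
# stale_objects "$HOME/athena"
```

Running "make clean" unconditionally after every reconfigure is simpler and safer; the check is only useful for diagnosing a build that has already gone wrong.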

kahoooo commented 4 years ago

It seems mpirun treats the whole line, including the arguments, as the executable because of the backslashes. Maybe try getting rid of the backslashes?
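To illustrate the point about backslashes, here is a small sketch of how the shell splits a command line into words. A backslash before a space removes the space's role as a separator, so the path and its flags reach the launched program as one word. `count_args` and the file names are made up for the demonstration.

```shell
# Print how many separate words the shell passes on.
count_args() { echo "$#"; }

# Backslash-escaped spaces: path and flags fuse into a single word.
escaped=$(count_args /path/to/athena\ -i\ athinput.example)

# Plain spaces: executable, flag, and input file are three separate words.
unescaped=$(count_args /path/to/athena -i athinput.example)

echo "escaped: $escaped word(s), unescaped: $unescaped word(s)"
# prints "escaped: 1 word(s), unescaped: 3 word(s)"
```

This matches the "Executable:" line in the mpirun error output, which shows the flag and input file as part of the executable name.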

Laamkh commented 4 years ago

> This may sound trivial, but are you sure that the path to the executable ~/athena/bin/athena is correct, and that it is properly configured with MPI? It looks like you used an executable in a directory named "18".
>
> This may also sound trivial, but did you run "make clean" after reconfiguring the code? The code can fail if object files from a previous configuration still exist.

Sorry for my late reply. The compilation should be fine, and I am sure I ran make clean. I put the Athena++ executable at ~/athena/bin/athena, while 'athinput.kh-shear-lecoanet' is in ~/work/kh/18.

I have now put them in the same directory and removed the backslashes, and it works this time.

Thank you so much for your suggestion.