PrincetonUniversity / athena-public-version

(MOVED) Athena++ GRMHD code and adaptive mesh refinement (AMR) framework. Active repository --->
https://github.com/PrincetonUniversity/athena
BSD 3-Clause "New" or "Revised" License

SMR not working in 2D spherical for [0,pi] domains #29

Closed trwaters closed 5 years ago

trwaters commented 5 years ago

I'm getting a run-time seg-fault in both versions 1.1.1 and 19 when using a theta-domain from 0 to pi in combination with SMR. I can reproduce it using the default blast problem configured with

python configure.py --prob=blast --coord=spherical_polar -mpi --flux=hllc

The default pgen works fine when I add the following refinement

refinement = static

<meshblock>
nx1         = 32
nx2         = 64
nx3         = 1

<refinement1>
x1min = 1.
x1max =  3.
x2min =  1.3089969389957472
x2max =  1.8325957145940461
x3min = -0.1
x3max =  0.1
level = 1

When changing the theta-grid to this, however, I get the seg-fault

nx2        = 64             # Number of zones in X2-direction
x2min      = 0.  # minimum value of X2
x2max      = 3.141592653589793  # maximum value of X2
ix2_bc     = polar_wedge        # inner-X2 boundary flag
ox2_bc     = polar_wedge        # outer-X2 boundary flag

It also seg-faults when using 'reflecting' instead of 'polar_wedge'. This probably went undetected because 3D does work using [0,2pi] under periodic BCs in phi and polar BCs in theta.

This reminds me of a bug Zhaohuan fixed a couple years ago for me that was encountered without any refinement...I will try to find that email.

trwaters commented 5 years ago

The bug Zhaohuan fixed was possibly unrelated, since it was for MHD. The fix ended up being that the 2D check if(pmb->block_size.nx3 > 1) was missing in one spot, so the code tried to allocate a 3D array.
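For readers unfamiliar with this class of bug, here is a minimal sketch of the kind of guard described above. The names `BlockSize` and `Work3DCellCount` are hypothetical stand-ins for illustration, not the Athena++ types or the actual fix; the only thing taken from the thread is the `nx3 > 1` test.

```cpp
#include <cstddef>

// Hypothetical stand-in for a block's cell counts; not the Athena++ type.
struct BlockSize {
  int nx1, nx2, nx3;
};

// Number of cells to allocate for a work array that is only needed in 3D.
// For a 2D block (nx3 == 1) the guard returns 0, so nothing is allocated.
std::size_t Work3DCellCount(const BlockSize &bs, int nghost) {
  if (bs.nx3 > 1) {
    const std::size_t n1 = static_cast<std::size_t>(bs.nx1 + 2 * nghost);
    const std::size_t n2 = static_cast<std::size_t>(bs.nx2 + 2 * nghost);
    const std::size_t n3 = static_cast<std::size_t>(bs.nx3 + 2 * nghost);
    return n1 * n2 * n3;
  }
  return 0;  // 2D (or 1D) block: skip the 3D-only allocation
}
```

With the 32 x 64 x 1 MeshBlock from the input above, the guarded path is skipped entirely; dropping the `if` is what would lead to a 3D-sized allocation for a 2D run.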

felker commented 5 years ago

Did you try running this without -mpi? In that case, I get a safe error and exit:

### FATAL ERROR in Mesh constructor
block_size must be larger than or equal to 4 cells.

trwaters commented 5 years ago

Sorry, I don't follow. Are you referring to nx3 being 1?

trwaters commented 5 years ago

And if I run without -mpi, I still get a seg-fault with either the GNU or the Intel compiler.

felker commented 5 years ago

I followed your directions above exactly. Are you using inputs/hydro/athinput.blast_sph? That file sets mesh/nx3=96 by default, so did you also change that to mesh/nx3=1? Can you post your exact problematic file?
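One plausible reading of the fatal error above, sketched under an assumption: that the Mesh constructor requires at least 4 cells per MeshBlock in every active dimension. This is inferred from the error text only, not checked against the source, and `BlockSizeOk` is a hypothetical helper rather than Athena++ code.

```cpp
// Hypothetical helper, not Athena++ code. Returns true when the MeshBlock
// size has at least 4 cells in every dimension that the mesh treats as
// active (a dimension is assumed active when the mesh has more than 1 cell).
bool BlockSizeOk(int mesh_nx2, int mesh_nx3, int bnx1, int bnx2, int bnx3) {
  const bool active2 = mesh_nx2 > 1;
  const bool active3 = mesh_nx3 > 1;
  if (bnx1 < 4) return false;
  if (active2 && bnx2 < 4) return false;
  if (active3 && bnx3 < 4) return false;
  return true;
}
```

Under that assumption, keeping mesh/nx3=96 while setting meshblock/nx3=1 would be rejected, whereas a 2D mesh with mesh/nx3=1 would pass the same check.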

trwaters commented 5 years ago

Thanks for looking into this. I made changes to make the problem 2D. Here's my input file. athinput.blast_sph.txt

felker commented 5 years ago

Backtrace (dies in initial Mesh construction, line numbers from public repo version master branch):

#0  0x00000000004ada7b in MeshBlockTree::FindMeshBlock (this=0x3123203339373938, tloc=...) at src/mesh/meshblock_tree.cpp:460
#1  0x00000000004adb5d in MeshBlockTree::FindMeshBlock (this=0x726b38, tloc=...) at src/mesh/meshblock_tree.cpp:469
#2  0x000000000041463d in BoundaryBase::SearchAndSetNeighbors (this=0x727700, tree=..., ranklist=0x727300, nslist=0x727320) at src/bvals/bvals_base.cpp:515
#3  0x000000000048d265 in Mesh::Mesh (this=0x726980, pin=0x725890, mesh_test=0) at src/mesh/mesh.cpp:515
#4  0x00000000004097d6 in main (argc=3, argv=0x7fffffffd448) at src/main.cpp:271

@tomidakn mind taking a look when you can?

felker commented 5 years ago

So I cannot reproduce the seg-fault if

ix2_bc     = reflecting  
ox2_bc     = reflecting

or outflow. This leads me to believe that the SMR configuration with polar_wedge violates the unenforced constraint detailed at https://github.com/PrincetonUniversity/athena-public-version/wiki/Boundary-Conditions:

All MeshBlocks surrounding a given pole at the same radius must be at the same level of refinement.
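Since the constraint is not enforced by the code, a user-side check could look something like the sketch below. It assumes each pole-adjacent MeshBlock can be described by its radial extent and refinement level; `PoleBlock` and `PoleLevelsConsistent` are hypothetical names for illustration, not part of Athena++.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one MeshBlock that touches a pole.
struct PoleBlock {
  double r_min, r_max;  // radial extent of the block
  int level;            // refinement level of the block
  bool north;           // true: theta = 0 pole, false: theta = pi pole
};

// Returns true if every pair of blocks around the same pole whose radial
// extents overlap is at the same refinement level.
bool PoleLevelsConsistent(const std::vector<PoleBlock> &blocks) {
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    for (std::size_t j = i + 1; j < blocks.size(); ++j) {
      const PoleBlock &a = blocks[i];
      const PoleBlock &b = blocks[j];
      if (a.north != b.north) continue;  // different poles never conflict
      const bool overlap = a.r_min < b.r_max && b.r_min < a.r_max;
      if (overlap && a.level != b.level) return false;
    }
  }
  return true;
}
```

The pairwise loop is quadratic in the number of pole-adjacent blocks, which should be acceptable for a one-off sanity check on an input file's refinement regions.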

trwaters commented 5 years ago

Thank you for pointing out that reflecting BCs do work. I had only tried those with the optimized configuration,

python configure.py --prob=$prob --coord=spherical_polar --flux=hllc --cxx=icc --ccmd=mpiicpc

which gives me the following error:

srun -n 1 athena -i athinput.blast_sph

Setup complete, entering main loop...

cycle=0 time=0.0000000000000000e+00 dt=1.4352478961618750e-04
pure virtual method called
terminate called without an active exception
srun: error: gr0014: task 0: Aborted

If I just use the GNU compiler, all seems fine though, so this should get me going again.

felker commented 5 years ago

Which Intel compiler version and MPI library?

trwaters commented 5 years ago

bash-4.2$ module list

Currently Loaded Modules:
  1) intel/17.0.4   2) intel-mpi/5.1.3   3) hdf5-parallel/1.8.16

felker commented 5 years ago

I think this might actually be a bug in this particular version of the Intel compiler. I made a similar report in January 2019 for a completely different test / set of config options:

A recent set of changes to Athena++ caused our automated Intel-based regression tests to start failing on Perseus. In particular, we found that compiling with -O3 and ICC 17.0.5.239 caused our code to illegally call the wrong virtual function implementation (in a completely different derived class!) ... Tests with GCC, Clang, and local tests using ICC 19 on my MacBook were fine. Also, compiling at -O0 with ICC 17 prevents the issue. I have switched our testing environment on Perseus to use the latest ICC 19 version, and I am waiting to see if the tests pass.

I was advised to simply upgrade to the latest Intel compiler. If you can reproduce this with Intel v19, please comment on this issue again.

trwaters commented 5 years ago

Correct, no such error with v19. Thanks again.