PrincetonUniversity / athena

Athena++ radiation GRMHD code and adaptive mesh refinement (AMR) framework
https://www.athena-astro.app
BSD 3-Clause "New" or "Revised" License

AMR in GR Does Not Work (Undefined Behavior) #475

Closed smressle closed 9 months ago

smressle commented 1 year ago

Summary of issue

Running the code with GR and AMR but without magnetic fields produces artificial clumping in the density. Running it with MHD but with B=0 everywhere initially removes the density clumping, but the magnetic field then becomes nonzero in the same pattern. This seems to happen with MPI and AMR enabled in 3D.

E.g., this is bsq after some time (the BH is at the center). AMR levels are concentrated around x=0, z=20 rg, where x is horizontal and z is vertical.

[Screenshot of bsq attached in the original issue]

Steps to reproduce

Run a simple Bondi-type flow (uniform initial conditions with B=0) with a refinement condition that results in refinement of at least two levels (not sure if this is required). Nonzero fields arise by the first time step.
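For concreteness, a refinement condition of roughly this form is what is meant, enrolled through the standard problem-generator hook; the target region and sizes below are placeholders for illustration, not my actual setup:

// Illustrative sketch of a geometric refinement condition in a problem
// generator file (after the usual Athena++ includes).
// Return 1 to refine, 0 to leave the level unchanged, -1 to derefine.
int RefinementCondition(MeshBlock *pmb) {
  const Real xc = 0.0, zc = 20.0, half_width = 5.0;  // placeholder region
  if (pmb->block_size.x1min < xc + half_width &&
      pmb->block_size.x1max > xc - half_width &&
      pmb->block_size.x3min < zc + half_width &&
      pmb->block_size.x3max > zc - half_width)
    return 1;  // MeshBlock overlaps the target region: refine
  return 0;
}

void Mesh::InitUserMeshData(ParameterInput *pin) {
  if (adaptive)
    EnrollUserRefinementCondition(RefinementCondition);
}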

Additional comments and/or proposed solution

I have tracked down the issue to these two functions (and the 4D versions as well):

namespace BufferUtility {
//----------------------------------------------------------------------------------------
//! \fn template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
//!     int sn, int en, int si, int ei, int sj, int ej, int sk, int ek, int &offset)
//! \brief pack a 4D AthenaArray into a one-dimensional buffer

template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
         int sn, int en, int si, int ei, int sj, int ej, int sk, int ek, int &offset) {
  for (int n=sn; n<=en; ++n) {
    for (int k=sk; k<=ek; k++) {
      for (int j=sj; j<=ej; j++) {
#pragma omp simd
        for (int i=si; i<=ei; i++)
          buf[offset++] = src(n,k,j,i);
      }
    }
  }
}

//----------------------------------------------------------------------------------------
//! \fn template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
//!                     int si, int ei, int sj, int ej, int sk, int ek, int &offset)
//! \brief pack a 3D AthenaArray into a one-dimensional buffer

template <typename T> void PackData(const AthenaArray<T> &src, T *buf,
                                    int si, int ei, int sj, int ej, int sk, int ek,
                                    int &offset) {
  for (int k=sk; k<=ek; k++) {
    for (int j=sj; j<=ej; j++) {
#pragma omp simd
      for (int i=si; i<=ei; i++)
        buf[offset++] = src(k, j, i);
    }
  }
  return;
}

For some reason, the integer offset is computed correctly inside these functions but is occasionally returned with an incorrect value (one less than it should be). As a result, the int "ssize" in the following section is wrong, and CopyVariableBufferSameProcess leaves off the last one or two elements of data, so some cells in the boundary are not set properly.

void BoundaryVariable::SendBoundaryBuffers() {
  MeshBlock *pmb = pmy_block_;
  int mylevel = pmb->loc.level;
  for (int n=0; n<pbval_->nneighbor; n++) {
    NeighborBlock& nb = pbval_->neighbor[n];
    if (bd_var_.sflag[nb.bufid] == BoundaryStatus::completed) continue;
    int ssize;
    if (nb.snb.level == mylevel)
      ssize = LoadBoundaryBufferSameLevel(bd_var_.send[nb.bufid], nb);
    else if (nb.snb.level<mylevel)
      ssize = LoadBoundaryBufferToCoarser(bd_var_.send[nb.bufid], nb);
    else
      ssize = LoadBoundaryBufferToFiner(bd_var_.send[nb.bufid], nb);
    if (nb.snb.rank == Globals::my_rank) {  // on the same process
      CopyVariableBufferSameProcess(nb, ssize);
    }
#ifdef MPI_PARALLEL
    else  // MPI
      MPI_Start(&(bd_var_.req_send[nb.bufid]));
#endif
    bd_var_.sflag[nb.bufid] = BoundaryStatus::completed;
  }
  return;
}

I think this is clearly a result of undefined behavior, but I have not been able to track down the root cause. I can kludge it by replacing ssize with the correct value calculated from the loop limits, but that is not very satisfying. The only thing I can think of is that the buffers are not allocated properly or something, but I can't find evidence of that. I have no idea why this only becomes a problem with AMR and GR, etc. The weird behavior of the variable "offset" seems to occur robustly, so I don't know why it doesn't cause an issue in all configurations. This bug has been driving me crazy. Any ideas?
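For reference, the kludge amounts to something like this (3D case only; the name is illustrative, and the 4D case multiplies by the variable extent as well):

// Sketch: the number of elements the 3D PackData loops should have written,
// computed directly from the loop limits instead of trusting the returned offset.
inline int ExpectedCount3D(int si, int ei, int sj, int ej, int sk, int ek) {
  return (ei - si + 1) * (ej - sj + 1) * (ek - sk + 1);
}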

Confession: I have been doing my debugging on an old version of Athena++, but I checked that this also happens in the latest public version.

c-white commented 1 year ago

Those #pragma omp simd statements are suspiciously close to the problem. If you remove those, does anything change?

smressle commented 1 year ago

Dang it, yes! It seems to work for the first time step at least. I will see what happens when I run the simulation longer.

Do you know why simd might be causing this?

c-white commented 1 year ago

I actually wouldn't have guessed that those loops were safe to force vectorize. Whether or not they were, forcing simd has proven to be a bit buggy on more than one occasion.
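For what it's worth, the offset++ update inside the vectorized body is a loop-carried dependence, and omp simd tells the compiler to vectorize as though iterations were independent unless that variable is declared linear. A dependence-free form of the 3D inner loop would look something like this sketch:

// Sketch: index the buffer from the loop variable and advance offset once
// after the loop, so nothing is updated across iterations in the simd body.
#pragma omp simd
for (int i=si; i<=ei; i++)
  buf[offset + (i - si)] = src(k, j, i);
offset += ei - si + 1;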

If those data packing routines were actually a performance bottleneck (which I doubt), it would be easy enough to replace the i-loops with memmove calls, which would be unbeatable in terms of performance.
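Roughly like the following sketch, assuming the source row for fixed (k, j) is contiguous in i and that a raw pointer to it is available (the helper and pointer names are illustrative; memcpy suffices here since the source and the send buffer never overlap, and the element type is the plain Real used throughout):

#include <cstring>  // std::memcpy

// Sketch only: copy one contiguous row of the source into the send buffer.
template <typename T>
void PackRow(const T *src_row, T *buf, int si, int ei, int &offset) {
  std::size_t count = static_cast<std::size_t>(ei - si + 1);
  std::memcpy(buf + offset, src_row + si, count * sizeof(T));
  offset += static_cast<int>(count);
}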

tomidakn commented 1 year ago

I agree with @c-white that this simd pragma is not crucial for performance. Modern compilers should be smart enough to handle it correctly on their own.

However, this pragma has been there almost from the beginning of Athena++'s history. @smressle, can you tell us which compiler (including its version) and OS you are using?

smressle commented 1 year ago

I can confirm that removing the simd pragmas seems to have fixed the issue. Thanks! I am not sure yet whether performance has been affected much, but I didn't notice a big change.

I am using

icpc (ICC) 19.1.3.304 20200925

on:

Operating System: CentOS Linux 8 (Core)
CPE OS Name: cpe:/o:centos:centos:8
Kernel: Linux 4.18.0-193.28.1.el8_2.x86_64
Architecture: x86-64

I have been using this setup to run simulations for a while, and this is the first time an obvious error like this has arisen (it is also the first time I have tried AMR). I am still confused about why it only causes a problem for the GR + AMR combination (it also seemed to affect MHD + AMR runs slightly for me), but that could explain why it hasn't made itself known before.

c-white commented 9 months ago

As the offending simd pragmas were removed when radiation was merged last year, I'm marking this as completed.