AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

fillNeighbors sometimes hangs after regrid #487

Closed rhouim closed 5 years ago

rhouim commented 5 years ago

Hi,

We are running into a problem where fillNeighbors sometimes hangs after regrid.

Background: We are working on adding Lagrangian particles and are switching to the NeighborParticleContainer from the AmrParticleContainer so that we have particles in the ghost cells as well as particles owned by higher levels of refinement.

Our code is based on amr_core, and the constructor for our particle container class is:

HyBurnParticleContainer(AmrCore* amr_core, int nghost=2)
        : amrex::NeighborParticleContainer<realData::n_aos, intData::n_iaos> (amr_core->GetParGDB(), nghost)

Before adding any neighbor particle calls to our routines, we verified that the solutions from the original AmrParticleContainer version and the NeighborParticleContainer version were identical and correct.

Problem: When we added fillNeighbors calls, the code started hanging.

To reproduce the hang, we added the following at the end of our regrid function:

pc->Redistribute();  // This is always here and distributes particles properly with the changing grid layout.  
pc->fillNeighbors(); // The code hangs in this call, but usually only after the grid has changed a few times
                     // (e.g., after 21 time steps, regridding every 3 steps, with several grid changes before the hang).
pc->clearNeighbors();

where

pc = std::unique_ptr<HyBurnParticleContainer>(new HyBurnParticleContainer(this));

Digging a little further, we found that the hang occurs in fillNeighborsMPI in AMReX_NeighborParticlesCPUImpl.H, somewhere in the loop that issues the send commands. (Not all processors were making it past that point.)
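For anyone debugging a similar hang: a point-to-point exchange like this can only complete if each rank's list of send targets agrees with what its peers expect to receive. The sketch below is a minimal, AMReX-free consistency check for that condition. It is not the code in fillNeighborsMPI, the names `CommPlan`, `findUnmatched`, and `exampleStalePlan` are hypothetical, and nothing in this thread establishes that such a mismatch is the actual cause here.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical per-rank view of a point-to-point exchange:
//   sendTo[r]   = ranks that rank r intends to send to
//   recvFrom[r] = ranks that rank r intends to wait on
struct CommPlan {
    std::vector<std::set<int>> sendTo;
    std::vector<std::set<int>> recvFrom;
};

// Returns every (src, dst) pair where only one side of the message exists.
std::vector<std::pair<int, int>> findUnmatched(const CommPlan& plan) {
    assert(plan.sendTo.size() == plan.recvFrom.size());
    const int nranks = static_cast<int>(plan.sendTo.size());
    std::vector<std::pair<int, int>> bad;

    // Sends with no matching receive (a blocking send may never return).
    for (int src = 0; src < nranks; ++src) {
        for (int dst : plan.sendTo[src]) {
            if (dst < 0 || dst >= nranks || plan.recvFrom[dst].count(src) == 0) {
                bad.emplace_back(src, dst);
            }
        }
    }
    // Receives with no matching send (the wait never completes).
    for (int dst = 0; dst < nranks; ++dst) {
        for (int src : plan.recvFrom[dst]) {
            if (src < 0 || src >= nranks || plan.sendTo[src].count(dst) == 0) {
                bad.emplace_back(src, dst);
            }
        }
    }
    return bad;
}

// Illustrative 3-rank plan in which rank 2 holds an out-of-date receive list.
CommPlan exampleStalePlan() {
    CommPlan plan;
    plan.sendTo   = {{1}, {0, 2}, {1}};
    plan.recvFrom = {{1}, {0, 2}, {}};   // rank 2 no longer expects rank 1
    return plan;
}
```

In a real run, the per-rank lists would have to be gathered to one rank (or printed per rank) just before the exchange; how to extract them from the container is outside the scope of this sketch.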

The exact occurrence of the hang depends on the number of MPI ranks, AMR settings, etc. This behavior occurs with the latest AMReX development branch and with both the Intel and GNU compilers, with or without DEBUG turned on. (At least, the compiler options do not affect when the hang occurs.) We haven't tried any other compilers. This occurs with Open MPI 4.0.0 and Open MPI 3.1.0.

Thanks for your help!

Ryan

atmyers commented 5 years ago

Thank you for reporting this - I'll try to enable neighbor particles on the Advection Tutorial and see if I can reproduce this behavior.

rhouim commented 5 years ago

Thanks for looking into this.

We played with this further and found a suitable workaround: using a different constructor for the neighbor particle container. Using the NeighborParticleContainer constructor in which the vectors of geometry, distribution mapping, etc. are given explicitly:

    HyBurnParticleContainer(const Vector<Geometry>              & geom,
                            const Vector<DistributionMapping>   & dmap,
                            const Vector<BoxArray>              & grids,
                            const Vector<int>                   & refRatio,
                            int                                 nneighbor)
    : amrex::NeighborParticleContainer<realData::n_aos, intData::n_iaos>(geom, dmap, grids, refRatio, nneighbor)

behaves as expected and doesn't hang when filling neighbor particles as long as

pc->Regrid(dmap, grids);

is used to regrid and distribute the particles along with the CFD grid at the end of the main Regrid function in our code.
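As a rough illustration of the call order described above, the sketch below uses stand-in types so that it compiles without AMReX. `Layout`, `MockParticleContainer`, and `regridHook` are hypothetical names; only the member names `Regrid`, `fillNeighbors`, and `clearNeighbors` mirror the calls quoted in this thread, and the bodies here merely record that they were called. Placing the fill and clear calls directly after `Regrid` follows the reproduction snippet from the first comment.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the per-level DistributionMapping / BoxArray vectors.
struct Layout {
    int version = 0;
};

// Hypothetical mock: logs calls instead of moving any particles.
struct MockParticleContainer {
    int dmapVersion = 0;
    int gridsVersion = 0;
    std::vector<std::string> log;

    // Mirrors pc->Regrid(dmap, grids): adopt the new layout explicitly.
    void Regrid(const Layout& dmap, const Layout& grids) {
        dmapVersion = dmap.version;
        gridsVersion = grids.version;
        log.push_back("Regrid");
    }
    void fillNeighbors()  { log.push_back("fillNeighbors"); }
    void clearNeighbors() { log.push_back("clearNeighbors"); }
};

// Stand-in for the end of the application's main regrid routine:
// hand the new layout to the particle container first, then fill neighbors.
void regridHook(MockParticleContainer& pc, const Layout& dmap, const Layout& grids) {
    pc.Regrid(dmap, grids);
    pc.fillNeighbors();
    pc.clearNeighbors();
}
```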

Note that fillNeighbors still hangs if the constructor from my earlier comment is used, even when the above pc->Regrid call is used to explicitly redistribute the particle grid.