ECP-WarpX / WarpX

WarpX is an advanced electromagnetic & electrostatic Particle-In-Cell code.
https://ecp-warpx.github.io

Invalid memory access when moving window and timers-based load-balancing is used #3377

Open lucafedeli88 opened 2 years ago

lucafedeli88 commented 2 years ago

I am opening this issue because I have observed an invalid memory access when the moving window and timers-based load balancing are used together.

Here I provide a small reproducer:

#################################
####### GENERAL PARAMETERS ######
#################################
max_step = 10
amr.n_cell =  64 64 64
amr.max_grid_size = 32
amr.blocking_factor = 32
amr.max_level = 0
geometry.dims = 3
geometry.prob_lo = -10.e-6   -10.e-6   -10.e-6    # physical domain
geometry.prob_hi =  10.e-6    10.e-6    10.e-6

algo.load_balance_intervals = 3::100
algo.load_balance_with_sfc = 0
algo.load_balance_costs_update = timers

warpx.do_moving_window = 1
warpx.moving_window_dir = z
warpx.moving_window_v = 1.0
warpx.start_moving_window_step = 2

#################################
####### Boundary condition ######
#################################
boundary.field_lo = pml pml pml
boundary.field_hi = pml pml pml

#################################
############ NUMERICS ###########
#################################
warpx.verbose = 1
warpx.cfl = 0.99

# Order of particle shape factors
algo.particle_shape = 3

#################################
############ PLASMA #############
#################################
particles.species_names = electrons

electrons.species_type = electron
electrons.injection_style = "NUniformPerCell"
electrons.num_particles_per_cell_each_dim = 1 1 2
electrons.profile = constant
electrons.density = 1.e25  # number of electrons per m^3
electrons.momentum_distribution_type = "gaussian"
electrons.ux_th  = 0.01 # uth the std of the (unitless) momentum
electrons.uy_th  = 0.01 # uth the std of the (unitless) momentum
electrons.uz_th  = 0.01 # uth the std of the (unitless) momentum

When WarpX runs this input file (even without GPU or OpenMP support), Valgrind reports the following errors:

STEP 3 starts ...
==41155== Invalid read of size 4
==41155==    at 0x55CFBD: Add<float> (AMReX_GpuAtomic.H:584)
==41155==    by 0x55CFBD: WarpX::shiftMF(amrex::MultiFab&, amrex::Geometry const&, int, int, int, float, bool, amrex::ParserExecutor<3> const&) (WarpXMovingWindow.cpp:435)
==41155==    by 0x55F8EF: WarpX::MoveWindow(int, bool) (WarpXMovingWindow.cpp:192)
==41155==    by 0x372D78: WarpX::Evolve(int) (WarpXEvolve.cpp:269)
==41155==    by 0x1BB863: main (main.cpp:67)
==41155==  Address 0xb8c925c is 4 bytes before a block of size 32 alloc'd
==41155==    at 0x4840F2F: operator new(unsigned long) (vg_replace_malloc.c:422)
==41155==    by 0x1F088A: allocate (new_allocator.h:127)
==41155==    by 0x1F088A: allocate (alloc_traits.h:464)
==41155==    by 0x1F088A: _M_allocate (stl_vector.h:346)
==41155==    by 0x1F088A: std::vector<float, std::allocator<float> >::_M_default_append(unsigned long) (vector.tcc:635)
==41155==    by 0x1D8EEB: define (AMReX_LayoutData.H:31)
==41155==    by 0x1D8EEB: LayoutData (AMReX_LayoutData.H:22)
==41155==    by 0x1D8EEB: make_unique<amrex::LayoutData<float>, const amrex::BoxArray&, const amrex::DistributionMapping&> (unique_ptr.h:962)
==41155==    by 0x1D8EEB: WarpX::AllocLevelMFs(int, amrex::BoxArray const&, amrex::DistributionMapping const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, bool) (WarpX.cpp:2170)
==41155==    by 0x1DCEAB: WarpX::AllocLevelData(int, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1680)
==41155==    by 0x1DCFC7: WarpX::MakeNewLevelFromScratch(int, float, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1548)
==41155==    by 0x6D620D: amrex::AmrMesh::MakeNewGrids(float) (AMReX_AmrMesh.cpp:779)
==41155==    by 0x3DDC2F: InitFromScratch (WarpXInitData.cpp:472)
==41155==    by 0x3DDC2F: WarpX::InitData() (WarpXInitData.cpp:378)
==41155==    by 0x1BB856: main (main.cpp:65)
==41155== 
==41155== Invalid write of size 4
==41155==    at 0x55CFC1: Add<float> (AMReX_GpuAtomic.H:584)
==41155==    by 0x55CFC1: WarpX::shiftMF(amrex::MultiFab&, amrex::Geometry const&, int, int, int, float, bool, amrex::ParserExecutor<3> const&) (WarpXMovingWindow.cpp:435)
==41155==    by 0x55F8EF: WarpX::MoveWindow(int, bool) (WarpXMovingWindow.cpp:192)
==41155==    by 0x372D78: WarpX::Evolve(int) (WarpXEvolve.cpp:269)
==41155==    by 0x1BB863: main (main.cpp:67)
==41155==  Address 0xb8c925c is 4 bytes before a block of size 32 alloc'd
==41155==    at 0x4840F2F: operator new(unsigned long) (vg_replace_malloc.c:422)
==41155==    by 0x1F088A: allocate (new_allocator.h:127)
==41155==    by 0x1F088A: allocate (alloc_traits.h:464)
==41155==    by 0x1F088A: _M_allocate (stl_vector.h:346)
==41155==    by 0x1F088A: std::vector<float, std::allocator<float> >::_M_default_append(unsigned long) (vector.tcc:635)
==41155==    by 0x1D8EEB: define (AMReX_LayoutData.H:31)
==41155==    by 0x1D8EEB: LayoutData (AMReX_LayoutData.H:22)
==41155==    by 0x1D8EEB: make_unique<amrex::LayoutData<float>, const amrex::BoxArray&, const amrex::DistributionMapping&> (unique_ptr.h:962)
==41155==    by 0x1D8EEB: WarpX::AllocLevelMFs(int, amrex::BoxArray const&, amrex::DistributionMapping const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, bool) (WarpX.cpp:2170)
==41155==    by 0x1DCEAB: WarpX::AllocLevelData(int, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1680)
==41155==    by 0x1DCFC7: WarpX::MakeNewLevelFromScratch(int, float, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1548)
==41155==    by 0x6D620D: amrex::AmrMesh::MakeNewGrids(float) (AMReX_AmrMesh.cpp:779)
==41155==    by 0x3DDC2F: InitFromScratch (WarpXInitData.cpp:472)
==41155==    by 0x3DDC2F: WarpX::InitData() (WarpXInitData.cpp:378)
==41155==    by 0x1BB856: main (main.cpp:65)
==41155== 
==41155== Invalid read of size 4
==41155==    at 0x55CFBD: Add<float> (AMReX_GpuAtomic.H:584)
==41155==    by 0x55CFBD: WarpX::shiftMF(amrex::MultiFab&, amrex::Geometry const&, int, int, int, float, bool, amrex::ParserExecutor<3> const&) (WarpXMovingWindow.cpp:435)
==41155==    by 0x55F92F: WarpX::MoveWindow(int, bool) (WarpXMovingWindow.cpp:193)
==41155==    by 0x372D78: WarpX::Evolve(int) (WarpXEvolve.cpp:269)
==41155==    by 0x1BB863: main (main.cpp:67)
==41155==  Address 0xb8c925c is 4 bytes before a block of size 32 alloc'd
==41155==    at 0x4840F2F: operator new(unsigned long) (vg_replace_malloc.c:422)
==41155==    by 0x1F088A: allocate (new_allocator.h:127)
==41155==    by 0x1F088A: allocate (alloc_traits.h:464)
==41155==    by 0x1F088A: _M_allocate (stl_vector.h:346)
==41155==    by 0x1F088A: std::vector<float, std::allocator<float> >::_M_default_append(unsigned long) (vector.tcc:635)
==41155==    by 0x1D8EEB: define (AMReX_LayoutData.H:31)
==41155==    by 0x1D8EEB: LayoutData (AMReX_LayoutData.H:22)
==41155==    by 0x1D8EEB: make_unique<amrex::LayoutData<float>, const amrex::BoxArray&, const amrex::DistributionMapping&> (unique_ptr.h:962)
==41155==    by 0x1D8EEB: WarpX::AllocLevelMFs(int, amrex::BoxArray const&, amrex::DistributionMapping const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, bool) (WarpX.cpp:2170)
==41155==    by 0x1DCEAB: WarpX::AllocLevelData(int, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1680)
==41155==    by 0x1DCFC7: WarpX::MakeNewLevelFromScratch(int, float, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1548)
==41155==    by 0x6D620D: amrex::AmrMesh::MakeNewGrids(float) (AMReX_AmrMesh.cpp:779)
==41155==    by 0x3DDC2F: InitFromScratch (WarpXInitData.cpp:472)
==41155==    by 0x3DDC2F: WarpX::InitData() (WarpXInitData.cpp:378)
==41155==    by 0x1BB856: main (main.cpp:65)
==41155== 
==41155== Invalid write of size 4
==41155==    at 0x55CFC1: Add<float> (AMReX_GpuAtomic.H:584)
==41155==    by 0x55CFC1: WarpX::shiftMF(amrex::MultiFab&, amrex::Geometry const&, int, int, int, float, bool, amrex::ParserExecutor<3> const&) (WarpXMovingWindow.cpp:435)
==41155==    by 0x55F92F: WarpX::MoveWindow(int, bool) (WarpXMovingWindow.cpp:193)
==41155==    by 0x372D78: WarpX::Evolve(int) (WarpXEvolve.cpp:269)
==41155==    by 0x1BB863: main (main.cpp:67)
==41155==  Address 0xb8c925c is 4 bytes before a block of size 32 alloc'd
==41155==    at 0x4840F2F: operator new(unsigned long) (vg_replace_malloc.c:422)
==41155==    by 0x1F088A: allocate (new_allocator.h:127)
==41155==    by 0x1F088A: allocate (alloc_traits.h:464)
==41155==    by 0x1F088A: _M_allocate (stl_vector.h:346)
==41155==    by 0x1F088A: std::vector<float, std::allocator<float> >::_M_default_append(unsigned long) (vector.tcc:635)
==41155==    by 0x1D8EEB: define (AMReX_LayoutData.H:31)
==41155==    by 0x1D8EEB: LayoutData (AMReX_LayoutData.H:22)
==41155==    by 0x1D8EEB: make_unique<amrex::LayoutData<float>, const amrex::BoxArray&, const amrex::DistributionMapping&> (unique_ptr.h:962)
==41155==    by 0x1D8EEB: WarpX::AllocLevelMFs(int, amrex::BoxArray const&, amrex::DistributionMapping const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, amrex::IntVect const&, bool) (WarpX.cpp:2170)
==41155==    by 0x1DCEAB: WarpX::AllocLevelData(int, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1680)
==41155==    by 0x1DCFC7: WarpX::MakeNewLevelFromScratch(int, float, amrex::BoxArray const&, amrex::DistributionMapping const&) (WarpX.cpp:1548)
==41155==    by 0x6D620D: amrex::AmrMesh::MakeNewGrids(float) (AMReX_AmrMesh.cpp:779)
==41155==    by 0x3DDC2F: InitFromScratch (WarpXInitData.cpp:472)
==41155==    by 0x3DDC2F: WarpX::InitData() (WarpXInitData.cpp:378)
==41155==    by 0x1BB856: main (main.cpp:65)
==41155== 
STEP 3 ends. TIME = 1.787413796e-15 DT = 5.958046162e-16
Evolve time = 44.5962677 s; This step = 13.71976852 s; Avg. per step = 14.86542225 s

ax3l commented 2 years ago

cc @kngott could that be related to what you saw with graphs this week?

lucafedeli88 commented 2 years ago

Hello! I have investigated this issue a little. It seems to me that the following happens:

1) When the moving window is active, shiftMF is called on all the MultiFabs.
2) shiftMF also updates costs (I am using timers-based load balancing).
3) In this specific example the simulation is partitioned into 8 boxes, so most MultiFabs and the costs array have a size of 8.
4) The problem arises when shiftMF is called on pml_B: in this example pml_B has a size of 24, so indices from 0 to 23 are also used to access the components of costs, which has a size of only 8.

Do you have an idea of how to fix this? Should we add more components to costs? Should we disable the costs update when the MultiFab belongs to a PML? Or should we do something else?
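
To make the suspected mismatch concrete, here is a minimal standalone sketch in plain C++. It is not the WarpX or AMReX code: the function names (boxes_per_dim, count_invalid_indices, add_cost_if_valid) are hypothetical, costs is modelled as a plain std::vector<float>, and the box count is estimated with a simple ceiling division, which is an assumption that happens to match this input (64 cells per direction, max_grid_size = 32). The value 24 for pml_B is taken from the observation above, not computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified estimate (assumption, not the AMReX algorithm): number of boxes
// along one direction when n_cell cells are chopped into boxes holding at
// most max_grid_size cells each.
int boxes_per_dim(int n_cell, int max_grid_size)
{
    assert(n_cell > 0 && max_grid_size > 0);
    return (n_cell + max_grid_size - 1) / max_grid_size;
}

// Count how many box indices 0..num_boxes-1 of some layout fall outside a
// cost array that was sized for a different (smaller) layout.
int count_invalid_indices(const std::vector<float>& costs, int num_boxes)
{
    const int n_costs = static_cast<int>(costs.size());
    int n_bad = 0;
    for (int i = 0; i < num_boxes; ++i) {
        if (i >= n_costs) { ++n_bad; }
    }
    return n_bad;
}

// One possible guard (hypothetical): accumulate the elapsed time dt into
// costs[i] only if i is a valid index, and report whether it was applied.
bool add_cost_if_valid(std::vector<float>& costs, int i, float dt)
{
    if (i < 0 || i >= static_cast<int>(costs.size())) { return false; }
    costs[static_cast<std::size_t>(i)] += dt;
    return true;
}
```

With the numbers from this reproducer, boxes_per_dim(64, 32) gives 2 boxes per direction, so 2 x 2 x 2 = 8 entries in costs, and count_invalid_indices on an 8-entry array with 24 boxes reports 16 indices that would fall outside the array. The guard only illustrates the "skip the costs update" option; whether skipping, resizing costs, or keeping a separate cost array for the PML is the right fix is still the open question.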