SmileiPIC / Smilei

Particle-in-cell code for plasma simulation
https://smileipic.github.io/Smilei

Injection of hot electron beam raises error `double free or corruption (out)` #349

Open weipengyao opened 3 years ago

weipengyao commented 3 years ago

Description

I am using the injection module for hot electron transport in a solid target.

When the temperature of the injected electron beam is high, e.g. Te=100 keV, the code runs for hundreds of steps and then crashes with the error double free or corruption (out): 0x0000000003b5f6f0 ***. When the temperature is reduced, e.g. Te=50 eV, the code runs fine (at least within the simulation time).

Please find the related output files here:

a.out.txt test.py.txt a.err.txt

Steps to reproduce the problem

To reproduce the problem, just use the attached namelist (test.py.txt) and compare the two cases with different temperatures.
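For reference, the relevant blocks look roughly like the sketch below. This is not the actual namelist (that is in test.py.txt): the names, density and particle numbers are illustrative, and the temperature is written assuming Smilei's usual normalization to me c^2.

# Illustrative sketch only, not the attached test.py: species/injector names,
# density and particles_per_cell are made up; temperature is in units of me*c^2.
Species(
    name = "electron_beam",
    position_initialization = "random",
    momentum_initialization = "maxwell-juettner",
    particles_per_cell = 64,
    mass = 1.,
    charge = -1.,
    number_density = 0.,            # starts empty; particles come from the injector
    temperature = [100./511.],      # Te = 100 keV; the 50 eV case is ~1e-4 (0.05/511.)
    boundary_conditions = [["remove", "remove"], ["remove", "remove"]],
)

# Injector on the xmin side; unspecified parameters typically default to those of the species.
ParticleInjector(
    name = "hot_electron_injector",
    species = "electron_beam",
    box_side = "xmin",
)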

This information about iterator validity might also be helpful.

Parameters

make env gives:

SMILEICXX : mpicxx
PYTHONEXE : python3
MPIVERSION :
VERSION : b'v4.4-784-gc3f8cc8'-b'work'
OPENMP_FLAG : -fopenmp -D_OMP
HDF5_ROOT_DIR : /scinet/niagara/software/2019b/modules/intel-2019u3-intelmpi-2019u3/hdf5/1.10.5
SITEDIR : /home/a/anticipa/weipeng/.local/lib/python3.6/site-packages
PY_CXXFLAGS : -I/scinet/niagara/software/2019b/opt/base/python/3.6.8/include/python3.6m -I/scinet/niagara/software/2019b/opt/base/python/3.6.8/include/python3.6m -I/scinet/niagara/software/2019b/opt/base/python/3.6.8/lib/python3.6/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
PY_LDFLAGS : -lpython3.6m -lpthread -ldl -lutil -lm -Xlinker -export-dynamic
CXXFLAGS : -D__VERSION=\"b'v4.4-784-gc3f8cc8'-b'work'\" -D_VECTO -std=c++11 -Wall  -I/scinet/niagara/software/2019b/modules/intel-2019u3-intelmpi-2019u3/hdf5/1.10.5/include -Isrc -Isrc/Params -Isrc/ElectroMagnSolver -Isrc/ElectroMagn -Isrc/ElectroMagnBC -Isrc/Particles -Isrc/Radiation -Isrc/Ionization -Isrc/Interpolator -Isrc/Collisions -Isrc/Merging -Isrc/Tools -Isrc/Python -Isrc/Projector -Isrc/DomainDecomposition -Isrc/MovWindow -Isrc/Profiles -Isrc/picsar_interface -Isrc/Checkpoint -Isrc/Pusher -Isrc/Field -Isrc/MultiphotonBreitWheeler -Isrc/SmileiMPI -Isrc/Species -Isrc/Diagnostic -Isrc/ParticleInjector -Isrc/Patch -Ibuild/src/Python -I/scinet/niagara/software/2019b/opt/base/python/3.6.8/include/python3.6m -I/scinet/niagara/software/2019b/opt/base/python/3.6.8/include/python3.6m -I/scinet/niagara/software/2019b/opt/base/python/3.6.8/lib/python3.6/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -O3 -g  -fopenmp -D_OMP
LDFLAGS : -L/scinet/niagara/software/2019b/modules/intel-2019u3-intelmpi-2019u3/hdf5/1.10.5/lib   -lhdf5 -lpython3.6m -lpthread -ldl -lutil -lm -Xlinker -export-dynamic -L/scinet/niagara/software/2019b/opt/base/python/3.6.8/lib -lm -fopenmp -D_OMP
xxirii commented 3 years ago

Dear @weipengyao,

I am looking into your issue, but so far I have not been able to reproduce the problem.

If you are running on a supercomputer, can you show me your launch script (the one you use to launch the simulation) or the exact configuration you use (number of MPI tasks, OpenMP threads, ...)?

Thank you

weipengyao commented 3 years ago

Dear @xxirii,

Thanks for your time and reply.

I checked again with the attached namelist and found that this error occurs (at timestep 200) with 160 cores, but not with 40 cores (where it might simply happen later).

I am running this on the Niagara supercomputer, launching with smilei.sh 160 test.py on the debug cluster with 4 nodes (40 cores per node). From a.out.txt, you may notice that I just use:

...
Initializing MPI
 --------------------------------------------------------------------------------
     MPI_THREAD_MULTIPLE enabled
     Number of MPI process : 160
     Number of patches : 
         dimension 0 - number_of_patches : 128
         dimension 1 - number_of_patches : 128
     Patch size :
         dimension 0 - n_space : 20 cells.
         dimension 1 - n_space : 20 cells.
     Dynamic load balancing: never

 OpenMP
 --------------------------------------------------------------------------------
     Number of thread per MPI process : 1
...
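For completeness, the patch decomposition above is set in the Main block of the namelist. A rough sketch, where only number_of_patches (and the implied 2560 x 2560 cell grid) is taken from the output above and everything else is a placeholder:

# Sketch only: dx, dy, dt and t_sim are placeholders, not the values from test.py.
dx = dy = 0.1      # cell size (placeholder)
dt = 0.05          # timestep (placeholder)
t_sim = 100.       # simulation time (placeholder)

Main(
    geometry = "2Dcartesian",
    number_of_patches = [128, 128],          # as reported in a.out.txt
    cell_length = [dx, dy],
    grid_length = [2560*dx, 2560*dy],        # 128 patches x 20 cells each, per dimension
    timestep = dt,
    simulation_time = t_sim,
)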

Let me know if you need anything else.

Best, Yao

xxirii commented 3 years ago

Thank you. Do you use any particular OpenMP environment variables, such as a specific schedule or thread placement?

weipengyao commented 3 years ago

I don't think I do.

Here's the script I use to compile Smilei on Niagara (I hope it can help anyone else using Smilei there): compile_smilei_niagara.sh.txt

To save you the download, it reads:

# Modules used to build Smilei on Niagara
module purge
module load NiaEnv/2019b
module load intel/2019u3
module load intelmpi/2019u3
module load hdf5/1.10.5
module load python/3.6.8

# Paths picked up by the Smilei makefile
export HDF5_ROOT_DIR=/scinet/niagara/software/2019b/modules/intel-2019u3-intelmpi-2019u3/hdf5/1.10.5
export PYTHONEXE=python3

# OpenMP and MPI runtime settings
export OMP_NUM_THREADS=1
export OMP_SCHEDULE=dynamic
export OMP_PROC_BIND=true
export OMPI_MCA_btl_portals4_use_rdma=0

# For MPI-tag related issues (see #307):
export MPIR_CVAR_CH4_OFI_TAG_BITS=26
export MPIR_CVAR_CH4_OFI_RANK_BITS=13

I only have something 'special' for MPI-tag related issues (#307).

I checked my ~/.bashrc, and I don't have anything related in there. Do you think I should check any other places?

Thanks!

xxirii commented 3 years ago

Thank you,

I have managed to reproduce the bug using exactly your configuration. It does not appear in hybrid mode with more than 1 OpenMP thread per MPI task. I will investigate, but you should be able to run your case in hybrid mode if you need the results soon.

Moreover, in my case I get an HDF5 issue when I use the debug_every variable in the Collisions block, so if you see the same thing you can comment it out.
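In case it is useful, commenting it out would look roughly like this (species names are illustrative, not the ones from the actual namelist):

Collisions(
    species1 = ["electron_beam"],
    species2 = ["ion"],
    coulomb_log = 0.,      # 0. (the default) lets Smilei compute the Coulomb logarithm
    # debug_every = 50,    # commented out to avoid the HDF5 issue mentioned above
)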

xxirii commented 3 years ago

For instance, using 16 MPI tasks and 10 OpenMP threads per task, I am at iteration 3700 after 8 minutes.

weipengyao commented 3 years ago

Dear @xxirii,

Thanks for the timely reply.

In my case, I need ten times more cores, i.e. 1600, with more particles per cell (ppc=256) in order to suppress the noise. It seems that this crash appears once the number of cores exceeds a certain value (which would explain why the 16x10 scheme works).

About the debug_every related HDF5 issue, I don't have it in my case, for now. But I remember that when I tried to use multiple species in a Collisions block a long time ago, there was a problem (see #307).

I hope it helps.

xxirii commented 3 years ago

Right, it's surprising that it works with 159 MPI tasks and segfaults with 160. Very strange.

xxirii commented 3 years ago

Note that the bug only occurs when I use exactly 160 cores. When I use more, it seems to work. Have you tried a case with more ppc and more MPI tasks that crashes?

weipengyao commented 3 years ago

Yes, I have. Please see this output file for example.

HEB2D_dep2_Inj128_Z10_T100_np1_Th1k_FixIon_SBC_Collee.py-4673320.out.txt