SmileiPIC / Smilei

Particle-in-cell code for plasma simulation
https://smileipic.github.io/Smilei

ERROR: No patch to clone. #60

Closed illia-thiele closed 6 years ago

illia-thiele commented 6 years ago

Dear SMILEI community,

I have an error that appears on the Occigen cluster. At runtime, the following message is written in the output file:

4020/6501     1.5710e+01     1.1425e+03   (  8.0868e+02 )
ERROR src/Patch/VectorPatch.cpp:1323 (createPatches) No patch to clone. This should never happen!

I ran two simulations with exactly the same input file, but one hit this error while the other finished properly. Have you ever seen anything similar? Do you have an idea of what might be wrong?

This is the whole output file:

Reading the simulation parameters

HDF5 version 1.8.18
Python version 2.7.12
Parsing pyinit.py
Parsing v3.4-71-g2866a41-master
Parsing pyprofiles.py
Parsing THzEbeamAmp2dMirror_E0z1d0_tp2d35_wp3d14_nm10000_xmp1d4_gamma20_n28d3_tn0d1_tdm0d6_qneutr_100ppc_4.py
Parsing pycontrol.py
Calling python _smilei_check
Calling python _prepare_checkpoint_dir
[WARNING] Change patches distribution to hilbert
[WARNING] Particles cluster width set to : 5

Geometry: 2Dcartesian

 Interpolation order : 2
 Maxwell solver : Yee
 (Time resolution, Total simulation time) : (255.921148, 25.402356)
 (Total number of iterations,   timestep) : (6501, 0.003907)
            timestep  = 0.996966 * CFL
 dimension 0 - (Spatial resolution, Grid length) : (254.647909, 18.849556)
             - (Number of cells,    Cell length)  : (4800, 0.003927)
             - Electromagnetic boundary conditions: (silver-muller, silver-muller)
                 - Electromagnetic boundary conditions k    : ( [1.00, 0.00] , [-1.00, -0.00] )
 dimension 1 - (Spatial resolution, Grid length) : (15.92, 25.13)
             - (Number of cells,    Cell length)  : (400, 0.06)
             - Electromagnetic boundary conditions: (silver-muller, silver-muller)
                 - Electromagnetic boundary conditions k    : ( [0.00, 1.00] , [-0.00, -1.00] )

Load Balancing:

 Patches are initially homogeneously distributed between MPI ranks. (initial_balance = false) 
 Happens: every 20 iterations
 Cell load coefficient = 1.00
 Frozen particle load coefficient = 0.10

Initializing MPI

 Number of MPI process : 28
 Number of patches : 
     dimension 0 - number_of_patches : 64
     dimension 1 - number_of_patches : 2
 Patch size :
     dimension 0 - n_space : 75 cells.
     dimension 1 - n_space : 200 cells.
 Dynamic load balancing: every 20 iterations

OpenMP

 Number of thread per MPI process : 2

Initializing the restart environment

Initializing moving window

 Moving window is active:
     velocity_x : 1.00
     time_start : 15.71

Initializing particles & fields

 Creating Species : ions
 Creating Species : electrons
 Creating Species : mirror_ion
 Creating Species : mirror_eon
 Laser parameters :
    Laser #0: separable profile
        omega              : 1
        chirp_profile      : 1D built-in profile `tconstant`
        time envelope      : 1D built-in profile `tgaussian`
        space envelope (y) : 1D user-defined function
        space envelope (z) : 1D user-defined function
        phase          (y) : 1D user-defined function
        phase          (z) : 1D user-defined function
    delay phase      (y) : 0
    delay phase      (z) : 1.5708
 Adding particle walls:
     Nothing to do

Initializing Patches

 First patch created
     Approximately 10% of patches created
     Approximately 20% of patches created
     Approximately 30% of patches created
     Approximately 40% of patches created
 All patches created

Creating Diagnostics, antennas, and external fields

 Diagnostic Fields #0  :
     Ex Ey Ez By_m Bz_m Jx Jy Jz Rho_electrons Rho_mirror_ion Rho_mirror_eon 
 Done initializing diagnostics, antennas, and external fields

Applying external fields at time t = 0

Solving Poisson at time t = 0

 Poisson solver converged at iteration: 0, relative err is ctrl = 0.00 x 1e-14
 Poisson equation solved. Maximum err = 0.00 at i= -1

Time in Poisson : 0.00

Initializing diagnostics

Running diags at time t = 0

Species creation summary

     Species 0 (ions) created with 52776600 particles
     Species 1 (electrons) created with 52776600 particles
     Species 2 (mirror_ion) created with 13696000 particles
     Species 3 (mirror_eon) created with 13696000 particles

Memory consumption

 (Master) Species part = 0 MB
 Global Species part = 6.191 GB
 Max Species part = 1177 MB
 (Master) Fields part = 10 MB
 Global Fields part = 0.250 GB
 Max Fields part = 10 MB

Expected disk usage (approximate)

 WARNING: disk usage by non-uniform particles maybe strongly underestimated,
    especially when particles are created at runtime (ionization, pair generation, etc.)

 Expected disk usage for diagnostics:
     File Fields0.h5: 10.58 G
     File scalars.txt: 1.11 M
 Total disk usage for diagnostics: 10.58 G

Cleaning up python runtime environement

 Checking for cleanup() function:
 python cleanup function does not exist
 Calling python _keep_python_running() :
     Closing Python

Time-Loop started: number of time-steps n_time = 6501

timestep       sim time   cpu time [s]   (    diff [s] )
804/6501     3.1435e+00     6.7802e+01   (  6.7802e+01 )

1608/6501     6.2851e+00     1.1969e+02   (  5.1892e+01 )
2412/6501     9.4267e+00     1.6992e+02   (  5.0226e+01 )
3216/6501     1.2568e+01     3.3386e+02   (  1.6394e+02 )

Window starts moving
4020/6501     1.5710e+01     1.1425e+03   (  8.0868e+02 )
ERROR src/Patch/VectorPatch.cpp:1323 (createPatches) No patch to clone. This should never happen!

beck-llr commented 6 years ago

Hi

As stated in the error message, this should never happen... Did the simulation that ran correctly with the same input file have a different parallel setup? A different number of MPI ranks or openMP threads?
Could you please share your input file so that I can run a few tests on my side? I assume you are using the latest version of the master branch of this GitHub repo?

beck-llr commented 6 years ago

A list of things you can try; let me know if any of them improves things:

  1. Try using 8 patches along Y instead of only 2.
  2. Try using fewer MPI ranks and more openMP threads. On Occigen you can easily use 12 threads on the Haswell partition and 14 on the Broadwell partition. You are using 56 cores right now, so I would go with 4 MPI ranks and 14 openMP threads on 2 Occigen Broadwell nodes (see the sketch after this list).
  3. Try deactivating load balancing.
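
For reference, here is a minimal sketch of what options 1 and 2 could look like; the 2-node Broadwell layout and the exact batch options are illustrative assumptions, not tested settings.

Option 1, in the namelist:

    number_of_patches = [ N_patch, 8 ],   # 8 patches along y instead of 2

Option 2, in the job script (4 MPI ranks x 14 openMP threads on 2 Broadwell nodes):

    #SBATCH --nodes=2
    #SBATCH --ntasks=4
    #SBATCH --cpus-per-task=14
    export OMP_NUM_THREADS=14
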
illia-thiele commented 6 years ago

Dear Arnaud,

Thank you very much for the quick answer. I will follow your suggestions.

To answer your first questions: I used the same parallel setup in both cases, and the master branch. This is my input:

import math
import numpy as np

l0 = 2.0 * math.pi     # wavelength in normalized units
t0 = l0                # optical cycle in normalized units
rest = 1608.0          # nb of timesteps in 1 optical cycle
resx = 1600.0          # nb cells in 1 wavelength
resy = 100.0           # nb cells in 1 wavelength
tc = 2. * math.pi      # time center for the laser at the boundary
L0 = 3.0 * l0          # box length
Ly0 = 8. * np.pi       # box width
N_patch = 64

# Ebeam

xfwhm_e = 0.326943431266 / 4.  # fwhm length of the electron beam
vm = 0.95                      # electron beam velocity
xc = 2.*xfwhm_e + 2.           # distance to hold from beam
xi = 2. / 3. * L0              # interaction place between mirror and electron beam
xmp = 1.4
xmr = 0.1 * xmp
n_e = 14.15 * 4.               # peak electron density
gamma_e = 20.                  # electron gamma factor

T_e = 0.1 # el. beam temperature

vm = np.sqrt(gamma_e**2 - 1) / gamma_e
r_e = np.pi                    # el. beam FWHM width
time_plasma_frozen = tc + L0 - xi - xc / vm + (0.6 + 0.5) / vm

# Laser:

E0 = 1.
w0 = np.pi
zr = w0**2 / 2.
fm = 0.  # np.pi # distance between mirror and focal plane

Main(
    geometry = "2Dcartesian",
    interpolation_order = 2,  # only 2 available

    cell_length = [l0/resx, l0/resy],
    grid_length = [L0, Ly0],

    number_of_patches = [ N_patch, 2 ],

    timestep = t0/rest,
    simulation_time = t0/rest * 7001,

    EM_boundary_conditions = [
        ['silver-muller'],
        ['silver-muller'],
    ],

    random_seed = smilei_mpi_rank,

    print_every = int(rest/2.0),
    solve_poisson = False
)

LoadBalancing( initial_balance = True, every = 20, cell_load = 1., frozen_particle_load = 0.1 )

Species(
    name = "electrons",
    time_frozen = time_plasma_frozen,
    position_initialization = "random",
    momentum_initialization = "cold",

    temperature = [T_e, 0., 0.],

    particles_per_cell = 100,
    mass = 1.,
    atomic_number = None,
    number_density = gaussian(max=n_e, xfwhm=xfwhm_e, xcenter=xi-xc, xorder=2, yfwhm=r_e, ycenter=Ly0/2., yorder=2),
    # charge_density = 3.5/np.sqrt(1-0.95**2),
    charge = -1.,
    mean_velocity = [vm, 0., 0.],
    boundary_conditions = [
        ["remove", "remove"],
        ["remove", "remove"],
    #    ["periodic", "periodic"],
    ],
    # thermal_boundary_temperature = None,
    # thermal_boundary_velocity = None,
    # ionization_model = "none",
    # ionization_electrons = None,
    is_test = False,
    # c_part_max = 1.0,
    pusher = "boris",
)

Species(
    name = 'mirror_eon',
    position_initialization = 'random',
    momentum_initialization = 'cold',
    ionization_model = 'none',
    particles_per_cell = 100,

    c_part_max = 1.0,

    mass = 1.0,
    charge = -1.0,
    number_density = trapezoidal(10000., xvacuum=xi - xmp - 2.*xmr, xslope1=xmr, xplateau=xmp, xslope2=xmr, yvacuum=Ly0/10., yslope1=Ly0/10., yplateau=0.6*Ly0, yslope2=Ly0/10.),
    time_frozen = time_plasma_frozen,
    boundary_conditions = [
        ["remove", "remove"],
        ["remove", "remove"],
    #    ["periodic", "periodic"],
    #    ["periodic", "periodic"],
    ],
)

LaserGaussian2D(
    box_side = "xmax",
    a0 = E0,
    omega = 1.,
    focus = [L0-xi+fm, Ly0/2.],
    waist = w0,
    incidence_angle = 0.,
    polarization_phi = np.pi/2.,
    ellipticity = 0.,
    time_envelope = tgaussian(start=0., fwhm=np.pi/np.sqrt(2.), center=tc)
)

MovingWindow( time_start = tc + L0 - xi + tc / 2., velocity_x = 1., )

DiagScalar( every = 10 )

DiagFields(  #0
    every = 100,
    flush_every = 1000,
    fields = ['Ex','Ey','Ez','Bz_m','By_m','Bz_m','Jx','Jy','Jz','Rho_electrons','Rho_mirror_eon']
)

Moreover, I have used the following setup:

#SBATCH --nodes=1
#SBATCH --ntasks=28
#SBATCH --threads-per-core=2

beck-llr commented 6 years ago

You're using hyperthreading. We strongly advise against this. It won't explain the crash though.
Thanks for sharing, I'll try to get back to you.

beck-llr commented 6 years ago

Something is not right in your input file, in the number density of the mirror electrons: xvacuum=xi - xmp - 2.xmr. Do you mean 2*xmr?

illia-thiele commented 6 years ago

Both #SBATCH --threads-per-core=2 and #SBATCH --threads-per-core=1 give the error sometimes.

Yes, I mean 2.*xmr. The editor has removed all the * signs, as if they were just spaces.

beck-llr commented 6 years ago

Ok, according to my few basic tests, my initial feeling seems to be confirmed. You are running the code with very few patches per MPI rank and a very unbalanced setup. You end up with a single patch per MPI rank, or the code tries to exchange patches in a way that leads to this situation, which it is supposed to handle but obviously does not in your case.

I'll try to understand and fix the code to avoid the crash. Nevertheless, be aware that you will be in much better conditions with many more patches per MPI rank. That can easily be done by using fewer MPI ranks and more openMP threads (option 2), by using smaller patches (option 1), or both.
Also note that a large discrepancy between the number of patches in each dimension is not recommended either.
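
A rough back-of-the-envelope illustration of these patch counts, using the 64 x 2 patch grid and 28 MPI ranks reported in the log above (the numbers are only an illustration):

    # Average number of patches per MPI rank, current vs. suggested setup
    n_patches = 64 * 2        # number_of_patches = [64, 2] -> 128 patches
    print(n_patches / 28.)    # 28 MPI ranks -> ~4.6 patches per rank on average
    print((64 * 8) / 4.)      # options 1+2: 512 patches on 4 ranks -> 128 per rank

With so few patches per rank on average, the dynamic load balancing can easily leave a rank holding a single patch.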

The crash can be triggered either by the moving window or by the dynamic load balancing, which is why I suggested option 3.

I hope this helps !

illia-thiele commented 6 years ago

Thank you very much! I will follow your suggestions. A few new simulations have already finished properly.

beck-llr commented 6 years ago

No news so I assume good news and close the issue.
An additional safety measure has been implemented to mitigate this error and make the code more robust. It will be available in the next release. Nevertheless, this situation is sub-optimal, and users should try to avoid it by favoring openMP over MPI decomposition and by avoiding overly large patches.
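
As a rough illustration of the patch sizes involved, using the grid reported in the log above (the arithmetic is only an illustration):

    # Cells per patch for the original and the suggested decomposition
    nx_cells, ny_cells = 4800, 400          # grid size from the log
    print(nx_cells // 64, ny_cells // 2)    # original:  75 x 200 cells per patch
    print(nx_cells // 64, ny_cells // 8)    # option 1:  75 x  50 cells per patch

Smaller patches give the dynamic load balancing finer-grained units to redistribute between MPI ranks, in line with the advice above.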