PrometheusPi opened this issue 7 years ago
Checking the HDF5 output of the HDF5 plugin, I found no particles, even though the default LWFA setup should contain particles.
# output of h5ls -r simData_0.h5
/ Group
/data Group
/data/0 Group
/data/0/fields Group
/data/0/fields/B Group
/data/0/fields/B/x Dataset {128, 896, 128}
/data/0/fields/B/y Dataset {128, 896, 128}
/data/0/fields/B/z Dataset {128, 896, 128}
/data/0/fields/E Group
/data/0/fields/E/x Dataset {128, 896, 128}
/data/0/fields/E/y Dataset {128, 896, 128}
/data/0/fields/E/z Dataset {128, 896, 128}
/data/0/fields/e_chargeDensity Dataset {128, 896, 128}
/data/0/fields/e_energyDensity Dataset {128, 896, 128}
/data/0/fields/e_particleMomentumComponent Dataset {128, 896, 128}
/data/0/particles Group
/data/0/particles/e Group
/data/0/particles/e/charge Group
/data/0/particles/e/mass Group
/data/0/particles/e/momentum Group
/data/0/particles/e/momentum/x Dataset {NULL}
/data/0/particles/e/momentum/y Dataset {NULL}
/data/0/particles/e/momentum/z Dataset {NULL}
/data/0/particles/e/particlePatches Group
/data/0/particles/e/particlePatches/extent Group
/data/0/particles/e/particlePatches/extent/x Dataset {32}
/data/0/particles/e/particlePatches/extent/y Dataset {32}
/data/0/particles/e/particlePatches/extent/z Dataset {32}
/data/0/particles/e/particlePatches/numParticles Dataset {32}
/data/0/particles/e/particlePatches/numParticlesOffset Dataset {32}
/data/0/particles/e/particlePatches/offset Group
/data/0/particles/e/particlePatches/offset/x Dataset {32}
/data/0/particles/e/particlePatches/offset/y Dataset {32}
/data/0/particles/e/particlePatches/offset/z Dataset {32}
/data/0/particles/e/position Group
/data/0/particles/e/position/x Dataset {NULL}
/data/0/particles/e/position/y Dataset {NULL}
/data/0/particles/e/position/z Dataset {NULL}
/data/0/particles/e/positionOffset Group
/data/0/particles/e/positionOffset/x Dataset {NULL}
/data/0/particles/e/positionOffset/y Dataset {NULL}
/data/0/particles/e/positionOffset/z Dataset {NULL}
/data/0/particles/e/weighting Dataset {NULL}
/data/0/picongpu Group
/data/0/picongpu/idProvider Group
/data/0/picongpu/idProvider/nextId Dataset {2, 8, 2}
/data/0/picongpu/idProvider/startId Dataset {2, 8, 2}
/header Group
The macro-particle counter also reports zero particles.
With all debug output enabled, one sees that the error occurs after initialization, during the distribution of particles according to the density profile.
The error occurs in picongpu/src/picongpu/include/particles/Particles.tpp, line 278:
PMACC_KERNEL( KernelFillGridWithParticles< Particles >{} )
    (mapper.getGridDim(), block)
    ( densityFunctor, positionFunctor, totalGpuCellOffset,
      this->particlesBuffer->getDeviceParticleBox( ), mapper );
The last verbose output message in stdout is
...
PIConGPUVerbose SIMULATION_STATE(16) | Starting simulation from timestep 0
PIConGPUVerbose SIMULATION_STATE(16) | Loading from default values finished
PMaccVerbose MEMORY(1) | DataConnector: sharing access to 'e' (1 uses)
PIConGPUVerbose SIMULATION_STATE(16) | initialize density profile for species e
This issue comes from the random number generator used during random position initialization. Using quiet start solves the issue.
Thus @BeyondEspresso and @steindev, this will not be an issue for TWTS, since all random distributions will be computed on the CPU beforehand.
@HighIander and @n01r: even when using quiet start, you will most likely encounter the same issue when using a probability-based ionization scheme.
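For reference, switching from the RNG-based random in-cell start position to quiet start amounts to something like the following in particle.param. This is only a sketch; the names QuietParam, QuietImpl, numParticlesPerDimension, and mCT::shrinkTo follow the 0.3-era param files and may differ in other versions:

namespace picongpu
{
namespace particles
{
namespace startPosition
{
    struct QuietParam
    {
        /* Macro-particles per cell and per direction: a fixed in-cell
         * lattice (here 1 x 2 x 1) instead of random in-cell positions,
         * so no device-side RNG is needed during initialization. */
        using numParticlesPerDimension = mCT::shrinkTo<
            mCT::Int< 1, 2, 1 >,
            simDim
        >::type;
    };

    /* Use startPosition::Quiet instead of startPosition::Random
     * in the species initialization pipeline. */
    using Quiet = QuietImpl< QuietParam >;
} /* namespace startPosition */
} /* namespace particles */
} /* namespace picongpu */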
@ax3l or @psychocoderHPC: Is setting CUDA_ARCH to 60 correct for the Tesla P100?
I am a bit confused: the CSCS web site says they use an NVIDIA® Tesla® P100 16GB, but on this web page there is no such thing as a Tesla P100; there is only a Pascal P100 (SM_60) and a Tesla V100 (SM_70, CUDA 9 only).
Okay, Tesla P100-PCIE-16GB is the same, see here.
Please increase the reserved free memory in the file memory.param; this should solve your issue. The reason is that the P100 is much more parallel than all previous GPUs (more SMs), so there is not enough free memory left for the local memory (lmem) used during RNG initialization.
sm_60 is correct for the P100.
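A quick way to double-check is to query the device at runtime. This is a small standalone CUDA sketch (not part of PIConGPU); a P100 should report sm_60 together with its 56 SMs:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    /* Print compute capability and parallelism of GPU 0;
     * a Tesla P100 reports sm_60, 56 SMs, 2048 threads/SM. */
    cudaDeviceProp prop;
    if( cudaGetDeviceProperties( &prop, 0 ) != cudaSuccess )
        return 1;
    std::printf( "%s: sm_%d%d, %d SMs, %d threads/SM\n",
                 prop.name, prop.major, prop.minor,
                 prop.multiProcessorCount,
                 prop.maxThreadsPerMultiProcessor );
    return 0;
}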
@psychocoderHPC Thanks - setting reservedGpuMemorySize to twice its original value (now 350 * 1024 * 1024 * 2) solved the issue.
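For reference, the change amounts to something like this in memory.param. This is a sketch: the variable name and old default are quoted from this thread, while the surrounding namespace follows the 0.3-era file layout and may differ in other versions:

namespace picongpu
{
    /* Memory (in bytes) that PIConGPU leaves free on each GPU for
     * allocations it does not manage itself, e.g. the local memory
     * (lmem) the driver reserves during RNG initialization.
     * Doubled from the former 350 MiB default for the P100, which
     * runs more blocks in parallel (56 SMs) than older GPUs. */
    constexpr size_t reservedGpuMemorySize = 350 * 1024 * 1024 * 2;
}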
We should really integrate the "legacy" RNG for startup into our new state-aware RNG implementation, to dramatically reduce the extra memory required in memory.param.
This will not help; the RNG initialization still needs lmem. I am currently thinking about compiling for all architectures, checking the lmem usage by hand, and then keeping as much memory free as the worst-case architecture needs, multiplied by the number of SMs times the maximum number of parallel blocks per SM.
I am reopening this issue until a more generic solution is found.
Self-answer to my post https://github.com/ComputationalRadiationPhysics/picongpu/issues/2357#issuecomment-342004101: it is not feasible to check the lmem usage for all kernels and then multiply by the maximum number of hardware threads per GPU. The reason is that a P100 can keep 2048 threads resident per multiprocessor and contains 56 SMs, so the worst case already amounts to 114,688 resident threads.
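To illustrate why such a worst-case reservation explodes, here is a small standalone sketch; the 2 KiB per-thread lmem footprint is a made-up example value, not a measured PIConGPU number:

#include <cstdio>

int main()
{
    /* The CUDA driver backs local memory (lmem) for every thread that
     * can be resident at once, i.e. threads/SM * number of SMs, not
     * just for the threads of a single block. */
    constexpr unsigned long long threadsPerSM  = 2048;      /* P100 */
    constexpr unsigned long long numSMs        = 56;        /* P100 */
    constexpr unsigned long long lmemPerThread = 2 * 1024;  /* bytes, hypothetical */

    constexpr unsigned long long residentThreads = threadsPerSM * numSMs; /* 114688 */
    constexpr unsigned long long totalLmem = residentThreads * lmemPerThread;

    std::printf( "resident threads: %llu\n", residentThreads );
    std::printf( "worst-case lmem:  %.0f MiB\n",
                 totalLmem / ( 1024.0 * 1024.0 ) ); /* 224 MiB */
    return 0;
}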
When running the default LWFA example using version 0.3.1 of PIConGPU, the simulation fails during initialization when writing checkpoints. I could reproduce this with libSplash compiled against ADIOS as well as against parallel HDF5 only. However, writing HDF5 output via the plugin works just fine, as long as checkpoints are not active.
I use the following modules on Piz Daint:
I built the additional libraries using the script by @ax3l here (great tool :+1:). (For HDF5 only, I removed the ADIOS library and rebuilt libSplash.)
Configuring worked just fine; compiling produced a massive amount of Boost warnings.
The stderr when --checkpoints 5000 is not active:

However, the simulation runs fine (see stdout):

The stderr when --checkpoints 5000 is active:

Here, the simulation dies during initialization (see stdout):

In addition to the default command line arguments of the 32 GPU example, I used --hdf5.period 1000 and --checkpoints 5000. All HDF5 files from the HDF5 plugin were written correctly. I am not sure whether the memory errors are actually causing the failure, because they occur without checkpoints as well (just slightly differently). Any idea how to solve this issue? I think neither @HighIander's project nor the TWTS project (cc @BeyondEspresso and @steindev) will work without checkpoints.
. All hdf5 files from the hdf5 plugin were written correctly.I am not sure, whether the memory errors actually are causing the failure because they occur as well without checkpoints (just a bit differently). Any idea how to solve this issue? I think neither @HighIander 's project nor the TWTS project (cc @BeyondEspresso and @steindev) will work without checkpoints.